George Mason University | University Libraries

Digital Humanities

A guide to the concepts and tools of the expanding field of digital humanities
Text analysis identifies trends across a large number of text-based documents.

Text Analysis Tools

Bookworm. Enables users to explore lexical trends in a corpus of digitized books from HathiTrust. Users can search by ngram and compare usage over time, filtering their search by date, metric, and case sensitivity.

Constellate. A text and data analytics service from JSTOR and Portico, Constellate is a platform for learning and performing text analysis, building datasets, and sharing analytics course materials. It serves users in three core areas: teaching and learning text analytics, building datasets from multiple content sources, and visualizing and analyzing those datasets.

Google Ngram Viewer. Displays user-selected words or phrases (ngrams) in a graph that shows how often those phrases occur in a corpus over time. Google Ngram Viewer's corpus is made up of the scanned books available in Google Books. Typically, the X axis shows the year in which works from the corpus were published, and the Y axis shows the frequency with which the ngrams appear in the corpus for that year. Users enter the ngrams and can then select case sensitivity, a date range, the language of the corpus, and smoothing.
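For readers curious about what an ngram graph computes, the following is a minimal Python sketch of the idea, not Google's implementation. It counts how often a phrase appears among all phrases of the same length for each year, then optionally averages each year with its neighbors, which is one common way to smooth a trend line. The function names, the whitespace tokenization, and the tiny sample corpus are all assumptions made for illustration.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def yearly_frequency(docs_by_year, phrase, case_sensitive=False):
    """Share of same-length ngrams in each year that match `phrase`."""
    target = tuple((phrase if case_sensitive else phrase.lower()).split())
    n = len(target)
    freqs = {}
    for year, texts in sorted(docs_by_year.items()):
        counts = Counter()
        for text in texts:
            # Simplification: split on whitespace only, no punctuation handling.
            tokens = (text if case_sensitive else text.lower()).split()
            counts.update(ngrams(tokens, n))
        total = sum(counts.values())
        freqs[year] = counts[target] / total if total else 0.0
    return freqs


def smooth(freqs, window):
    """Average each year with up to `window` neighboring years on each side."""
    years = sorted(freqs)
    smoothed = {}
    for i, year in enumerate(years):
        lo, hi = max(0, i - window), min(len(years), i + window + 1)
        values = [freqs[years[j]] for j in range(lo, hi)]
        smoothed[year] = sum(values) / len(values)
    return smoothed


# Hypothetical three-year corpus, invented for this example.
sample = {
    1900: ["the steam engine changed the world"],
    1950: ["the jet engine and the steam engine"],
    2000: ["the search engine changed the web"],
}
raw = yearly_frequency(sample, "steam engine")
trend = smooth(raw, window=1)
```

Plotting the years in `trend` on the X axis against their values on the Y axis gives a small version of the kind of graph the viewer displays.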

HathiTrust Research Center Analytics. Supports large-scale computational analysis of the works in the HathiTrust Digital Library to facilitate non-profit and educational research. Services offered include extracted features, text analysis algorithms, and data capsules.

ProQuest TDM Studio. Consists of two components: the workbench and visualizations. The workbench enables researchers to create datasets using licensed ProQuest content and analyze those datasets by running Python or R scripts in an accompanying Jupyter Notebook. Those new to text and data mining can use the visualization component of the platform, which is a user-friendly way to engage with ProQuest content and does not require any knowledge of programming languages.

Voyant. A web-based text reading and analysis environment that enables users to read and explore a corpus of text using a multi-panel interface. Voyant's default tools include a word cloud; a reader to read the documents of the corpus; a line graph that shows the distribution of a word's occurrence across the corpus; a textual overview of the corpus; and keywords in context. Several other tools are also available for users.
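A keywords-in-context view lists every occurrence of a word together with the words around it. The short Python sketch below illustrates the general technique under simple assumptions (lowercased text, tokens matched with a basic regular expression). It is not Voyant's code, and the `kwic` function and sample sentence are hypothetical.

```python
import re


def kwic(text, keyword, width=3):
    """Return (left context, keyword, right context) for each occurrence."""
    tokens = re.findall(r"\w+", text.lower())
    keyword = keyword.lower()
    rows = []
    for i, token in enumerate(tokens):
        if token == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            rows.append((left, token, right))
    return rows


# Invented sample text for illustration.
sample_text = "The whale surfaced. Ahab saw the whale and the whale saw Ahab."
for left, word, right in kwic(sample_text, "whale", width=2):
    # Right-align the left context so the keyword lines up in one column.
    print(f"{left:>10} | {word} | {right}")
```

Aligning the keyword in a single column, as the loop does, makes it easier to scan for recurring patterns in the surrounding words.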

WordItOut. A simple tool for creating word clouds, which visualize the most frequent words in a corpus. Users can generate word clouds from sentences, whole documents, or tables, and filter the text to choose which words to display or remove.
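Behind any word cloud is a table of word counts, usually computed after filtering out very common words. The Python sketch below shows that counting step only, as an illustration of the general approach and not of how WordItOut works internally. The stopword list, the `top_words` function, and the sample text are assumptions chosen for the example.

```python
import re
from collections import Counter

# A deliberately tiny stopword list; real tools use much longer ones.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is"}


def top_words(text, k=5, stopwords=STOPWORDS, min_length=1):
    """Return the k most frequent words after filtering, with their counts."""
    tokens = re.findall(r"[a-z']+", text.lower())
    kept = [t for t in tokens if t not in stopwords and len(t) >= min_length]
    return Counter(kept).most_common(k)


# Invented sample text for illustration.
sample_doc = (
    "The archive of the library is an archive of texts and "
    "the library lends texts to readers in the archive."
)
for word, count in top_words(sample_doc, k=3):
    print(word, count)
```

A word cloud tool would then draw each word at a size proportional to its count. Changing `stopwords` or `min_length` mirrors the kind of filtering options such tools offer.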