Text analysis identifies trends across a large number of text-based documents.
Text analysis tools
- Bookworm. Enables users to explore lexical trends in a corpus of digitized books found in HathiTrust. Users can search by ngram and compare usage over time, filtering their search by date, metric, and case sensitivity.
- Google Ngram Viewer. Displays user-selected words or phrases (ngrams) in a graph that shows how frequently those phrases occur in a corpus over time. Google Ngram Viewer's corpus is made up of the scanned books available in Google Books. Users enter the ngrams and can then select case sensitivity, a date range, the language of the corpus, and smoothing. Read the documentation.
- HathiTrust Research Center Analytics. Supports large-scale computational analysis of the works in the HathiTrust Digital Library to facilitate non-profit and educational research. Services offered include extracted features, text analysis algorithms, and data capsules. Read the documentation.
- ProQuest TDM Studio. Consists of two components: the workbench and visualizations. The workbench enables researchers to create datasets using licensed ProQuest content and analyze those datasets by running Python or R scripts in an accompanying Jupyter Notebook. Those new to text and data mining can use the visualization component of the platform, which is a user-friendly way to engage with ProQuest content and does not require any knowledge of programming languages.
- Voyant. A web-based text reading and analysis environment that enables users to read and explore a corpus of text using a multi-panel interface. Voyant's default tools include a word cloud; a reader to read the documents of the corpus; a line graph that shows the distribution of a word's occurrence across the corpus; a textual overview of the corpus; and keywords in context. Several other tools are also available for users. Read the documentation and tutorials, watch screencasts, and read "How is Voyant different from using an AI service like Copilot or ChatGPT?"
- WordItOut. A simple tool for creating word clouds, which visualize the most frequent words in a corpus. Users can generate word clouds from sentences, whole documents, or tables, and filter the text to configure which words to display or remove. Read the documentation.
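Several of the tools above rest on the same few operations: counting words, counting ngrams, and showing keywords in context. The following is a minimal Python sketch of those ideas, not the method any of these tools actually uses. The function names (`tokenize`, `ngrams`, `top_words`, `kwic`), the simple tokenizing rule, and the sample sentence are all assumptions made for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (a deliberately simple rule)."""
    return re.findall(r"[a-z']+", text.lower())


def ngrams(tokens, n):
    """Return every run of n consecutive tokens, joined into a phrase."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def top_words(tokens, k, stopwords=frozenset()):
    """Return the k most frequent tokens, skipping any listed stopwords."""
    return Counter(t for t in tokens if t not in stopwords).most_common(k)


def kwic(tokens, keyword, window=3):
    """Keywords in context: show each match with `window` tokens on either side."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            lines.append(f"{left} [{tok}] {right}".strip())
    return lines


# Hypothetical sample text standing in for a corpus.
sample = "The whale swam. The whale dived, and the ship followed the whale."
tokens = tokenize(sample)

print(top_words(tokens, 2, stopwords={"the", "and"}))  # word-cloud style counts
print(Counter(ngrams(tokens, 2)).most_common(1))       # most frequent bigram
print(kwic(tokens, "ship", window=2))                  # keyword in context
```

A word cloud is essentially a drawing of the `top_words` counts, and an ngram graph plots counts like these per year, divided by the size of that year's corpus.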
Text analysis projects