University Libraries

Software for Digital Scholarship

Information about DiSC-supported software for the collection, processing, analysis or display of numeric, text, or geospatial data

Easy-to-Use Tools

All of this content has been moved to the Text Analysis Tools guide. Keep hidden "just in case"; it will eventually be deleted. 4/8/2020

Comprehensive Tools

Voyant - Creates word maps, n-grams, and word frequencies, either for individual documents or for an entire corpus. Users can export visualizations as HTML, PNG, or SVG. The online version limits a corpus to 100 separate documents; download the local server version for more flexibility.
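
To make "word frequencies" and "n-grams" concrete, here is a minimal Python sketch of how such counts can be computed for a single document and for a whole corpus. It is not how Voyant itself works; the function names and the two-sentence corpus are made up for illustration, and the tokenizer is deliberately simple.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def word_frequencies(tokens):
    """Count how often each token occurs."""
    return Counter(tokens)


def ngrams(tokens, n=2):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# Hypothetical two-document corpus, made up for illustration.
corpus = {
    "doc1": "The cat sat on the mat.",
    "doc2": "The dog sat on the log.",
}

# Frequencies for a single document.
doc1_counts = word_frequencies(tokenize(corpus["doc1"]))

# Frequencies and bigram counts for the entire corpus.
# Bigrams are built per document so they never span two documents.
corpus_counts = Counter()
bigram_counts = Counter()
for text in corpus.values():
    tokens = tokenize(text)
    corpus_counts.update(tokens)
    bigram_counts.update(ngrams(tokens, 2))

print(doc1_counts.most_common(3))
print(corpus_counts.most_common(3))
print(bigram_counts.most_common(3))
```

Changing `n` in `ngrams` gives trigrams or longer sequences. A real project would usually also remove stop words before counting.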

Lexos - Allows users to upload their corpus, clean the documents, and then visualize or analyze them. Files are limited by size and type.

Word Clouds & N-grams

Wordle - Java-based word cloud generator that allows users to paste their corpus or link to an online text.

Concordle - Similar to Wordle (though not as 'pretty' and not Java-based), it creates word clouds based on the user's corpus.

Google Books Ngram Viewer - Uses Google Books to create charts for searched words and phrases. Shows language usage over time.

Find More Digital Research Tools

Digital Research Tools (DiRT) - A registry of digital research tools for scholarly use, including a number of text mining tools. DiRT makes it easy to find and compare tools for digital research.

Text Analysis Portal for Research (TAPoR) - Discover research tools for studying texts, with detailed search and curated lists.

Free Tools

AntConc - A free corpus analysis toolkit for concordancing and text analysis.
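
For readers new to the term, concordancing means listing every occurrence of a word together with the words around it (often called keyword in context). The Python sketch below illustrates the idea only; it is not the tool's own method, and the `concordance` function and sample sentence are hypothetical.

```python
import re


def concordance(text, keyword, width=3):
    """Return keyword-in-context lines: `width` words on either side of each hit."""
    tokens = re.findall(r"\w+", text)
    lines = []
    for i, token in enumerate(tokens):
        # Match case-insensitively but keep the original spelling in the output.
        if token.lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left} [{token}] {right}".strip())
    return lines


# Hypothetical sample text, made up for illustration.
sample = "The whale surfaced. Nobody had seen a whale so close to the ship."

for line in concordance(sample, "whale", width=2):
    print(line)
```

A dedicated concordancer adds sorting, collocation statistics, and support for large corpora on top of this basic lookup.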

MALLET - Java-based package for statistical natural language processing, document classification, clustering, topic modeling, information extraction, and other machine learning applications to text.

OpenRefine - Open source software that allows users to clean their datasets prior to use. OpenRefine works with tabular data, typically in .csv format, but it can also import a range of other formats such as JSON.
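
As an illustration of what cleaning a dataset can involve, the Python sketch below groups variant spellings of the same value in a CSV column so they can be reviewed and merged. It is a simplified approximation of key-based clustering, not OpenRefine's actual implementation; the `fingerprint` and `cluster_column` functions and the sample data are assumptions made for this example.

```python
import csv
import io
import string
from collections import defaultdict


def fingerprint(value):
    """Simplified key: trim, lowercase, strip punctuation, sort unique tokens."""
    cleaned = value.strip().lower()
    cleaned = cleaned.translate(str.maketrans("", "", string.punctuation))
    return " ".join(sorted(set(cleaned.split())))


def cluster_column(rows, column):
    """Group the distinct values of one column by their fingerprint key."""
    clusters = defaultdict(set)
    for row in rows:
        clusters[fingerprint(row[column])].add(row[column])
    # Only keys with more than one spelling are candidates for merging.
    return {key: sorted(values) for key, values in clusters.items() if len(values) > 1}


# Hypothetical CSV data, made up for illustration.
raw = '''name,city
Jane Austen,Bath
"Austen, Jane",Bath
 jane austen ,bath
Mary Shelley,London
'''

rows = list(csv.DictReader(io.StringIO(raw)))
print(cluster_column(rows, "name"))
print(cluster_column(rows, "city"))
```

Each cluster lists spellings that probably refer to the same thing; a person still decides which form to keep.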

Python (via The Programming Historian) - A programming language widely used for text mining and analysis. The Programming Historian has a series of lessons on using Python for manipulating and analyzing text data. Packages include nltk.

R and RStudio - Open source statistical analysis software that relies on community-driven packages to mine data. R is script-heavy, so a programming background is highly recommended, but it offers the most flexibility for mining data and for visualizing the results. See the book Text Mining with R and The Programming Historian's "Basic Text Processing in R."

Licensed Software

NVivo - A powerful qualitative research program that helps organize an entire project. Basic tasks are quick to learn, but the many advanced features can take longer to get accustomed to. NVivo is available for use in the Digital Scholarship Center or in the Arlington Campus Library.

QDA Miner/WordStat - More powerful than NVivo and easier to use than SAS, WordStat gives people without programming experience the ability to do advanced content analysis and text mining. QDA Miner Lite is a free version, with fewer features, that is worth checking out.

SAS - Utilized by many large companies and government entities, SAS is used with very large datasets because it does not require a powerful computer to perform tasks. The software is available in all Mason computer labs as well as the Virtual Computing Lab.