All of this content has been moved to the Text Analysis Tools guide. Keep hidden "just in case"; it will eventually be deleted. 4/8/2020
Voyant - Allows for the creation of word maps, n-grams, and word frequency counts, either for individual documents or for an entire corpus. Users can export visualizations as HTML, PNG, or SVG. The online version limits users to 100 separate documents per corpus; download the local server version for more flexibility.
Lexos - Allows users to upload their corpus, clean the documents, and then perform visualizations or analyze them. Files are limited by size and type.
Word Clouds & N-grams
Wordle - Java-based word cloud generator that allows users to paste their corpus or link to an online text.
Concordle - Similar to Wordle (though not as 'pretty' and not Java), it creates word clouds based on the user's corpus.
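For readers curious about what sits behind word clouds and n-gram views, the underlying counts can be sketched in a few lines of Python using only the standard library. This is a minimal illustration, not how any of the tools above are implemented; the function names and the sample sentence are assumptions made up for the example.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) for a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def top_counts(items, k=3):
    """Count items and return the k most frequent with their counts."""
    return Counter(items).most_common(k)


# Hypothetical sample text; a real corpus would be read from files.
sample = "the cat sat on the mat and the cat slept"
tokens = tokenize(sample)
word_freq = top_counts(tokens)
bigram_freq = top_counts(ngrams(tokens, 2))
print(word_freq)
print(bigram_freq)
```

A word cloud is essentially a drawing of the first list, with font size scaled by count; an n-gram view does the same for the second.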
Find More Digital Research Tools
Digital Research Tools (DiRT) - A registry of digital research tools for scholarly use. It has a number of text mining tools available. DiRT makes it easy for those conducting digital research to find and compare digital research tools.
AntConc - A free text corpus analysis toolkit for concordancing and text analysis.
MALLET - Java-based package for statistical natural language processing, document classification, clustering, topic modeling, information extraction, and other machine learning applications to text.
OpenRefine - Open source software that allows users to clean their datasets prior to use. OpenRefine works with tabular data, typically spreadsheets in .csv format, but it can also import a wide range of other formats such as JSON.
Python (via The Programming Historian) - A programming language that is used by many for text mining and analysis. The Programming Historian has a series of lessons on using Python for manipulating and analyzing text data. Packages include nltk.
R and RStudio - Open source statistical analysis software that relies on community-driven packages to mine data. R is script heavy, meaning a programming background is highly recommended, but it offers the most flexibility for mining as well as for creating visualizations of the results. See the book Text Mining with R and The Programming Historian's "Basic Text Processing in R."
NVivo - A powerful qualitative research program that provides organization for the entire project. The learning curve for basic tasks is small, but there are many advanced features that can take longer to get accustomed to. NVivo is available for use in the Digital Scholarship Center or in the Arlington Campus Library.
QDA Miner/WordStat - More powerful than NVivo and easier to use than SAS, WordStat gives people without programming experience the ability to do advanced content analysis and text mining. QDA Miner Lite is a free version, with fewer features, that is worth checking out.
SAS - Utilized by many large companies and government entities, SAS is used with very large datasets because it does not require a powerful computer to perform tasks. The software is available in all Mason computer labs as well as the Virtual Computing Lab.
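For those taking the programming route, the concordancing mentioned in the list above (showing each occurrence of a word with its surrounding context) can be approximated in plain Python. The sketch below is only an illustration under simple assumptions: the function name `kwic`, the `width` parameter, and the sample text are hypothetical, and dedicated tools handle tokenization and display far more carefully.

```python
import re


def kwic(text, keyword, width=2):
    """Return keyword-in-context rows: (left context, keyword, right context)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    rows = []
    for i, token in enumerate(tokens):
        if token == keyword.lower():
            # Take up to `width` tokens on each side of the match.
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            rows.append((left, token, right))
    return rows


# Hypothetical sample text; a real corpus would be read from files.
sample = "The cat sat on the mat. The dog chased the cat away."
for left, word, right in kwic(sample, "cat"):
    print(f"{left:>15} | {word} | {right}")
```

Each row lines up the keyword in a central column, which is the layout concordance tools typically use to make usage patterns easy to scan.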