The following web-based tools, software programs, and programming languages are used for text analysis.
- AntConc. A free corpus analysis toolkit for concordancing and text analysis.
- Concordle. Creates word clouds based on the user's corpus.
- Lexos. Allows users to upload a corpus, clean the documents, and then visualize or analyze them. Uploads are limited by file size and type.
- MALLET. Java-based package for statistical natural language processing, document classification, clustering, topic modeling, information extraction, and other machine learning applications to text.
- Python. A general-purpose programming language widely used for text mining and analysis. The Programming Historian has a series of lessons on using Python to manipulate and analyze text data.
- R and RStudio. R is open-source statistical software, and RStudio is an environment for working with it; text mining relies on community-driven R packages. Because R is script-based, a programming background is highly recommended, but it offers great flexibility both for mining text and for visualizing the results. See the book Text Mining with R and The Programming Historian's lesson "Basic Text Processing in R."
- Text Analysis Portal for Research (TAPoR). A portal for discovering research tools for studying texts, with detailed search and curated lists.
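Word-cloud tools such as Concordle are built on word-frequency counts. The sketch below shows that underlying step in plain Python. It is not how Concordle or any tool above is implemented. The function name `word_frequencies`, the simple letters-and-apostrophes tokenizer, and the sample stopword list are illustrative assumptions.

```python
import re
from collections import Counter

def word_frequencies(text, stopwords=frozenset()):
    """Count lowercase word tokens in text, skipping any stopwords."""
    # Naive tokenizer (an assumption): runs of letters and apostrophes only.
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(t for t in tokens if t not in stopwords)

sample = "The cat sat on the mat. The mat was flat."
freqs = word_frequencies(sample, stopwords={"the", "on", "was"})
print(freqs.most_common(3))
```

A word cloud then simply scales each word's display size by its count. Dedicated tools add better tokenization, larger stopword lists, and layout.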
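Concordancers such as AntConc display keyword-in-context (KWIC) lines, which show every occurrence of a search term with a few words on either side. The following is a minimal sketch of that idea in Python, assuming whitespace tokenization and case-insensitive matching that ignores surrounding punctuation. The function `kwic` and its `width` parameter are hypothetical names chosen for illustration and do not reflect AntConc's internals.

```python
import re

def kwic(text, keyword, width=3):
    """Return keyword-in-context lines with `width` words on either side of each hit."""
    words = text.split()
    # Normalize for matching: strip leading/trailing punctuation and lowercase.
    norm = [re.sub(r"^\W+|\W+$", "", w).lower() for w in words]
    target = keyword.lower()
    lines = []
    for i, w in enumerate(norm):
        if w == target:
            left = " ".join(words[max(0, i - width):i])
            right = " ".join(words[i + 1:i + 1 + width])
            # Right-align the left context so the keywords line up in a column.
            lines.append(f"{left:>30} [{words[i]}] {right}")
    return lines

sample = "The cat sat on the mat. The mat was flat, and the cat slept."
for line in kwic(sample, "cat"):
    print(line)
```

Aligning the keyword in a single column is what makes concordance output easy to scan for recurring patterns of usage.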