InfoGuides: Text & Data Mining Sources: Get Started

Text & Data Mining Sources

Use this guide to identify resources available for text and data mining. The guide is divided into these main sections:

Access Collections
Lists library subscriptions and popular free resources that allow, or are suitable for, text and data mining.
Historic Newspaper & Text Data
Lists raw XML text data purchased by the University Libraries for use in text and data mining.
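
To give a sense of what working with raw XML text data can involve, here is a minimal Python sketch that pulls the body text out of one record and counts word frequencies. The sample record and its element names (`record`, `title`, `date`, `fulltext`) are hypothetical and used only for illustration; the purchased datasets will likely use different schemas, so check each dataset's documentation before adapting this.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical sample record; real vendor XML will use different element names.
SAMPLE_XML = """<record>
  <title>City Council Meets</title>
  <date>1905-03-14</date>
  <fulltext>The council met on Tuesday. The council voted on the new bridge.</fulltext>
</record>"""


def extract_text(xml_string, text_tag="fulltext"):
    """Return the text of the first child element named text_tag, or ''."""
    root = ET.fromstring(xml_string)
    node = root.find(text_tag)
    if node is None or node.text is None:
        return ""
    return node.text.strip()


def word_counts(text):
    """Lowercase the text, split it into simple word tokens, and count them."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens)


text = extract_text(SAMPLE_XML)
counts = word_counts(text)
print(counts.most_common(3))
```

The sketch uses only the Python standard library. For a full collection you would typically loop over many files and add steps such as stop-word removal, which are left out here.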

The following sections explain each source's limitations and provide recommendations for available sources.

Current News Sources
Social Media Data

Not all online resources allow text mining, and there are legal and ethical limitations to consider. Additionally, if you plan to use Artificial Intelligence (AI) tools with library-licensed materials, contact us first, because some vendors explicitly forbid doing so. Email datahelp@gmu.edu and we will help you.

This guide is a companion to the Text Analysis Tools InfoGuide, which describes how to mine text data using free, web-based tools.

Related Guides

Please see the following related InfoGuides for further help with your digital project.

Qualitative Research and Tools
Explains how text analysis can be performed using qualitative research software.
Citing Data
Provides detailed information and examples on how to cite data in your research.
Data Visualization
Offers guidelines and best practices for visualizing data with a wide range of tools.
Find Data & Statistics
Links to resources that include text corpora among their data sources, along with resources suitable for natural language processing and machine learning.
Research Data Management Basics
Covers best practices for managing your data and staying organized.
Software for Data & Digital Scholarship
See R, NVivo, and QDA Miner.