George Mason University | University Libraries

Text & Data Mining Sources

Access text and data mining sources and text analysis tools.

Historic Newspaper & Text Data

The texts listed below were purchased as bulk raw data in XML format. The data are available offline; to access them, email us, and we will set up an appointment to learn more about your project and discuss how to use the data.
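Because the files are XML, a common first step in text mining is pulling out plain text. The following is a minimal sketch using Python's standard-library `xml.etree.ElementTree`. It assumes nothing about the ProQuest or Gale schemas, which are not described here; it simply collects every text node. The function name `xml_to_text` and the sample record are hypothetical.

```python
import xml.etree.ElementTree as ET


def xml_to_text(xml_string: str) -> str:
    """Return the text content of an XML document as one whitespace-normalized string.

    No vendor element names are assumed: every text node is collected in
    document order, so metadata fields (titles, dates) are included too.
    """
    root = ET.fromstring(xml_string)
    # itertext() walks the tree and yields each element's text and tail strings.
    # Joining with a space keeps separate fields apart (at the cost of splitting
    # a word that spans inline markup); split() then collapses runs of whitespace.
    return " ".join(" ".join(root.itertext()).split())


if __name__ == "__main__":
    # Hypothetical record; the real ProQuest and Gale files use their own schemas
    sample = """<record>
  <title>Local News</title>
  <date>1880-05-01</date>
  <text>The council met on Tuesday.</text>
</record>"""
    print(xml_to_text(sample))  # Local News 1880-05-01 The council met on Tuesday.
```

For real analysis you would usually target the specific elements that hold article text once you know the vendor's schema, rather than flattening everything.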

The ProQuest Historic Newspapers data is not organized in a meaningful way, so finding the data you need will take extra time. To help, DiSC staff have created their own documentation: we know which publications are included and the years each one covers, and we have approximate data sizes for each title (e.g., 8.3 GB of zipped data for the Atlanta Constitution).

There are three ways to find data in ProQuest Historic Newspapers:

  • Using an existing Python script
  • Searching through zipped files using the terminal or command line
  • Searching through unzipped files with a text editor
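On the command line, tools such as `zipgrep` (part of Info-ZIP's UnZip) search inside zip archives without extracting them. The same idea can be sketched in Python with the standard-library `zipfile` module. This is not DiSC's existing script. It is a minimal illustration that assumes the archives sit in one directory, that articles are stored as `.xml` members encoded in UTF-8, and that a case-insensitive regular expression is an adequate match. The names `search_zip`, `search_directory`, and `search_zips.py` are hypothetical.

```python
import re
import sys
import zipfile
from pathlib import Path
from typing import Iterator, Tuple


def search_zip(zip_path: Path, pattern: str) -> Iterator[Tuple[str, int]]:
    """Yield (member name, match count) for each .xml member of one zip
    archive whose text matches `pattern` (a case-insensitive regex)."""
    regex = re.compile(pattern, re.IGNORECASE)
    with zipfile.ZipFile(zip_path) as archive:
        for name in archive.namelist():
            # Assumption: article files end in .xml; skip everything else
            if not name.lower().endswith(".xml"):
                continue
            # Read one member into memory at a time; nothing is extracted to disk.
            # The raw XML is searched, so matches inside tag names count too.
            text = archive.read(name).decode("utf-8", errors="replace")
            hits = len(regex.findall(text))
            if hits:
                yield name, hits


def search_directory(data_dir: Path, pattern: str) -> Iterator[Tuple[str, str, int]]:
    """Search every *.zip file directly inside `data_dir`, in name order."""
    for zip_path in sorted(data_dir.glob("*.zip")):
        for name, hits in search_zip(zip_path, pattern):
            yield zip_path.name, name, hits


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python search_zips.py <directory of zip files> <regex>
    for archive_name, member, hits in search_directory(Path(sys.argv[1]), sys.argv[2]):
        print(f"{archive_name}\t{member}\t{hits}")
```

For example, `python search_zips.py /path/to/zips "streetcar"` prints one tab-separated line per matching file. Reading one member at a time keeps memory use bounded by the largest single XML file rather than the whole archive, which matters for titles with several gigabytes of zipped data.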

The Gale data is better organized, so finding the data you need is easier. We have a spreadsheet with key information about each title included in 19th Century U.S. Newspapers.


  • 19th Century U.S. Newspapers (Gale)
  • ProQuest Historic Newspapers (listed chronologically by start year):
    • Baltimore Sun, 1837-1931
    • Chicago Tribune, 1849-1932
    • New York Times, 1851-1937
    • Atlanta Constitution, 1868-1933
    • Boston Globe, 1872-1984
    • Washington Post, 1877-1934 
    • Los Angeles Times, 1881-1932
    • Wall Street Journal, 1889-1935
    • Christian Science Monitor, 1908-1993
    • Chicago Defender, 1909-1975
    • New York Amsterdam News, 1922-1993  


  • Making of Modern Law (Gale)
    • Legal Treatises, 1800-1926
    • Primary Sources
    • Trials, 1600-1926
  • Sabin Americana History and Culture, 1500-1926 (Gale)