Corpus: A collection of written texts. In ProQuest TDM Studio, users create a corpus from licensed ProQuest content. See also dataset.
Dataset: A collection of data. In ProQuest TDM Studio, your dataset is the corpus you create. See also corpus.
Jupyter Notebook: An environment that combines human-readable text with computer-readable code. See Jupyter Notebook’s website for more information. You can also go through Quinn Dombrowski, Tassie Gniady, and David Kloster’s “Introduction to Jupyter Notebooks” lesson on Programming Historian.
ProQuest TDM Studio: ProQuest’s text and data mining platform, which enables researchers to create datasets from licensed ProQuest content and analyze those datasets by running Python or R scripts in an accompanying Jupyter Notebook. See ProQuest’s website for more information.
Python: An interpreted, high-level, general-purpose programming language that supports an object-oriented approach. Python is designed to help programmers write clear, logical code for both small and large-scale projects. See Python’s website for more information.
Text and data mining (TDM): The process of deriving high-quality information from text. This information is typically derived by identifying patterns and trends in the text, for example through statistical pattern learning.
R: A programming language and free software environment for statistical computing and graphics. R is widely used among statisticians and data miners for developing statistical software and data analysis. See the R Project for Statistical Computing’s website for more information.
Script(s): Small programs (up to a few thousand lines of code) written in languages like R and Python.
Workbench: The area of ProQuest TDM Studio where researchers access their datasets and Jupyter Notebook. Access the workbench by logging in to ProQuest TDM Studio.
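
To make the terms corpus, script, and text and data mining more concrete, here is a minimal Python sketch of the kind of short script that could run in a Jupyter Notebook. It is an illustration only: the three-sentence corpus and the function names tokenize and word_counts are invented for this example, and it does not use any ProQuest TDM Studio functionality. It counts how often each word appears across the corpus, which is one of the simplest ways to surface a pattern in text.

```python
import re
from collections import Counter

# Hypothetical toy corpus (an assumption for illustration); a real
# TDM Studio corpus is built from licensed ProQuest content.
corpus = [
    "The strike ended on Monday.",
    "Workers voted to end the strike.",
    "The vote ended the dispute.",
]


def tokenize(text):
    """Lowercase the text and return its runs of ASCII letters as words."""
    return re.findall(r"[a-z]+", text.lower())


def word_counts(documents):
    """Count how often each word occurs across all documents."""
    counts = Counter()
    for document in documents:
        counts.update(tokenize(document))
    return counts


counts = word_counts(corpus)

# Show the three most frequent words with their counts.
print(counts.most_common(3))
```

Very common words such as "the" dominate raw counts, so analyses of this kind usually filter out such words before looking for trends.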