University Libraries

Text Analysis Tools

A companion to our Text and Data Mining Sources infoguide, this guide walks you through how to use several text analysis tools.

About Constellate

Constellate, the new text and data analytics service from JSTOR and Portico, is a platform for learning and performing text analysis, building datasets, and sharing text analytics course materials. It lets users build datasets from content drawn from a variety of databases and analyze them in an open environment, with teaching materials that can be used, modified, and shared. It also serves as a gathering space for a growing community of practitioners.

Constellate is in beta and still under development.

Getting Started with Constellate

1. To begin, enter a keyword in the search box on constellate.org or go to constellate.org/builder. Filter your results by keyword, publication title, publication date, language, document type, provider, category, and download availability. You can also view visualizations of your results; the more results you have, the more visualization options are available. Once you have filtered the results, click Build. Because we are using the free tier of Constellate, your dataset cannot exceed 25,000 documents.
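The builder's filters narrow a large pool of documents down to a dataset that fits under the size limit. As a conceptual sketch only (this is not Constellate's code, and the `DocMeta` record, its fields, and the `filter_docs` and `check_size` helpers are hypothetical), the same idea in Python looks like this:

```python
from dataclasses import dataclass

MAX_DOCUMENTS = 25_000  # free-tier cap described in this guide


@dataclass
class DocMeta:
    """Hypothetical metadata record; not Constellate's actual schema."""
    title: str
    year: int
    language: str
    doc_type: str


def filter_docs(docs, year_range=None, languages=None, doc_types=None):
    """Keep documents that pass every filter that is set (None means no filter)."""
    selected = []
    for d in docs:
        if year_range is not None and not (year_range[0] <= d.year <= year_range[1]):
            continue
        if languages is not None and d.language not in languages:
            continue
        if doc_types is not None and d.doc_type not in doc_types:
            continue
        selected.append(d)
    return selected


def check_size(selected, limit=MAX_DOCUMENTS):
    """Return the dataset size, or raise if it exceeds the document limit."""
    if len(selected) > limit:
        raise ValueError(
            f"{len(selected):,} documents exceeds the {limit:,}-document limit; "
            "add filters to narrow the results"
        )
    return len(selected)


if __name__ == "__main__":
    # Made-up sample records for illustration.
    pool = [
        DocMeta("A", 1995, "eng", "article"),
        DocMeta("B", 2010, "eng", "article"),
        DocMeta("C", 2012, "fre", "chapter"),
    ]
    chosen = filter_docs(pool, year_range=(2000, 2020), languages={"eng"})
    print([d.title for d in chosen], check_size(chosen))
```

Each filter you add can only shrink the result set, which is why combining several filters is the usual way to get a large search under the 25,000-document cap.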

2. Your dataset will take some time to build. You can sign up to receive an email when it is ready, or you can bookmark the page and come back to it. Once your dataset is ready, you can analyze or download it. To analyze your dataset, you must agree to Constellate's terms and conditions; you can then select from the following options: introduction; metadata and pre-processing; word frequencies; significant terms; and topic modeling. Each of these analyses is available in a tutorial version, which is heavily commented, or a research version, which has minimal comments. You can also upload your own Jupyter Notebook and code to analyze the dataset.
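The word frequencies and significant terms analyses can be sketched in a few lines of Python. This is a minimal sketch, not the code in Constellate's notebooks: it assumes each document is a dict whose `unigramCount` field maps words to counts (our assumption about the download format), and it scores significant terms with TF-IDF using the smoothed IDF formula that scikit-learn's `TfidfTransformer` applies by default. The function names are hypothetical.

```python
import math
from collections import Counter


def word_frequencies(docs):
    """Sum every document's unigram counts into one corpus-wide Counter."""
    totals = Counter()
    for doc in docs:
        # Assumption: each document stores word counts under "unigramCount".
        totals.update(doc.get("unigramCount", {}))
    return totals


def significant_terms(doc, docs, top_n=5):
    """Rank one document's words by TF-IDF relative to the whole dataset."""
    n_docs = len(docs)
    # Document frequency: the number of documents that contain each word.
    df = Counter()
    for d in docs:
        df.update(set(d.get("unigramCount", {})))
    counts = doc.get("unigramCount", {})
    total = sum(counts.values()) or 1
    scores = {}
    for term, count in counts.items():
        tf = count / total
        # Smoothed IDF: ln((1 + n) / (1 + df)) + 1, so no division by zero.
        idf = math.log((1 + n_docs) / (1 + df[term])) + 1
        scores[term] = tf * idf
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_n]


if __name__ == "__main__":
    # Made-up sample documents for illustration.
    sample = [
        {"id": "a", "unigramCount": {"whale": 4, "the": 4}},
        {"id": "b", "unigramCount": {"ship": 2, "the": 4}},
    ]
    print(word_frequencies(sample).most_common(3))
    print(significant_terms(sample[0], sample))
```

In the sample, "the" and "whale" appear equally often in document `a`, but "whale" ranks higher as a significant term because "the" also appears in every other document. That down-weighting of common words is the main difference between the two analyses.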

3. To download your dataset, you must agree to Constellate's terms and conditions; you can then choose to download a CSV of the dataset, or to download the metadata and n-grams in a JSON Lines (JSONL) file.
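Once downloaded, both formats can be read with Python's standard library. This sketch assumes the JSONL file holds one JSON object per document per line and may be gzip-compressed (treating a `.gz` suffix as compressed is an assumption, so adjust it to match your file); `read_jsonl` and `read_metadata_csv` are hypothetical helper names.

```python
import csv
import gzip
import json
import tempfile
from pathlib import Path


def read_jsonl(path):
    """Yield one dict per non-empty line of a JSON Lines file, gzipped or not."""
    path = Path(path)
    # Assumption: a .gz suffix means the file is gzip-compressed.
    opener = gzip.open if path.suffix == ".gz" else open
    with opener(path, "rt", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)


def read_metadata_csv(path):
    """Return the rows of a CSV file as a list of dicts keyed by the header row."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


if __name__ == "__main__":
    # Demo: write two made-up records to a temporary .jsonl.gz file and read them back.
    with tempfile.TemporaryDirectory() as tmp:
        sample = Path(tmp) / "sample.jsonl.gz"
        with gzip.open(sample, "wt", encoding="utf-8") as f:
            f.write(json.dumps({"id": "doc1", "title": "Example"}) + "\n")
            f.write(json.dumps({"id": "doc2", "title": "Another"}) + "\n")
        for doc in read_jsonl(sample):
            print(doc["id"], doc["title"])
```

Because `read_jsonl` is a generator, it reads one document at a time, which keeps memory use low even for a dataset near the 25,000-document limit.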

4. For the term frequency and word cloud visualizations that appear after you build your dataset, click the ellipsis (…) in the upper right-hand corner to save the chart or graph, download a CSV, or share the visualization.
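A CSV downloaded from a visualization can be re-sorted or re-plotted outside Constellate. This sketch assumes the file has a header row followed by rows that start with a term and its count; that layout is our assumption, so check your file before relying on it. `top_terms` and `text_bars` are hypothetical helpers.

```python
import csv
import io


def top_terms(csv_file, n=10):
    """Return the n (term, count) pairs with the highest counts.

    Assumption: the first row is a header and each later row starts with
    a term followed by its count.
    """
    reader = csv.reader(csv_file)
    next(reader, None)  # skip the header row
    rows = [(row[0], int(float(row[1]))) for row in reader if len(row) >= 2]
    return sorted(rows, key=lambda r: r[1], reverse=True)[:n]


def text_bars(rows, width=30):
    """Render (term, count) pairs as a plain-text bar chart scaled to the largest count."""
    peak = max((count for _, count in rows), default=0) or 1
    return "\n".join(
        f"{term:<15} {'#' * round(width * count / peak)} {count}"
        for term, count in rows
    )


if __name__ == "__main__":
    # Made-up sample standing in for a downloaded CSV.
    sample = io.StringIO("term,count\nwhale,3\nship,7\nsea,5\n")
    print(text_bars(top_terms(sample)))
```

To use a real download, pass `open("term_frequencies.csv", newline="", encoding="utf-8")` (a hypothetical file name) to `top_terms` instead of the sample.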

5. Register or log in to save your datasets. Once you do, you can revisit them at any time at constellate.org/dataset/dashboard. Several pre-built datasets are also available for you to analyze.