George Mason University | University Libraries

Text Analysis Tools

A companion to our Text and Data Mining Sources infoguide, this guide walks you through how to use several text analysis tools.

OpenRefine

OpenRefine is a tool used to clean messy data, transform data from one format to another, and extend data. Features of this tool include faceting, clustering, reconciling your dataset to external databases, infinite undo/redo, and the ability to contribute to Wikidata. OpenRefine keeps data private on your machine until you want to share it.
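To give a feel for what clustering does, here is a minimal Python sketch of key-collision clustering, the general idea behind grouping spelling variants of the same value. It is an illustration only, not OpenRefine's own code: the `fingerprint` and `cluster` functions and the sample names are hypothetical, and the normalization steps are a simplified approximation. In OpenRefine itself, clustering is done through the interface rather than by writing code.

```python
import string
import unicodedata
from collections import defaultdict


def fingerprint(value: str) -> str:
    """Build a simplified normalization key (an approximation, not OpenRefine's exact method)."""
    text = value.strip().lower()
    # Fold accented characters to their closest ASCII form.
    text = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    # Drop punctuation, then sort and de-duplicate the remaining tokens.
    text = text.translate(str.maketrans("", "", string.punctuation))
    return " ".join(sorted(set(text.split())))


def cluster(values):
    """Group values that share a key; keep only groups with more than one distinct spelling."""
    groups = defaultdict(list)
    for value in values:
        groups[fingerprint(value)].append(value)
    return {key: members for key, members in groups.items() if len(set(members)) > 1}


# Hypothetical messy column values.
names = [
    "George Mason University",
    "george mason university ",
    "University, George Mason",
    "GMU",
]
clusters = cluster(names)
for key, members in clusters.items():
    print(key, "<-", members)
```

Values that reduce to the same key are candidates for merging into one spelling. An abbreviation such as "GMU" produces a different key, so it is left alone and would need a different matching method or a manual edit.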

Getting Started with OpenRefine

1. Download OpenRefine from its website. OpenRefine runs as a small web server on your own computer, and you use it by pointing your web browser at that server. It works best on Chrome, Chromium, Opera, Microsoft Edge, and Safari. For more information, see the OpenRefine installation instructions.

2. Once OpenRefine is running in your browser, create a project by importing data. OpenRefine supports the following file formats: CSV, TSV, text files, fixed-width columns, JSON, XML, ODS, XLS, XLSX, PX, MARC, RDF data, and wikitext. You can import data from your computer, a URL, the clipboard, a database, or Google Data.

3. Your data will load. You can adjust how the data is parsed, name the project, and add tags. Click Create Project when you are done.

4. You can learn more about your data using facets, filters, and sorting. You can transform your data through common and custom transformations, clustering, pulling data from the web, reconciling, and writing expressions.

5. Export your data by clicking the Export button in the upper-right corner. You can export your data in multiple formats: TSV, CSV, HTML table, XLS, XLSX, ODS, upload to Google Sheets, custom tabular exporter, SQL statement exporter, and templating exporter.
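The explore-and-export steps above can be pictured with a short Python sketch. It is an illustration under assumptions, not part of OpenRefine: the sample rows and the `text_facet`, `filter_rows`, and `to_tsv` helpers are hypothetical, and they only mimic, in a simplified way, what a text facet, a facet selection, and a TSV export do.

```python
import csv
import io
from collections import Counter

# Hypothetical project rows; in OpenRefine these would come from your imported file.
rows = [
    {"title": "Emma", "language": "English"},
    {"title": "Candide", "language": "French"},
    {"title": "Persuasion", "language": "English"},
]


def text_facet(records, column):
    """Count how often each distinct value appears in a column, like a text facet."""
    return Counter(record[column] for record in records)


def filter_rows(records, column, value):
    """Keep only the rows that match one facet value."""
    return [record for record in records if record[column] == value]


def to_tsv(records):
    """Serialize rows as tab-separated text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(records[0]), delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


facet = text_facet(rows, "language")
english = filter_rows(rows, "language", "English")
tsv = to_tsv(english)
print(dict(facet))
print(tsv)
```

The facet summarizes the column before anything is changed, which is why exploring with facets comes before transforming and exporting in the steps above.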