George Mason University | University Libraries

QUANTitative Analysis & Statistics

This is a "starting point" guide for those taking statistics and [quantitative] data analysis courses and/or doing a data analysis project.

Get Data Files

Data will often come in one or more files, whether from another researcher or from data-collection software. Many data sources and software packages let you choose the file type. All statistical software should be able to import SPSS and SAS files.

What Format to Choose

Obtain your data in the first available format from the numbered list below. The advantage of data in a statistical software format is that it will be labeled and carry other metadata. All major packages can import files from the others.

Look for export options to confirm. For example, Qualtrics can export in SPSS (.sav) format. You can also choose not to zip the file if it is small.

If the file has a .zip or .tar.gz extension, it is a compressed archive, and you will need to extract its contents first using your operating system or specialized software like 7-Zip.
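If you prefer to script this step, the following is a minimal sketch using only the Python standard library. The function name `extract_archive` and the demo file names are made up for illustration; the demo builds its own tiny archive so it runs without real data.

```python
import shutil
import tempfile
import zipfile
from pathlib import Path


def extract_archive(archive_path, dest_dir):
    """Unpack a .zip or .tar.gz archive and return the extracted file paths."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    # shutil chooses the unpacker from the file extension (.zip, .tar.gz, ...).
    shutil.unpack_archive(str(archive_path), str(dest))
    return sorted(p for p in dest.rglob("*") if p.is_file())


# Demo with a hypothetical export: create a small zip, then extract it.
work = Path(tempfile.mkdtemp())
demo_zip = work / "survey_export.zip"
with zipfile.ZipFile(demo_zip, "w") as zf:
    zf.writestr("survey.csv", "id,age\n1,34\n2,51\n")

for path in extract_archive(demo_zip, work / "unzipped"):
    print(path.name)
```

Only extract archives from sources you trust, and check the list of extracted files to confirm that both the data file and any setup script are present.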

  1. Your statistical software -- Reduces possible errors
  2. Any statistical software
    • SPSS (.sav), Stata (.dta), SAS (.sas7bdat), R (.rdata), Jamovi (.omv), etc.
  3. Fixed Format with a Setup File **
    • Setup Script: SPSS (.sps), Stata (.do and .dct), SAS (.sas), R (.R)
    • Data File: Text or ASCII file (.dat or .txt) -- will not have variable names at the top
  4. Spreadsheet format
    • "Delimited" files: Comma Separated Values (.csv) or Tab Separated Values (.tsv or .tab)
    • Excel or the equivalent (e.g., .xls, .xlsx)
    • May be labeled ASCII or given as .dat or .txt (but without a Setup Script file as in #3)
  5. Other formats:
    • Structured Text: JSON (.json), XML (.xml)
    • Many more!

** Requires a Setup Script, a Data File, and access to the designated software. Look for instructions in the Script file explaining how to specify the location of the data file and "run" the script. For assistance, talk to your local Data Librarian.
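To see why a fixed-format file needs a setup script, it may help to look at what the script does: it tells the software which character positions hold each variable. The sketch below imitates that in Python with a made-up three-variable layout. The names `LAYOUT`, `parse_fixed_line`, and `parse_fixed_text`, and the sample records, are assumptions for illustration and do not replace running the real setup script.

```python
# Hypothetical layout: (variable name, start column, end column), 1-based and
# inclusive, the way codebooks and setup scripts usually list positions.
LAYOUT = [
    ("id", 1, 4),
    ("age", 5, 7),
    ("state", 8, 9),
]


def parse_fixed_line(line, layout=LAYOUT):
    """Slice one fixed-format record into a dict of stripped string values."""
    return {name: line[start - 1:end].strip() for name, start, end in layout}


def parse_fixed_text(text, layout=LAYOUT):
    """Parse every non-blank line of a fixed-format data file's contents."""
    return [parse_fixed_line(line, layout)
            for line in text.splitlines() if line.strip()]


# Two made-up records with no header row, as is typical for these files.
sample = "0001 34VA\n0002 51MD\n"
for record in parse_fixed_text(sample):
    print(record)
```

Every value comes back as a string, so variable labels, value labels, and numeric types still have to be applied afterwards, which is the part the setup script normally handles for you.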

Common Import Problems

For delimited (e.g., CSV) or other text files:
  • Variables not separated properly into columns. This may only affect a few observations, but leads to obviously incorrect values in some variables. One observation might also be split across multiple rows. Check frequency tables for excess missing values or unexplainable values. In most cases, you will have to re-import with the proper settings to fix this.
  • Observations might disappear due to file corruption, or extra rows (with mostly missing values) could be included due to headers, notes, or summaries. Confirm that the number of observations matches expectations. If there are extra rows, they can be deleted. If the file is corrupted, you may have to re-obtain it.
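The two checks above (columns split correctly, row count as expected) can be roughed out before you import anything. This is a minimal sketch using Python's built-in `csv` module; `check_delimited` and the sample text are hypothetical, and the expected row count is something you would take from the data documentation.

```python
import csv
import io


def check_delimited(text, delimiter=",", expected_rows=None):
    """Report basic structural problems in delimited text."""
    reader = csv.reader(io.StringIO(text, newline=""), delimiter=delimiter)
    header = next(reader)
    bad_rows = []  # (file line number, field count) for rows unlike the header
    n_rows = 0
    for row in reader:
        if not row:  # skip completely blank lines
            continue
        n_rows += 1
        if len(row) != len(header):
            bad_rows.append((reader.line_num, len(row)))
    return {
        "columns": len(header),
        "rows": n_rows,
        "row_count_ok": expected_rows is None or n_rows == expected_rows,
        "bad_rows": bad_rows,
    }


# Made-up file: line 3 has an unquoted comma, line 4 is a trailing note.
sample = (
    "id,name,score\n"
    '1,"Smith, Jo",88\n'
    "2,Lee, Kim,91\n"
    "Source: demo file\n"
)
print(check_delimited(sample, expected_rows=2))
```

A row with too many fields usually points to an unquoted delimiter inside a value, while a row with too few often turns out to be a note, summary, or broken line.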
For all files:
  • String fields may get truncated (lose text) if the software guesses the field size incorrectly, or look garbled if the wrong character encoding is used (e.g., ASCII vs UTF-8). Check fields that may have lots of text (e.g., open-ended answers) for complete content. You will have to re-import, specifying the correct settings.
  • Variable names may be changed to make them valid in the software. Spaces may be removed or converted to underscores, or a name may be replaced with something generic, like V1 or X. Specify names during import, or rename afterwards.
  • Special Field Types
    • Date fields may not be parsed or interpreted correctly. Check missing values and spot-check the year against the original source.
    • Numeric fields with non-number characters (including $, %, &, or .) may be imported as strings, or may cause the entire value to be set as missing.
    • If necessary, import these fields as strings first, then use the appropriate software functions to convert them to the proper format. When collecting primary data yourself, set up the fields properly to avoid these issues.
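As an illustration of the "import as strings, convert later" approach, here is a minimal Python sketch. The helper names `to_number` and `to_date`, the choice to treat a lone "." as missing, and the default date format are all assumptions; adjust them to match how your source actually codes values.

```python
import math
from datetime import datetime


def to_number(raw):
    """Convert strings like '$1,250.00' or '45%' to float; None if not numeric."""
    text = raw.strip()
    if text in ("", "."):  # assumption: a lone '.' is a missing-value code
        return None
    is_percent = text.endswith("%")
    cleaned = text.replace("$", "").replace(",", "").rstrip("%").strip()
    try:
        value = float(cleaned)
    except ValueError:
        return None
    if not math.isfinite(value):  # reject 'nan' and 'inf' spellings
        return None
    return value / 100 if is_percent else value


def to_date(raw, fmt="%m/%d/%Y"):
    """Parse a date string with an explicit format; None if it does not match."""
    try:
        return datetime.strptime(raw.strip(), fmt).date()
    except ValueError:
        return None


for raw in ["$1,250.00", "45%", "n/a", "."]:
    print(repr(raw), "->", to_number(raw))
for raw in ["03/15/2021", "15/03/2021"]:
    print(repr(raw), "->", to_date(raw))
```

Returning `None` instead of guessing keeps unparseable values visible, so you can count them and compare against the original source before deciding how to handle them.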