** Requires a Setup Script, Data file, and access to the designated software. Look for instructions in the Script file explaining how to specify the location of the datafile and "run" the script. For assistance, talk to your local Data Librarian.
Problems can arise when you first open a data file, especially if its format differs from your software's native format. Check for these common issues.
Variable names may be imported as generic placeholders such as V1 or X. Specify names during import, or rename afterwards.
Values containing symbols (such as $, %, &, or .) may be imported as strings, or may cause the entire value to be set as missing.
Unless you know a variable is continuous or open-ended text, generate a frequency table or a bar chart. Note that the question label alone is not enough to tell you how a variable was measured (e.g., age could be measured in any number of ways). Here is what to look for:
Is a frequency table appropriate? What do you see in the possible values overall?
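The import checks above (generic variable names, symbols in numeric values, frequency tables) can be sketched in a few lines of Python. This is only an illustration using the standard library: the data, the `RENAME` mapping, and the helper names `clean_number` and `load_rows` are all hypothetical, and your own statistical software will have its own commands for the same steps.

```python
import csv
import io
from collections import Counter

# Hypothetical raw export: generic column names and symbols inside numeric values.
RAW = """V1,V2,V3
1,"$1,200",Agree
2,45%,Disagree
3,.,Agree
4,$300,Agree
"""

# Assumed mapping from the generic import names to meaningful ones.
RENAME = {"V1": "respondent_id", "V2": "amount", "V3": "opinion"}


def clean_number(text):
    """Strip common symbols and return a float, or None if nothing numeric remains."""
    stripped = text.strip().replace("$", "").replace("%", "").replace(",", "")
    try:
        return float(stripped)
    except ValueError:
        return None  # e.g. "." used as a missing-value code


def load_rows(raw_text):
    """Read CSV text, rename the columns, and clean the numeric column."""
    rows = []
    for record in csv.DictReader(io.StringIO(raw_text)):
        row = {RENAME.get(name, name): value for name, value in record.items()}
        row["amount"] = clean_number(row["amount"])
        rows.append(row)
    return rows


rows = load_rows(RAW)

# Frequency table for a categorical variable.
opinion_counts = Counter(row["opinion"] for row in rows)
for value, count in opinion_counts.most_common():
    print(f"{value:10s} {count}")
```

Note that stripping a symbol also discards its meaning (a percent and a dollar amount end up in the same column here), so check the documentation before deciding how each symbol should be handled.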
When I have a consultation with someone, this is the information I want to know. Each greatly affects the future of the project and what needs to be done. So, it is important that you consider these issues ahead of time yourself.
What are the characteristics of the data file that will be needed to answer the question(s) you have, given the complexity of the analysis you plan to use?
Depending on requirements related to your field and the type of project you are completing (e.g., dissertation), analyses could include: a) Descriptive Statistics, b) Bivariate Inferential Statistics, c) Standard multivariate analyses such as ANOVA and Multiple Regression (Linear or Logistic), or d) other Modeling (including GLM, Mixed Models, or SEM). The type of analysis will influence how much data is needed, which software is necessary, and what criteria the data must meet.
It is useful to be able to conceptualize the ultimate data table you will need and even create it with some example data. The research questions should define a) the unit of analysis, b) the population (observations), and c) the constructs (variables).
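One way to mock up the ultimate data table is to write a few made-up rows and print them. The sketch below assumes a hypothetical research question ("Do students who work more hours have lower GPAs?"); the column names, the values, and the `format_table` helper are invented for illustration only.

```python
# Hypothetical research question: "Do students who work more hours have lower GPAs?"
# Unit of analysis: one student per row. Constructs: hours worked and GPA.
COLUMNS = ["student_id", "hours_worked", "gpa"]

# Made-up example rows, only to show the intended shape of the table.
EXAMPLE_ROWS = [
    {"student_id": "S001", "hours_worked": 0, "gpa": 3.6},
    {"student_id": "S002", "hours_worked": 20, "gpa": 3.1},
    {"student_id": "S003", "hours_worked": 35, "gpa": 2.8},
]


def format_table(columns, rows):
    """Return the rows as a plain-text table with one line per observation."""
    header = " | ".join(f"{name:>12s}" for name in columns)
    lines = [header, "-" * len(header)]
    for row in rows:
        lines.append(" | ".join(f"{str(row[name]):>12s}" for name in columns))
    return "\n".join(lines)


table_text = format_table(COLUMNS, EXAMPLE_ROWS)
print(table_text)
```

If you cannot fill in even a few plausible rows, that is usually a sign that the unit of analysis or the variables are not yet well defined.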
If you do not have research questions, that usually means you are expected to come up with a research question you can answer with the data you have. If so, consider which of the topics addressed by the data interest you. You will want to choose two topics and look for relationships between their values or for group differences.
Is the data you have capable of being the data you need?
Is the unit of observation a person, a place, or something else? Is it the same as the unit of analysis? Find the variable(s) that contain unique identifiers and determine their meaning. If you have multiple data tables/files, is the unit of observation the same, and how will the rows match up? You can aggregate your data to make the units bigger, but you cannot go the other way. Are there repeated measures, multi-level, time series, or panel data? If so, you might need to reshape the data or use special analyses.
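As a rough illustration of checking identifiers and aggregating to a larger unit, the sketch below uses hypothetical person-level records and rolls them up to the county level. The data and the helper names (`duplicated_ids`, `drop_duplicates`, `aggregate_mean`) are assumptions made for this example. A repeated identifier is treated here as an accidental duplicate, but in your data it could instead signal repeated measures, so check the documentation before dropping anything.

```python
from collections import Counter, defaultdict
from statistics import mean

# Hypothetical person-level records; "county" is the larger unit.
people = [
    {"person_id": "P1", "county": "A", "income": 40000},
    {"person_id": "P2", "county": "A", "income": 60000},
    {"person_id": "P3", "county": "B", "income": 55000},
    {"person_id": "P3", "county": "B", "income": 55000},  # duplicate id on purpose
]


def duplicated_ids(rows, key):
    """Return the identifier values that appear more than once."""
    counts = Counter(row[key] for row in rows)
    return sorted(value for value, n in counts.items() if n > 1)


def drop_duplicates(rows, key):
    """Keep only the first row seen for each identifier."""
    seen = set()
    unique = []
    for row in rows:
        if row[key] not in seen:
            seen.add(row[key])
            unique.append(row)
    return unique


def aggregate_mean(rows, group_key, value_key):
    """Aggregate to the larger unit by averaging value_key within each group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_key]].append(row[value_key])
    return {group: mean(values) for group, values in groups.items()}


dupes = duplicated_ids(people, "person_id")
county_income = aggregate_mean(drop_duplicates(people, "person_id"), "county", "income")
print("Duplicated ids:", dupes)
print("Mean income by county:", county_income)
```

The aggregation only works in one direction: once incomes are averaged by county, the person-level values cannot be recovered from the county table.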
How much of the data is actually from your target population? Both primary and secondary data may have missing data or extra observations that are not relevant, but it is important to end up with a sufficient number of observations. Also, what type of sampling method was used, if any? Were snowball techniques or complex sampling methods used to select the sample? Do you even want to generalize from your sample to the population, and if so, how much work will it be?
What data importing and cleaning steps will you need to do to even get started? Do you have enough time and knowledge to work with your data?
The data source gives a good idea of the data quality and how much time and effort will be needed to process it. Large, reputable data collectors typically provide clean data with extensive documentation. Smaller sources might need more checking for data errors and may not have all the information necessary. If you have collected your own data, you may not need to narrow down the variables, but there are various cleaning steps you will need to take depending on the mode of collection (e.g., mail vs online), population (e.g., paid vs volunteer), and other factors.
Is it on paper, in Excel, a file in SPSS format, or something else? Is there just one file/table, or multiple? The data format affects how quickly you will be able to use the data in your statistical software. Recent versions of most major software packages can directly open each other's files (and Qualtrics can export to SPSS format), but conversions always require checking. Older or smaller data sets may even come in a text format, such as CSV or fixed format, that is more difficult to open and does not already have labels.
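To show why fixed-format text takes extra work, here is a minimal sketch of reading it in Python. The column positions in `LAYOUT` would normally come from the codebook; here the layout, the records, and the helper names `parse_line` and `to_int_or_none` are all made up for illustration.

```python
# Assumed layout, normally taken from the codebook: (name, start, end) with a
# 0-based start and an exclusive end, matching Python slicing.
LAYOUT = [("id", 0, 4), ("age", 4, 7), ("state", 7, 9)]

# Made-up fixed-width records: no delimiters, no header row, no labels.
RAW_LINES = [
    "0001 34VA",
    "0002 51MD",
    "0003   DC",  # blank age field
]


def parse_line(line, layout):
    """Slice one fixed-width line into named, whitespace-trimmed fields."""
    return {name: line[start:end].strip() for name, start, end in layout}


def to_int_or_none(text):
    """Convert a numeric field to int, treating a blank field as missing."""
    return int(text) if text else None


records = []
for line in RAW_LINES:
    record = parse_line(line, LAYOUT)
    record["age"] = to_int_or_none(record["age"])
    records.append(record)

for record in records:
    print(record)
```

The identifier is deliberately left as a string so that leading zeros are preserved; variable and value labels would still have to be added by hand from the documentation.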
These are also great sources to search for others who have the same issues. Definitely read some questions and answers before posting yourself to understand what details you need to include. It takes time to ask a good question that will receive a good answer.
Copyright © George Mason University