These software tools offer a wide range of data functions, including data cleaning and summary statistics; many also provide statistical analyses and visualizations.
Spreadsheet Software
- Spreadsheets (Microsoft Excel, Google Sheets, OpenOffice Calc)
- These general-purpose spreadsheet programs are flexible and familiar. While they can handle many data tasks, they are rarely the best tool for them, nor do they encourage best practices. Do use Pivot Tables and/or Power Query (Excel) to help maintain data integrity and replicability.
- Power BI (Microsoft)
- Free for all functionality except sharing dashboards, this offshoot of Microsoft Excel is made for tidy tabular data. It combines the data cleaning ability of Power Query with visualizations that suit academia and business better than Pivot Charts.
- OpenRefine
- Formerly known as Google Refine, this powerful free and open-source data cleaning software runs in a local browser window (i.e., data stays on your own computer). With faceting, clustering, and an underlying expression language that makes cleaning steps replicable, it beats Power Query for exploring and fixing particularly messy data. Notably absent is the ability to merge (add variables), but appending (adding rows) is seamless.
- Tableau
- Free for academic projects and teaching, this otherwise expensive business software focuses on beautiful, interactive visualizations and dashboards. With the newer Tableau Prep software, it can handle additional data cleaning tasks.
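The difference between appending and merging, noted for OpenRefine above, is easy to blur. As a rough illustration only, the sketch below shows both operations in plain Python. The data, the function names (`append_rows`, `merge_columns`), and the column names are all hypothetical and invented for this example; none of the tools listed here work this way internally.

```python
# Hypothetical example data: two batches of records and a lookup table.
wave_1 = [
    {"id": 1, "score": 72},
    {"id": 2, "score": 85},
]
wave_2 = [
    {"id": 3, "score": 90},
]
departments = {1: "Biology", 2: "History", 3: "Physics"}


def append_rows(first, second):
    """Append: stack the rows of two tables that share the same columns."""
    return first + second


def merge_columns(rows, lookup, key, new_column):
    """Merge: add a new variable to each row by matching on a key column."""
    return [{**row, new_column: lookup.get(row[key])} for row in rows]


appended = append_rows(wave_1, wave_2)
merged = merge_columns(appended, departments, key="id", new_column="department")

for row in merged:
    print(row)
```

Appending makes the table longer (more rows, same variables), while merging makes it wider (same rows, more variables).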
Programming Languages
R vs Python (Medium)
- R / RStudio
- This statistical language can be easier for non-programmers to learn than Python, and it is the best choice if you almost always work with tabular data (Python tends to be better for non-tabular data, but either can do the basics).
- Python
- This full-fledged programming language can ultimately do almost anything with any type of data and is far easier to learn than most other general-purpose languages. However, it can be finicky and has a steeper learning curve than the other tools listed here.
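To give a feel for what "the basics" look like in code, here is a minimal sketch that cleans a small messy table and computes a pivot-table-style summary using only Python's standard library. The data and the helper names (`clean_rows`, `mean_by_group`) are hypothetical; in practice, many people would reach for a dedicated tabular-data library instead.

```python
import csv
import io
import statistics
from collections import defaultdict

# Hypothetical messy data: inconsistent capitalization, stray spaces,
# and one missing score.
RAW_CSV = """department,score
Biology,72
 biology ,85
History,90
HISTORY,
Physics,64
"""


def clean_rows(text):
    """Trim whitespace, standardize capitalization, drop rows with no score."""
    cleaned = []
    for row in csv.DictReader(io.StringIO(text)):
        department = row["department"].strip().title()
        score = row["score"].strip()
        if score == "":
            continue  # skip rows with a missing score
        cleaned.append({"department": department, "score": float(score)})
    return cleaned


def mean_by_group(rows, group_key, value_key):
    """Group rows by one column and average another, like a simple pivot table."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_key]].append(row[value_key])
    return {name: statistics.mean(values) for name, values in groups.items()}


rows = clean_rows(RAW_CSV)
summary = mean_by_group(rows, "department", "score")
print(summary)
```

Because every cleaning step is written down as code, the same script can be rerun on updated data, which is the main replicability advantage over editing cells by hand.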