George Mason University InfoGuides

Find Data & Statistics: Datasets for Practice

Looking for datasets to practice your skills, complete a class project, or teach statistics?
This page highlights free, easy-to-use datasets for learning, exploration, and experimentation.

Good Places to Start

  • Kaggle Datasets - (registration required) user-contributed open data with previews; see also Competition Data
  • Google Dataset Search - Google search optimized for datasets
  • Generate Data.com - 30+ data types, 10+ data formats, even latitude/longitude and credit card data
  • See also the other tabs in this guide, which are not specific to statistics; much of that data is freely available

More Suggested Datasets

Data with Projects

Competitions / Social Challenges

  • Tidy Tuesday - Weekly social data tidying and visualizing group (R-based) with archive of picks
  • Community Projects - Links from Tableau to a variety of data-visualization-focused challenges
  • ML Contests - Aggregator of data science and machine learning competitions from Kaggle, DrivenData, and others

Repositories with Examples

Smaller Data

Designed for Teaching

Provided for Replication or Reuse

  • fivethirtyeight - Datasets on politics and sports used in their articles
  • BuzzFeedNews - Datasets from investigative journalism projects
  • RDatasets - Repository for datasets distributed with R and various R packages for use in examples

Bigger Data

Access larger, more complex datasets ideal for machine learning, coding practice, and advanced analysis.

Repositories

  • AWS Data Exchange - Open Data - Many large datasets on a variety of topics (see also the Registry)
  • openBIGdata.org (BERD@NFDI) - A curated directory of open, large-scale datasets for research in business, economics, social sciences, and more. Examples: job postings, historical advertisements, sentiment data, and e-commerce behavior.
  • OpenML - Intended to provide training data for machine learning, with good metadata

Datasets