George Mason University InfoGuides

Software: Learn R

Resources for learning and using R, the open-source statistical software (R Project)

Data Management in R - The 3 Options

You do not have to use only one of these schemes, but it is best to choose one and use it consistently. Even so, it is useful to be able to read Base R code: it appears in many tutorials, is required by certain packages, and can simplify code in some circumstances.

If you really can't decide, check out this comparison for examples of each.

Scheme       Data class   Template                                    Advantages
Base R       data.frame   mydata[rows, columns]                       universal, historical
Data Table   data.table   mydata[rows, columns, by]                   fastest for big data
Tidyverse    tibble       mydata |> filter(rows) |> select(columns)   easiest to use
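As a rough sketch of the three templates, the code below keeps the rows of R's built-in mtcars data set with mpg above 25, plus the columns mpg and cyl, once in each scheme. It assumes the data.table and dplyr packages are installed; the data set and the cutoff of 25 are arbitrary choices for illustration.

```r
library(data.table)
library(dplyr)

# Base R: mydata[rows, columns]
base_result <- mtcars[mtcars$mpg > 25, c("mpg", "cyl")]

# Data Table: mydata[rows, columns, by]; column names are used directly
mtcars_dt <- as.data.table(mtcars)
dt_result <- mtcars_dt[mpg > 25, .(mpg, cyl)]

# The by argument groups the column expression: mean mpg per cylinder count
dt_by_cyl <- mtcars_dt[, .(mean_mpg = mean(mpg)), by = cyl]

# Tidyverse: mydata |> filter(rows) |> select(columns)
tidy_result <- mtcars |>
  filter(mpg > 25) |>
  select(mpg, cyl)
```

One difference to watch for: mtcars stores car names as row names, and as.data.table() drops them unless keep.rownames = TRUE is passed.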

We recommend the Tidyverse for most people doing data analysis in the social and health sciences. Data Table can be a better fit for data scientists, programmers, and anyone who works with very large data files. See Learn to work with large dataset in R by Analytics Vidhya.

Few people use Base R exclusively, but it is useful to know some of its notation, such as $ and [] for referring to parts of an object (see foundations), especially when working with specialized packages that do not support tidyverse notation.
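A minimal base R sketch of the $ and [] notation, using the built-in mtcars data frame. The regression model is included only to show that the same notation reaches inside other kinds of objects; the variable names are illustrative.

```r
# $ extracts one column (or list element) by name
mpg_values <- mtcars$mpg

# [rows, columns] subsets a data frame; an empty position means "all"
first_three <- mtcars[1:3, c("mpg", "hp")]
heavy_cars <- mtcars[mtcars$wt > 5, ]

# [[ ]] extracts a single element, handy when the name is stored in a variable
column_name <- "hp"
hp_values <- mtcars[[column_name]]

# Model objects are lists underneath, so the same notation works on them
fit <- lm(mpg ~ wt, data = mtcars)
fit_coefs <- fit$coefficients
slope <- fit_coefs[["wt"]]
```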

What is tidy data? See the classic article Tidy Data (pdf) by Hadley Wickham; note that it uses an older version of tidyr.
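In tidy data, each variable forms a column and each observation forms a row. As a minimal sketch, assuming the tidyr package is installed (the country/year table is made-up illustration data), pivot_longer() turns a wide table with one column per year into tidy form, and pivot_wider() reverses the reshape.

```r
library(tidyr)

# Hypothetical wide data: one row per country, one column per year
wide <- data.frame(
  country = c("A", "B"),
  `2019` = c(10, 20),
  `2020` = c(12, 25),
  check.names = FALSE
)

# Tidy (long) form: each row is one country-year observation
long <- pivot_longer(
  wide,
  cols = -country,
  names_to = "year",
  values_to = "cases",
  names_transform = list(year = as.integer)
)

# pivot_wider() spreads the years back out into columns
back_to_wide <- pivot_wider(long, names_from = year, values_from = cases)
```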

Tidyverse Tutorials & Documentation

Tutorials

Tidyverse Terms & Symbols

  • Most popular, and our recommended choice. Created by Hadley Wickham (RStudio, now Posit) and many contributors
  • Uses descriptive function names (words) instead of symbolic notation, and is built to support piping.
    • mydata |> filter(rows) |> select(columns)
    • |> has been built into R since version 4.1.0, but the tidyverse originally used %>% (from the magrittr package)
  • dplyr is mainly for manipulating one dataset
    • filter() - Keep/Remove Rows/Observations
    • select() - Keep/Remove Variables
    • mutate() - Create or change variables
      • See the packages forcats (factors), stringr (strings), lubridate (dates and date-times), hms (time-of-day values), and more for working with specific data types
    • group_by() |> summarize() - Summarize/Collapse across rows
    • also arrange(), relocate(), rename(), count(), and many more
    • joins such as left_join() and inner_join() combine two datasets by matching key variables
  • tidyr is for restructuring (reshaping) a dataset, for example with pivot_longer() and pivot_wider()
  • readr (CSV and other delimited text), haven (SPSS, Stata, and SAS files), readxl (Excel), and many others are for importing data from other formats
  • purrr provides Tidyverse alternatives to the base R apply() series of functions
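The dplyr verbs are designed to be chained with a pipe. This is a minimal sketch on the built-in mtcars data, assuming dplyr is installed; the hp cutoff and the kilogram conversion are arbitrary choices for illustration. It also checks that the native |> and magrittr's %>% (which dplyr re-exports) give the same result for a simple call.

```r
library(dplyr)

# Both pipes pass the left-hand side as the first argument,
# so each line is equivalent to filter(mtcars, hp > 100)
native_pipe <- mtcars |> filter(hp > 100)
magrittr_pipe <- mtcars %>% filter(hp > 100)

summary_by_cyl <- mtcars |>
  filter(hp > 100) |>                  # keep rows (observations)
  select(mpg, cyl, wt) |>              # keep variables
  mutate(wt_kg = wt * 453.59237) |>    # wt is recorded in 1000 lb
  group_by(cyl) |>                     # define groups
  summarize(                           # collapse each group to one row
    n = n(),
    mean_mpg = mean(mpg),
    mean_wt_kg = mean(wt_kg)
  ) |>
  arrange(desc(mean_mpg))              # sort, highest mean mpg first
```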
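Combining datasets and iterating follow the same word-based style. This sketch assumes dplyr and purrr are installed; engine_info is a hypothetical lookup table made up for illustration. It joins the lookup table onto mtcars, then compares purrr's map_dbl() and map() with their base R counterparts sapply() and lapply().

```r
library(dplyr)
library(purrr)

# Hypothetical lookup table with one row per cylinder count
engine_info <- data.frame(
  cyl = c(4, 6, 8),
  engine_label = c("four-cylinder", "six-cylinder", "eight-cylinder")
)

# dplyr: left_join() keeps every row of mtcars and adds the matching label
cars_labeled <- mtcars |>
  left_join(engine_info, by = "cyl")

# purrr: map_dbl() applies mean() to each column and returns a named
# double vector, a typed counterpart of sapply(mtcars, mean)
means_purrr <- map_dbl(mtcars, mean)
means_base <- sapply(mtcars, mean)

# map() always returns a list, like lapply(): one regression per cylinder group
models <- map(split(mtcars, mtcars$cyl), \(d) lm(mpg ~ wt, data = d))
```

The \(d) shorthand for function(d) requires R 4.1.0 or later, the same version that introduced |>.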