You are not required to use only one of these, but it is best to choose one and use it consistently. Even so, it is useful to be able to read Base R code: it appears in many tutorials, some packages require it, and it can simplify code in some circumstances.
If you really can't decide, check out this comparison for examples of each.
| Scheme | Data Class | Template | Advantages |
|---|---|---|---|
| Base R | data.frame | mydata[rows,columns] | universal, historical |
| Data Table | data.table | mydata[rows,columns,by] | fastest for Big Data |
| Tidyverse | tibble | mydata \|> filter(rows) \|> select(columns) | easiest to use |
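To make the three templates in the table concrete, here is a minimal sketch that performs the same task in each scheme: keep the cars with more than 25 miles per gallon and show two columns. The built-in mtcars data set and the object names (base_result, cars_dt, and so on) are illustrative choices, not part of this guide. The sketch assumes that the data.table and dplyr packages are installed and that R is version 4.1 or later, which is needed for |>.

```r
library(data.table)
library(dplyr)

# Base R: data.frame, mydata[rows, columns]
base_result <- mtcars[mtcars$mpg > 25, c("mpg", "cyl")]

# Data Table: data.table, mydata[rows, columns, by]
cars_dt <- as.data.table(mtcars)
dt_result <- cars_dt[mpg > 25, .(mpg, cyl)]

# The third slot, by, groups rows before the columns are computed
dt_by_cyl <- cars_dt[, .(mean_mpg = mean(mpg)), by = cyl]

# Tidyverse: tibble, mydata |> filter(rows) |> select(columns)
cars_tbl <- as_tibble(mtcars)
tidy_result <- cars_tbl |>
  filter(mpg > 25) |>
  select(mpg, cyl)
```

All three results contain the same rows and columns; they differ only in the class of the object returned (data.frame, data.table, or tibble).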
We recommend the Tidyverse for most people doing data analysis in the social and health sciences. Data Table can be a better fit for data scientists, programmers, and those who work with very large data files. See "Learn to work with large dataset in R" by Analytics Vidhya.
Few people use Base R exclusively, but it is useful to know some of its notation, such as $ and [] for referring to parts of an object (see foundations), especially for specialized packages that do not support tidyverse notation.
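As a rough illustration of the $ and [] notation, the sketch below uses only Base R, so no packages are needed. The object names mydata and fit, and the choice of the built-in mtcars data, are assumptions made for the example.

```r
# First six rows of a built-in data set (an illustrative choice)
mydata <- head(mtcars)

mydata$mpg                      # one column, returned as a vector
mydata[["mpg"]]                 # the same column, selected by name
mydata[1:3, ]                   # first three rows, all columns
mydata[, c("mpg", "hp")]        # all rows, two columns
mydata[mydata$cyl == 6, "mpg"]  # mpg values for 6-cylinder cars only

# Many functions return list-like objects; $ extracts one component
fit <- lm(mpg ~ wt, data = mtcars)
fit$coefficients
```

The last two lines show the typical case where this notation is hard to avoid: the result of a modeling function is not a data frame, so its parts are reached with $ or [[ ]].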
What is tidy data? See the classic article Tidy Data (pdf) by Hadley Wickham. Note that it uses an older version of tidyr.
|> filter(rows) |> select(columns)

|> is now built into R, but the tidyverse originally used %>% (from magrittr).

- filter() - Keep/Remove Rows/Observations
- select() - Keep/Remove Variables
- mutate() - Create or change variables
- group_by() |> summarize() - Summarize/Collapse across rows
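The verbs above can be chained into one pipeline. The following is a minimal sketch, assuming the dplyr package is installed and R is version 4.1 or later; the mtcars data and the names cars, cars_summary, and wt_kg are illustrative choices rather than anything this guide prescribes.

```r
library(dplyr)

cars <- as_tibble(mtcars)

cars_summary <- cars |>
  filter(hp > 100) |>               # keep rows (use !condition to remove)
  select(mpg, cyl, wt) |>           # keep variables (use -name to remove)
  mutate(wt_kg = wt * 453.592) |>   # new variable; wt is in 1000s of pounds
  group_by(cyl) |>                  # one group per cylinder count
  summarize(                        # collapse each group to a single row
    mean_mpg = mean(mpg),
    mean_wt_kg = mean(wt_kg),
    n = n()
  )

# The older magrittr pipe, re-exported by dplyr, gives the same result here
same_rows <- cars %>% filter(hp > 100)
```

Each step takes the result of the previous one as its first argument, which is why the data set is named only once at the top of the pipeline.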