I’m pleased to announce tibble, a new package for manipulating and printing data frames in R. Tibbles are a modern reimagining of the data.frame, keeping what time has proven to be effective, and throwing out what is not. The name comes from dplyr: originally you created these objects with tbl_df(), which was most easily pronounced as “tibble diff”.
Another R cheat sheet I found useful covers selecting attributes from a data frame with data[row, attributename]. For example, iris[1, 'Species'] returns setosa, a factor with levels setosa, versicolor, and virginica. To extract a whole column as a vector, use the [[ form of the extract operator: iris[['Species']] %>% head returns the first six values, all setosa.
dplyr::group_by(iris, Species) groups data into rows with the same value of Species, and dplyr::ungroup(iris) removes grouping information from a data frame.
When reading a sheet, column names and types are determined from the data in the sheet by default. The user can also supply them via col_names and col_types, and control name repair via .name_repair. The result is a tibble, i.e. a data frame with an additional tbl_df class; among other things, this provides nicer printing.
To join (to) lists: append(x, values, after = length(x)) adds to the end of a list, e.g. append(x, list(d = 1)), and prepend(x, values, before = 1) adds to the start of a list, e.g. prepend(x, list(d = 1)).
Install tibble with install.packages("tibble").
This package extracts the tbl_df class and its associated functions from dplyr. Kirill Müller extracted the code from dplyr, enhanced the tests, and added a few minor improvements.
Creating tibbles
You can create a tibble from an existing object with as_data_frame():
This works for data frames, lists, matrices, and tables.
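A minimal sketch of converting each kind of object. Recent tibble releases spell this function as_tibble(); as_data_frame() is the older name used in this post and has since been deprecated, so the sketch uses as_tibble(). The column names and values are made up for illustration.

```r
library(tibble)

# Convert an existing data frame, list, matrix, and table into tibbles.
from_df   <- as_tibble(iris)
from_list <- as_tibble(list(x = 1:3, y = c("a", "b", "c")))
from_mat  <- as_tibble(matrix(1:6, nrow = 2,
                              dimnames = list(NULL, c("a", "b", "c"))))
# A table becomes one column per dimension plus a count column, n.
from_tab  <- as_tibble(table(g = c("u", "v", "v")))

print(from_df)
print(from_tab)
```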
You can also create a new tibble from individual vectors with data_frame():
data_frame() does much less than data.frame(): it never changes the type of the inputs (e.g. it never converts strings to factors!), it never changes the names of variables, and it never creates row.names(). You can read more about these features in the vignette, vignette('tibble').
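A short sketch of building a tibble from vectors and comparing it with data.frame(). It uses tibble(), the current name for data_frame() (which has since been deprecated); the column names x, y, and `my var` are arbitrary choices for the example.

```r
library(tibble)

# Build a tibble column by column from individual vectors.
tb <- tibble(
  x = 1:3,
  y = c("a", "b", "c"),   # stays character, never converted to a factor
  `my var` = x * 2        # later inputs may refer to earlier ones; the
                          # non-syntactic name is kept as-is
)
print(tb)

# For comparison, data.frame() repairs the non-syntactic name by default.
df <- data.frame(x = 1:3, `my var` = 4:6)
names(df)   # "x" "my.var"
```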
You can define a tibble row-by-row with frame_data():
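A minimal sketch of the row-by-row layout. Current tibble releases call this function tribble() (the same name used in the cheat sheet later on); frame_data() is the older, deprecated spelling. The values are illustrative only.

```r
library(tibble)

# Column names are written as formulas (~x, ~y); the values that follow
# are filled in row by row, so the call reads like the table it builds.
tb <- tribble(
  ~x,  ~y,
  "a",  2,
  "b",  1
)
print(tb)
```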
Tibbles vs data frames
There are two main differences in the usage of a data frame vs a tibble: printing, and subsetting.
Tibbles have a refined print method that shows only the first 10 rows, and all the columns that fit on screen. This makes it much easier to work with large data. In addition to its name, each column reports its type, a nice feature borrowed from str():
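A sketch of the print behaviour on a larger tibble. The data are synthetic, and the exact footer text varies between tibble versions, so no output is reproduced here. The n and width arguments of print() override the defaults.

```r
library(tibble)

# A 1,000-row tibble: printing shows only the first 10 rows, the columns
# that fit the console, and each column's type under its name.
big <- tibble(
  id    = 1:1000,
  value = sqrt(1:1000),
  label = rep(c("a", "b"), 500)
)
print(big)

# Ask for more rows, or for all columns regardless of console width.
print(big, n = 15)
print(big, width = Inf)
```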
Tibbles are strict about subsetting. If you try to access a variable that does not exist, you’ll get an error:
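A sketch contrasting tibble and data.frame lookups of a column that does not exist. The column names are arbitrary. Note that behaviour has shifted since this post: selecting with [ still raises an error, but recent tibble versions make $ return NULL with a warning rather than an error. Either way, a tibble never partially matches a name the way a data.frame does.

```r
library(tibble)

tb <- tibble(abc = 1, xyz = "a")
df <- as.data.frame(tb)

# Selecting a column that does not exist with [ is an error for a tibble.
msg <- tryCatch(tb["ab"], error = function(e) conditionMessage(e))
print(msg)

# A plain data.frame partially matches with $, so df$ab still returns
# the abc column (current R adds a partial-match warning).
df$ab

# A tibble never partially matches: recent tibble versions return NULL
# here, with a warning about an unknown column.
tb$ab
```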
Tibbles also clearly delineate [ and [[: [ always returns another tibble, [[ always returns a vector. No more drop = FALSE!
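A small sketch of the difference, using a throwaway two-column data set. The same single-column selection drops to a vector on a data.frame but stays a tibble on a tibble.

```r
library(tibble)

df <- data.frame(x = 1:3, y = c("a", "b", "c"))
tb <- as_tibble(df)

# [ always returns another tibble, even for a single column.
class(tb[, 1])
# [[ always returns a vector.
tb[[1]]

# A plain data.frame silently drops to a vector for df[, 1] ...
df[, 1]
# ... unless you remember drop = FALSE.
df[, 1, drop = FALSE]
```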
Interacting with legacy code
A handful of functions don’t work with tibbles because they expect df[, 1] to return a vector, not a data frame. If you encounter one of these functions, use as.data.frame() to turn a tibble back into a data frame:
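A sketch of the workaround. first_col_mean() is a hypothetical stand-in for a legacy function that assumes df[, 1] is a vector; it is not part of any package.

```r
library(tibble)

tb <- tibble(x = 1:3, y = c(2.5, 3.5, 4.5))

# Hypothetical legacy helper: assumes df[, 1] returns a vector.
first_col_mean <- function(df) mean(df[, 1])

# On a tibble, df[, 1] is a one-column tibble, so mean() warns and
# returns NA. Converting back to a plain data frame restores the
# behaviour the helper expects.
first_col_mean(as.data.frame(tb))   # returns 2
```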
The goal of this appendix is to give an easy reference to basic manipulation functions that are used often and should always be readily accessible.
Importing Data and Loading Packages
Function | Meaning |
---|---|
data('DataName', package='PackageName') | Load the data set DataName which is found in the package PackageName |
library(PackageName) | Load the package PackageName to be used. |
read.csv('filename.csv') | Read a .csv file into a data frame. Assign the result to a name, or it is only printed. |
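A sketch combining the functions above. Because no real data file accompanies this appendix, it writes a small CSV to a temporary file first; the variable names path and flowers are illustrative.

```r
# Load a data set that ships with a package (iris comes with 'datasets').
data("iris", package = "datasets")

# Write a small CSV to a temporary file, then read it back.
path <- tempfile(fileext = ".csv")
write.csv(head(iris), path, row.names = FALSE)

# Assign the result of read.csv(); otherwise it is only printed.
flowers <- read.csv(path)
str(flowers)
```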
Useful vectorized functions
Function | Meaning |
---|---|
ifelse( logicalTest, TrueResult, FalseResult ) | Creates a vector of output, where elements are either the TrueResult or FalseResult based on the corresponding outcome in the logicalTest vector |
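A minimal illustration of ifelse() on a made-up numeric vector: the output has one element per element of the logical test.

```r
x <- c(3, -1, 0, 7)

# Each element becomes "positive" or "not positive" depending on x > 0.
sign_label <- ifelse(x > 0, "positive", "not positive")
print(sign_label)
```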
Data frame (tibble) manipulation
In the examples below, df stands for an arbitrary data frame that we are applying the functions to.
Function | Meaning |
---|---|
data.frame(x= , y= ) | Creates a data frame “by hand” with one column per input. |
tibble(x= , y= ) | Creates a tibble “by hand” with one column per input. |
tribble( ~x, ~y, 1, 2) | Creates a tibble “by hand,” but with row-wise specification. |
df %>% add_row(x=3, y=5) | Add a single row to the df data frame. Any column with unspecified data is filled with NA. |
df1 %>% bind_rows(df2) | Stack data frames df1 and df2 on top of each other, matching columns by name. |
df %>% select(ColumnNames) | Subset df and return a data frame with the columns specified. |
df %>% filter(logicalTest) | Subset df and return a data frame with the rows that satisfy the logical expression |
df %>% mutate(New= ) | Create (or update) a column New computed from existing columns. A common manipulation is to use an ifelse() command to update only particular rows. |
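A sketch chaining several rows of the table into one pipeline. It assumes the dplyr package is installed (bind_rows(), filter(), mutate(), and select() come from dplyr; tribble() and add_row() from tibble). The data and the new column z are illustrative.

```r
library(tibble)
library(dplyr)

df <- tribble(
  ~x, ~y,
   1,  2,
   2,  4
)

df  <- df %>% add_row(x = 3)    # y is unspecified, so it is filled with NA
df2 <- tibble(x = 10, y = 20)

out <- df %>%
  bind_rows(df2) %>%                        # stack df on top of df2
  filter(!is.na(y)) %>%                     # keep rows with a known y
  mutate(z = ifelse(x > 1, y * 10, y)) %>%  # change only rows where x > 1
  select(x, z)
print(out)
```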
Data frame (tibble) reshaping
These functions take an input data frame df and return a reshaped or summarized copy.
Function | Meaning |
---|---|
group_by(df, Column1, Column2) | Create a grouped tibble with groups defined by all unique combinations of Column1 and Column2 |
summarize(df, Function(Column1)) | Apply Function to Column1 and return a data frame with just a single row. This is quite powerful when applied to a grouped tibble as it will result in a single row per group. |
df %>% pivot_wider(names_from=, values_from=) | Create a wide data set from a long format |
df %>% pivot_longer(cols=, names_to=, values_to=) | Create a long data set from a wide format |
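A sketch of the functions in this table. It assumes dplyr (group_by(), summarize()) and tidyr (pivot_longer(), pivot_wider()) are installed; the small wide data set is invented for the round trip.

```r
library(tibble)
library(dplyr)
library(tidyr)

# Grouped summary: one row per Species.
species_means <- iris %>%
  group_by(Species) %>%
  summarize(mean_length = mean(Sepal.Length))
print(species_means)

# Wide -> long: stack the two measurement columns into name/value pairs.
wide <- tibble(id = 1:2, height = c(150, 160), weight = c(50, 60))
long <- wide %>%
  pivot_longer(cols = c(height, weight),
               names_to = "measure", values_to = "value")
print(long)

# Long -> wide: spread the pairs back out into one column per measure.
back <- long %>% pivot_wider(names_from = measure, values_from = value)
print(back)
```

Round-tripping through pivot_longer() and pivot_wider() is a quick way to check that the names_to/values_to and names_from/values_from arguments mirror each other.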