Previously we have worked with data in the form of data frames.
“Tibbles” are a modern take on the data frame. They keep many important features of the original data frame and drop many of the outdated ones. They are another feature brought to R by Hadley Wickham, through the tibble package. We will use them in the tidyverse in place of the older data frame that we just learned about.
library(tidyverse)
try <- tibble(x = 1:3, y = list(1:5, 1:10, 1:20))
try
## # A tibble: 3 × 2
## x y
## <int> <list>
## 1 1 <int [5]>
## 2 2 <int [10]>
## 3 3 <int [20]>
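We can see that y is displayed as a list column. This is harder to get with the older data frame style of construction. If we combine the pieces with c() and coerce the result, the conversion fails because the pieces have different lengths: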
try <- as_data_frame(c(x = 1:3, y = list(1:5, 1:10, 1:20)))
Error: Variables must be length 1 or 20. Problem variables: 'y1', 'y2'
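To make the list column idea more concrete, here is a minimal sketch of working with one. The names lc and base_df are made up for this illustration. It assumes the tibble package is installed, and it also shows that base R can hold a list in a single column only when the list is wrapped in I().

```r
library(tibble)

# A tibble with a list column: each element of y is its own integer vector.
lc <- tibble(x = 1:3, y = list(1:5, 1:10, 1:20))

# Each cell of the list column keeps its own length.
lengths(lc$y)

# [[ pulls one cell out of the list column as an ordinary vector.
lc$y[[2]]

# Base R keeps a list as one column only if it is protected with I().
base_df <- data.frame(x = 1:3, y = I(list(1:5, 1:10, 1:20)))
dim(base_df)
```

Without I(), data.frame() tries to spread the list over several columns, which fails here because the vectors have different lengths.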
We can use a non-standard name in our tibble as well:
names(data.frame(`crazy name` = 1))
## [1] "crazy.name"
names(tibble(`crazy name` = 1))
## [1] "crazy name"
Notice that the data frame replaced the name we wanted: by default data.frame() does not allow a space in a name, so it substituted a syntactically valid one.
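Once a column has a non-standard name, it has to be referred to with backticks or as a quoted string. The sketch below shows both, using made-up object names tb and df_base. It also shows that a base data frame can be told to keep the name by setting check.names = FALSE.

```r
library(tibble)

tb <- tibble(`crazy name` = 1)

# Backticks are needed with $ because the name contains a space.
tb$`crazy name`

# With [[ the name is an ordinary string.
tb[["crazy name"]]

# data.frame() keeps the name as given when check.names = FALSE.
df_base <- data.frame(`crazy name` = 1, check.names = FALSE)
names(df_base)
```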
A tibble can also be made by coercing an existing object with as_tibble(). This works similarly to as.data.frame(), but it is considerably more efficient.
l <- replicate(26, sample(100), simplify = FALSE)
names(l) <- letters
microbenchmark::microbenchmark(as_tibble(l), as.data.frame(l))
## Unit: microseconds
##              expr      min       lq      mean    median       uq      max neval cld
##      as_tibble(l)  309.250  327.099  376.2002  344.7265  386.004 1689.046   100  a
##  as.data.frame(l) 1390.507 1464.361 1614.3087 1543.3465 1690.608 3104.097   100   b
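Microbenchmarking runs each expression many times (here 100 times, the neval column) and summarizes how long the runs took. In this run the median time for as_tibble() was about 345 microseconds, against about 1,543 microseconds for the data frame conversion, so creating the tibble was roughly 4.5 times faster. The absolute difference is small, so it matters most when the conversion is repeated many times in an analysis.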
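If the microbenchmark package is not available, a rough comparison can be sketched with base R's system.time(). This is only an illustration, not the method used for the table above: the repetition count n_reps and the object names are made up, and the timings will differ from machine to machine.

```r
library(tibble)

set.seed(1)
l <- replicate(26, sample(100), simplify = FALSE)
names(l) <- letters

# Both conversions give a 100-row, 26-column table with the same names.
tb <- as_tibble(l)
df_base <- as.data.frame(l)
dim(tb)
dim(df_base)

# Repeat each conversion so the elapsed time is large enough to measure.
n_reps <- 200
t_tibble <- system.time(for (i in seq_len(n_reps)) as_tibble(l))[["elapsed"]]
t_base <- system.time(for (i in seq_len(n_reps)) as.data.frame(l))[["elapsed"]]

# Elapsed seconds for n_reps conversions of each kind.
c(tibble = t_tibble, data.frame = t_base)
```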
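There are a couple of key differences between tibbles and data frames: printing and subsetting. When a tibble is printed, only the first 10 rows are shown, along with the type of each column: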
tibble(
  a = lubridate::now() + runif(1e3) * 86400,
  b = lubridate::today() + runif(1e3) * 30,
  c = 1:1e3,
  d = runif(1e3),
  e = sample(letters, 1e3, replace = TRUE)
)
## # A tibble: 1,000 × 5
## a b c d e
## <dttm> <date> <int> <dbl> <chr>
## 1 2017-02-19 09:02:23 2017-03-09 1 0.02150370 f
## 2 2017-02-19 01:42:10 2017-03-09 2 0.08031493 k
## 3 2017-02-19 05:36:59 2017-03-08 3 0.11670172 u
## 4 2017-02-19 18:49:56 2017-03-09 4 0.24552337 h
## 5 2017-02-19 04:15:06 2017-03-05 5 0.11232662 b
## 6 2017-02-19 10:00:27 2017-03-09 6 0.52834632 m
## 7 2017-02-19 13:42:43 2017-03-16 7 0.78928491 v
## 8 2017-02-19 17:02:27 2017-03-16 8 0.80388276 h
## 9 2017-02-19 15:09:33 2017-03-19 9 0.45767339 d
## 10 2017-02-19 09:14:04 2017-02-25 10 0.18177950 t
## # ... with 990 more rows
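The number of rows and the width of the printout can be changed when more or less detail is wanted. A small sketch, using a made-up tibble called big and the n and width arguments of the tibble print method:

```r
library(tibble)

big <- tibble(id = 1:1000, value = runif(1000))

# Show only the first 3 rows instead of the default 10.
print(big, n = 3)

# width = Inf prints every column, however wide the table is.
print(big, n = 3, width = Inf)
```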
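To pull a single column out of a tibble, use $ or [[ with the column name, or [[ with its position. These also work with the pipe, which we will learn about later.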
df <- tibble(x = runif(5), y = rnorm(5))
df$x
## [1] 0.6227033 0.7363213 0.8551199 0.9173554 0.5542486
df[["x"]]
## [1] 0.6227033 0.7363213 0.8551199 0.9173554 0.5542486
df[[1]]
## [1] 0.6227033 0.7363213 0.8551199 0.9173554 0.5542486
The commands above should seem very familiar after the previous work. With piping (also called chaining) we can do the same, using . as a placeholder for the data on the left of %>%:
df %>% .$x
## [1] 0.6227033 0.7363213 0.8551199 0.9173554 0.5542486
df %>% .[["x"]]
## [1] 0.6227033 0.7363213 0.8551199 0.9173554 0.5542486
df %>% .[[1]]
## [1] 0.6227033 0.7363213 0.8551199 0.9173554 0.5542486
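As a quick check that the piped forms really return the same thing as the direct ones, the following sketch compares them with identical(). It assumes the tibble and magrittr packages are installed, and the seed is only there to make the example repeatable.

```r
library(tibble)
library(magrittr)

set.seed(42)
df <- tibble(x = runif(5), y = rnorm(5))

# Each piped extraction matches its direct counterpart.
same_dollar <- identical(df$x, df %>% .$x)
same_name <- identical(df[["x"]], df %>% .[["x"]])
same_position <- identical(df[[1]], df %>% .[[1]])

c(same_dollar, same_name, same_position)
```

All three comparisons come out TRUE, because the pipe simply passes df in wherever the . placeholder appears.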