Sooner or later you will need a variable that is not already in your data. For example, most medical studies record height and weight, but researchers are often more interested in Body Mass Index (BMI), so BMI has to be computed and added as a new variable.
Using the tidyverse, we can add new variables in multiple ways:
mutate()
transmute()
With mutate() we have:
mutate(.data, ...)
where
.data is your tibble of interest.
... is the name of the new variable paired with the expression that computes it.
Then with transmute() we have:
transmute(.data, ...)
where
.data is your tibble of interest.
... is the name of the new variable paired with the expression that computes it.
mutate() and transmute()
There is only one major difference between mutate() and transmute(): what each keeps in your data.
mutate() keeps all existing variables and adds the new one.
transmute() keeps only the new variable.
Let’s say we wish to create a variable called speed, computed as:
\[\text{speed} = \dfrac{\text{distance}}{\text{time}} \times 60\]
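As a quick check of the formula, assume distance is recorded in miles and time in minutes (as it is for the flights data from nycflights13); multiplying by 60 then converts miles per minute to miles per hour. For a flight that covers 1400 miles in 227 minutes:
\[\text{speed} = \dfrac{1400}{227} \times 60 \approx 370.04 \text{ miles per hour}\]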
We can first do this with mutate():
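library(tidyverse)      # provides select(), mutate() and %>%
library(nycflights13)   # provides the flights tibble
flights %>%
  select(flight, distance, air_time) %>%
  mutate(speed = distance/air_time*60)
## # A tibble: 336,776 × 4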
##    flight distance air_time    speed
##     <int>    <dbl>    <dbl>    <dbl>
## 1    1545     1400      227 370.0441
## 2    1714     1416      227 374.2731
## 3    1141     1089      160 408.3750
## 4     725     1576      183 516.7213
## 5     461      762      116 394.1379
## 6    1696      719      150 287.6000
## 7     507     1065      158 404.4304
## 8    5708      229       53 259.2453
## 9      79      944      140 404.5714
## 10    301      733      138 318.6957
## # ... with 336,766 more rows
Notice that mutate() kept all of the variables we selected and added speed to them. Now we can do the same with transmute():
flights %>%
  select(flight, distance, air_time) %>%
  transmute(speed = distance/air_time*60)
## # A tibble: 336,776 × 1
##       speed
##       <dbl>
## 1  370.0441
## 2  374.2731
## 3  408.3750
## 4  516.7213
## 5  394.1379
## 6  287.6000
## 7  404.4304
## 8  259.2453
## 9  404.5714
## 10 318.6957
## # ... with 336,766 more rows
In this example, only speed is kept.
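The same pattern applies to the BMI example from the start of this section. The following is a minimal sketch, not data from a real study: the patients tibble and its column names (id, height_m, weight_kg) are made up for illustration, and BMI is taken to be weight in kilograms divided by height in metres squared.

```r
library(dplyr)

# Hypothetical patient data: heights in metres, weights in kilograms
patients <- tibble(
  id        = 1:3,
  height_m  = c(1.60, 1.75, 1.82),
  weight_kg = c(55, 80, 95)
)

# mutate() keeps id, height_m and weight_kg and appends bmi
with_bmi <- patients %>%
  mutate(bmi = weight_kg / height_m^2)

# transmute() returns only the columns named in the call,
# so id is listed explicitly to keep it next to bmi
only_bmi <- patients %>%
  transmute(id, bmi = weight_kg / height_m^2)
```

Here with_bmi has four columns (id, height_m, weight_kg, bmi), while only_bmi has just id and bmi. Naming an existing column inside transmute() is one way to carry an identifier along with the new variable.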