Adding New Variables

Sooner or later you will need a variable that is not already in your data. For example, most medical studies record height and weight, but researchers are often more interested in Body Mass Index (BMI). In that case we would need to compute BMI and add it to the data.

Using the tidyverse, we can add new variables with either of two functions:

  • mutate()
  • transmute()

With mutate() we have:

mutate(.data, ...)

where

  • .data is your tibble of interest.
  • ... is one or more name-value pairs: the name of the new variable and the expression that computes it.
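
As a minimal sketch of the BMI case mentioned earlier, the following adds a bmi column to a small tibble. The tibble patients, its columns id, height_m and weight_kg, and all of the values are hypothetical and exist only for illustration.

```r
library(dplyr)

# Hypothetical data: heights in metres, weights in kilograms (made-up values)
patients <- tibble(
  id        = 1:3,
  height_m  = c(1.60, 1.75, 1.82),
  weight_kg = c(55, 70, 90)
)

# BMI = weight (kg) / height (m)^2; mutate() appends it as a new column
patients_bmi <- patients %>%
  mutate(bmi = weight_kg / height_m^2)

patients_bmi
```

Here bmi is the name and weight_kg / height_m^2 is the expression, so the result has the three original columns plus bmi.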

Then with transmute() we have:

transmute(.data, ...)

where

  • .data is your tibble of interest.
  • ... is one or more name-value pairs: the name of the new variable and the expression that computes it.

Differences Between mutate() and transmute()

There is only one major difference between mutate() and transmute(): which variables each one keeps in your data.

  • mutate()
    • creates a new variable
    • keeps all existing variables
  • transmute()
    • creates a new variable
    • keeps only the new variables
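
The difference is easiest to see by comparing column names. The sketch below uses a hypothetical tibble called trips with made-up columns and values. It also shows that naming an existing column inside transmute() carries that column along.

```r
library(dplyr)

# Hypothetical toy data (made-up values)
trips <- tibble(
  trip    = c("a", "b"),
  miles   = c(120, 300),
  minutes = c(90, 240)
)

# mutate() keeps trip, miles, and minutes, then adds mph
kept_all <- trips %>% mutate(mph = miles / minutes * 60)
names(kept_all)   # "trip" "miles" "minutes" "mph"

# transmute() returns only the columns named in the call
kept_new <- trips %>% transmute(mph = miles / minutes * 60)
names(kept_new)   # "mph"

# Naming an existing column inside transmute() keeps it as well
kept_some <- trips %>% transmute(trip, mph = miles / minutes * 60)
names(kept_some)  # "trip" "mph"
```

The computed values are the same in all three results. Only the set of columns returned changes.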

Example

Let’s say we want a variable called speed, expressed per hour. Since air_time in flights is recorded in minutes, we multiply the ratio by 60:

\[\text{speed} = \dfrac{\text{distance}}{\text{time}}*60\]

We can first do this with mutate():

flights %>% 
  select(flight, distance, air_time) %>%
  mutate(speed = distance/air_time*60)
## # A tibble: 336,776 × 4
##    flight distance air_time    speed
##     <int>    <dbl>    <dbl>    <dbl>
## 1    1545     1400      227 370.0441
## 2    1714     1416      227 374.2731
## 3    1141     1089      160 408.3750
## 4     725     1576      183 516.7213
## 5     461      762      116 394.1379
## 6    1696      719      150 287.6000
## 7     507     1065      158 404.4304
## 8    5708      229       53 259.2453
## 9      79      944      140 404.5714
## 10    301      733      138 318.6957
## # ... with 336,766 more rows
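
As a quick sanity check, the first row of the output can be reproduced by hand. The values below are copied from that row. The units in the comments assume flights is the nycflights13 data set, where distance is in miles and air_time is in minutes.

```r
# Values copied from the first row of the output above
distance <- 1400  # assumed to be miles
air_time <- 227   # assumed to be minutes

# Same expression as inside mutate(): distance per minute, scaled to per hour
speed <- distance / air_time * 60
round(speed, 4)
## [1] 370.0441
```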

Notice that mutate() kept all of the variables we selected and added speed to them. Now we can do the same with transmute():

flights %>%
  select(flight, distance, air_time) %>%
  transmute(speed = distance/air_time*60)
## # A tibble: 336,776 × 1
##       speed
##       <dbl>
## 1  370.0441
## 2  374.2731
## 3  408.3750
## 4  516.7213
## 5  394.1379
## 6  287.6000
## 7  404.4304
## 8  259.2453
## 9  404.5714
## 10 318.6957
## # ... with 336,766 more rows

In this example, transmute() kept only speed and dropped flight, distance, and air_time.

