ggplot2
We will begin our journey into statistical graphics with the package ggplot2. This is another package by Hadley Wickham and is part of the tidyverse. This means we can use piping, or chaining, to build our graphics. After going through this material, if you would like further information, please check out the following books:
What can ggplot2 do?
A good place to start might be with what ggplot2 cannot do. From here we will introduce what it can do.
ggvis
igraph
For this section of the course we will consider the New York City Flights 2013 data. This dataset contains information on all flights that departed from NYC in 2013. The variables in this dataset are:
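The variables can also be listed directly from R. The following is a minimal sketch, assuming the nycflights13 package is installed and that the dataset in question is its `flights` table (the same table used later in this section):

```r
library(nycflights13)

# Dimensions of the flights table: rows are flights, columns are variables.
dim(flights)

# Names of the variables (columns).
names(flights)

# Compact overview of each variable's type and first few values.
str(flights)
```

Running `?flights` after loading the package opens the help page that describes each variable.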
As we start with ggplot2, it is important to understand its structure. The base graphics built into R require many different functions, and each of them seems to have its own conventions for how to use it. ggplot2 is more fluid, and the more you learn about it, the more impressive the graphics you can create. We will get started with the components of every ggplot2 object:
For example, we will create a simple scatter plot of distance by departure delay:
library(dplyr)
library(ggplot2)
library(nycflights13)
data <- flights %>% sample_frac(0.01)  # random 1% sample of all flights
ggplot(data, aes(x = distance, y = dep_delay)) +  # aesthetic mapping
  geom_point()  # layer of points
The code first takes a random 1% sample of the flights data. The original data has 336,776 flights, which is hard to visualize with any clarity, so we plot a sample instead. The aesthetic mapping then places distance on the x-axis and departure delay on the y-axis. Finally, we add a layer of points. This leads to the following graph:
As we proceed through this section, we will graph things using the following pattern:
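One way to read such a pattern is as three steps: data, then aesthetic mapping, then layers. The sketch below is an illustration of that reading based on the scatter plot above, not necessarily the exact template the course uses. The names `flights_sample`, `base_plot`, `scatter`, and `scatter_piped`, the seed value, and the `alpha` setting are all assumptions introduced for illustration.

```r
library(dplyr)
library(ggplot2)
library(nycflights13)

set.seed(2013)  # assumption: fixed seed so the 1% sample is reproducible

# 1. Data: a 1% random sample of the flights table.
flights_sample <- flights %>% sample_frac(0.01)

# 2. Aesthetic mapping: which variables go on which axes.
base_plot <- ggplot(flights_sample, aes(x = distance, y = dep_delay))

# 3. Layers: add one or more geoms to the base plot.
#    alpha = 0.5 (an illustrative choice) makes overlapping points easier to see.
scatter <- base_plot + geom_point(alpha = 0.5)

# The same pattern written as one chain, piping the data into ggplot().
# Note that the data steps use %>% while plot layers are added with +.
scatter_piped <- flights %>%
  sample_frac(0.01) %>%
  ggplot(aes(x = distance, y = dep_delay)) +
  geom_point(alpha = 0.5)

# Draw the plot; ggplot2 may warn about rows with a missing dep_delay.
print(scatter)
```

Storing the intermediate `base_plot` makes it easy to try different layers on the same data and mapping without repeating the first two steps.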