Extended Example and Using ggplot2

We will use an example created by the Institute for Quantitative Social Science at Harvard University. The goal is to reproduce a very nice graph from The Economist:

image


Make Sure You Can Run This Code

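# library() stops with an error if a package is missing, unlike require()
library(ggplot2)
library(ggrepel)
# Read the example data (the code below uses the columns Country, CPI, HDI, and Region)
dat <- read.csv("https://drive.google.com/uc?export=download&id=0B8CsRLdwqzbzUDJLa1owSVduLTA")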

  • We will begin by using the ggplot2 package to create this graph.
  • We will start with a basic plot of HDI (Human Development Index) against CPI (Corruption Perceptions Index):
# Pass the data, then map the aesthetics: x, y, and the variable that controls color
pc1 <- ggplot(dat, aes(x = CPI, y = HDI, color = Region))
# geom_point() draws one point per row of the data
pc1 + geom_point()


  • This is a long way off from our desired graph. We still need to:
    • add a trend line
    • change the point shape to open circle
    • label select points
    • change the order and labels of Region
    • title, label axes, remove legend title
    • fix up the tick marks and labels
    • move color legend to the top
    • theme the graph with no vertical guides
    • add model \(R^2\) (hard)
    • add sources note (hard)
    • final touches to make it perfect (use image editor for this)

Trend Lines

  • This is easy to do if you know the form of the trend line you wish to fit.
  • This time we model HDI as a linear function of the logarithm of CPI: \(HDI = \beta_0 + \beta_1 \log(CPI)\).
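# Store only the trend line in pc2, so later steps can add their own point layers
pc2 <- pc1 +
  geom_smooth(aes(group = 1),       # one line for all regions together
              method = "lm",
              formula = y ~ log(x),
              se = FALSE,
              color = "red")
# Draw the points on top of the line (displayed only, not stored in pc2)
pc2 + geom_point()
  • Here geom_smooth() adds the line to the graph, and geom_point() then draws the points on top of it.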
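
With method = "lm", geom_smooth() fits the given formula by ordinary least squares, with y and x standing for the mapped aesthetics. The sketch below shows the same kind of fit with a direct call to lm(). The data are hypothetical and noise-free, generated from y = 0.3 + 0.25 * log(x) so that the fitted coefficients can be checked by eye. The names x, y, fit, grid_x, and curve_y are illustrative only.

```r
# Hypothetical, noise-free data generated from y = 0.3 + 0.25 * log(x)
x <- 1:10
y <- 0.3 + 0.25 * log(x)

# Same formula as the one passed to geom_smooth()
fit <- lm(y ~ log(x))
coef(fit)   # intercept and slope on log(x)

# Predictions on a fine grid trace out the fitted curve
grid_x <- data.frame(x = seq(1, 10, length.out = 50))
curve_y <- predict(fit, newdata = grid_x)
head(curve_y)
```

Because the model is linear in log(x), the line is straight on a log scale and curved on the original CPI scale, which is the shape seen in the plot.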


Change Point Shape

  • We begin by exploring the point shapes available in R; see ?shape.
  • The following example is adapted from that help page:
## A look at all 25 symbols
df2 <- data.frame(x = 1:5, y = 1:25, z = 1:25)
# Shapes 21-25 take both an outline colour and a fill; the others use colour only
s <- ggplot(df2, aes(x = x, y = y)) +
  geom_point(aes(shape = z), size = 4, colour = "Red", fill = "Black") +
  scale_shape_identity()
s


  • We want shape 1 (an open circle) for our graph:
pc2 + geom_point(shape=1, size=4)


  • This looks good, but the original has thicker circles. We can approximate that by stacking open circles of slightly different sizes:
(pc3 <- pc2 +
   geom_point(size = 4.3, shape = 1) +
   geom_point(size = 4.2, shape = 1) +
   geom_point(size = 4.1, shape = 1) +
   geom_point(size = 4, shape = 1) +
   geom_point(size = 3.9, shape = 1) +
   geom_point(size = 3.8, shape = 1) +
   geom_point(size = 3.5, shape = 1))
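
As an alternative to stacking layers, geom_point() accepts a stroke argument that sets the outline thickness of a shape directly. The sketch below uses the built-in mtcars data as a stand-in for the tutorial's dataset. The stroke value of 1.5 is a guess and has not been matched against the original graph.

```r
library(ggplot2)

# mtcars is a built-in stand-in here, not the tutorial's dataset
p <- ggplot(mtcars, aes(x = wt, y = mpg, color = factor(cyl))) +
  # shape = 1 is an open circle; stroke sets the thickness of its outline
  geom_point(shape = 1, size = 4, stroke = 1.5)

p
```

A single layer keeps the plot object smaller and makes the thickness a single number to tune, at the cost of departing from the layering approach used above.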


Label Select Points

  • There is no obvious mathematical rule behind which points are labeled in the original, so we have to list the countries we care about by hand.
  • We can then use geom_text() to add labels to those points.
pointsToLabel <- c("Russia", "Venezuela", "Iraq", "Myanmar", "Sudan",
                   "Afghanistan", "Congo", "Greece", "Argentina", "Brazil",
                   "India", "Italy", "China", "South Africa", "Spain",
                   "Botswana", "Cape Verde", "Bhutan", "Rwanda", "France",
                   "United States", "Germany", "Britain", "Barbados", "Norway", "Japan",
                   "New Zealand", "Singapore")

pc3 +
  geom_text(aes(label = Country),
            color = "gray20",
            data = subset(dat, Country %in% pointsToLabel))
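
A misspelled name in the label vector does not raise an error: the %in% filter simply matches no row, and that label is silently missing from the plot. A quick sanity check lists the requested names that have no match. It is sketched here on a toy data frame; toy, wanted, labelled, and unmatched are made-up names, not objects from the tutorial.

```r
# Toy stand-ins (hypothetical, not the tutorial's data)
toy <- data.frame(Country = c("Spain", "France", "Norway", "Japan"),
                  stringsAsFactors = FALSE)
wanted <- c("Spain", "Frnace", "Japan")   # "Frnace" is a deliberate typo

# Rows that would receive a label
labelled <- subset(toy, Country %in% wanted)
labelled

# Requested names with no matching row: these labels would be dropped silently
unmatched <- setdiff(wanted, toy$Country)
unmatched
```

Running the same setdiff() call with pointsToLabel and dat$Country would flag any name that is spelled differently in the data.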


  • Notice that the text lies on top of the points, so it is hard to tell which label belongs to which point.
  • We can solve this by using the ggrepel package:
library("ggrepel")
(pc4 <- pc3 +
  geom_text_repel(aes(label = Country),
                  color = "gray20",
                  data = subset(dat, Country %in% pointsToLabel),
                  force = 10))


Change Region Labels and Order

  • We can do this by adjusting the factor levels and labels.
dat$Region <- factor(dat$Region,
                     levels = c("EU W. Europe",
                                "Americas",
                                "Asia Pacific",
                                "East EU Cemt Asia",
                                "MENA",
                                "SSA"),
                     labels = c("OECD",
                                "Americas",
                                "Asia &\nOceania",
                                "Central &\nEastern Europe",
                                "Middle East &\nNorth Africa",
                                "Sub-Saharan\nAfrica"))

  • The plot stored in pc4 still holds the old copy of the data, so we need to point it at the data frame we just adjusted.
pc4$data <- dat
pc4
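
The relabeling step relies on how factor() pairs its two arguments: levels sets the order, and labels renames each level position by position. A minimal sketch on a toy vector follows; raw and f are hypothetical names and values, not the tutorial's data.

```r
# Toy vector of raw region codes (hypothetical values)
raw <- c("SSA", "MENA", "SSA", "Americas")

# levels fixes the order; labels renames each level, position by position
f <- factor(raw,
            levels = c("Americas", "MENA", "SSA"),
            labels = c("Americas",
                       "Middle East &\nNorth Africa",
                       "Sub-Saharan\nAfrica"))

levels(f)       # the new labels, in the order given by levels
as.integer(f)   # position of each observation's level
anyNA(f)        # TRUE would mean some value in raw matched no level
```

Any value that does not appear in levels becomes NA, so the level strings have to be spelled exactly as they occur in the data, even when a code looks odd.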


Add Title and Format Axes

  • We can adjust the axes by using scales in ggplot2.
  • We can add a title to a plot using ggtitle().
library(grid)
pc5 <- pc4 +
  scale_x_continuous(name = "Corruption Perceptions Index, 2011 (10=least corrupt)",
                     limits = c(.9, 10.5),
                     breaks = 1:10) +
  scale_y_continuous(name = "Human Development Index, 2011 (1=Best)",
                     limits = c(0.2, 1.0),
                     breaks = seq(0.2, 1.0, by = 0.1)) +
  scale_color_manual(name = "",
                     values = c("#24576D",
                                "#099DD7",
                                "#28AADC",
                                "#248E84",
                                "#F2583F",
                                "#96503F")) +
  ggtitle("Corruption and Human development")


Final Changes to the Output

pc6 <- pc5 +
  theme_minimal() + # start with a minimal theme and add what we need
  theme(text = element_text(color = "gray20"),
        legend.position = "top", # place the legend above the plot
        legend.direction = "horizontal",
        legend.justification = 0.1, # anchor point for legend.position
        legend.text = element_text(size = 11, color = "gray10"),
        axis.text = element_text(face = "italic"),
        axis.title.x = element_text(vjust = -10), # move title away from axis
        axis.title.y = element_text(vjust = 2), # move title away from axis
        axis.ticks.y = element_blank(), # element_blank() is how we remove elements
        # note: ggplot2 >= 3.4.0 prefers linewidth over size for line elements
        axis.line = element_line(color = "gray40", size = 0.5),
        axis.line.y = element_blank(),
        panel.grid.major = element_line(color = "gray50", size = 0.5),
        panel.grid.major.x = element_blank() # no vertical guides
        )


Add the \(R^2\) to the Plot

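# R^2 of the same log-linear model that the trend line shows
mR2 <- summary(lm(HDI ~ log(CPI), data = dat))$r.squared
library(grid)
#png(file = "images/econScatter10.png", width = 800, height = 600)
pc6
# Sources note in the bottom-left corner (x and y are fractions of the page)
grid.text("Sources: Transparency International; UN Human Development Report",
          x = .01, y = .01,
          just = "left",
          draw = TRUE, gp = gpar(fontsize = 7, col = "grey"))
# Short red segment that serves as the legend key for the trend line
grid.segments(x0 = 0.81, x1 = 0.825,
              y0 = 0.90, y1 = 0.90,
              gp = gpar(col = "red"),
              draw = TRUE)
# "\u00B2" is the superscript-two character, so the label reads R squared
grid.text(paste0("R\u00B2 = ",
                 as.integer(mR2 * 100),
                 "%"),
          x = 0.835, y = 0.90,
          gp = gpar(col = "gray20"),
          draw = TRUE,
          just = "left")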

#dev.off()
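
The value that summary() reports as r.squared can be reproduced by hand as one minus the ratio of residual to total sum of squares. The sketch below checks this on the built-in mtcars data, which is a stand-in rather than the tutorial's dataset. The names fit, ss_res, ss_tot, r2_manual, and r2_summary are illustrative.

```r
# Built-in mtcars is a stand-in here, not the tutorial's dataset
fit <- lm(mpg ~ log(hp), data = mtcars)

ss_res <- sum(residuals(fit)^2)                    # unexplained variation
ss_tot <- sum((mtcars$mpg - mean(mtcars$mpg))^2)   # total variation
r2_manual <- 1 - ss_res / ss_tot

r2_summary <- summary(fit)$r.squared
all.equal(r2_manual, r2_summary)

# as.integer() truncates toward zero, whereas round() rounds to the nearest integer
as.integer(r2_manual * 100)
round(r2_manual * 100)
```

The last two lines matter for the plot label: as.integer() always rounds the percentage down, so round() may be the better choice if the displayed figure should be the nearest whole percent.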

