Within R a list is a structure that can combine objects of different types. We will learn how to create and work with lists in this section.
A list is actually a vector, but it differs from the other types of vectors we have been using in this class: it is a recursive vector, so its elements can have different types and can even be other lists.
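As a quick check of the claim that a list is a vector, the sketch below uses the base R functions is.vector(), is.list() and is.atomic(). The object names v and l are made up for illustration only.

```r
# An atomic vector: every element has the same type
v <- c(1, 2, 3)
# A list: elements may have different types
l <- list(1, "two", TRUE)

is.vector(v)  # TRUE
is.vector(l)  # TRUE: a list is also a vector
is.list(v)    # FALSE
is.list(l)    # TRUE
is.atomic(l)  # FALSE: a list is not an atomic vector
```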
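We first consider a patient database where we want to store, for each patient, their name, the amount they owe and whether they have insurance.
We then have 3 pieces of information here: name, owed and insurance.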
To create a list for one patient we write:
a <- list(name="Angela", owed="75", insurance=TRUE)
a
## $name
## [1] "Angela"
##
## $owed
## [1] "75"
##
## $insurance
## [1] TRUE
With vectors, arrays and matrices, indexing worked in much the same way apart from the number of dimensions. A list is different. Notice that, unlike a typical vector, it prints in multiple parts, one per element, each labelled with its name. These names also help with indexing, as we will see below. There is another easy way to create a similar list: start from an empty list and fill it in one element at a time.
Note that below we use double brackets and a character string to index, and that owed is stored here as the number 75 rather than as the string "75":
a.alt <- vector(mode="list")
a.alt[["name"]] <- "Angela"
a.alt[["owed"]] <- 75
a.alt[["insurance"]] <- TRUE
a.alt
## $name
## [1] "Angela"
##
## $owed
## [1] 75
##
## $insurance
## [1] TRUE
We could then create a list like this for all of our patients. Our database would then be a list of all of these individual lists.
With vectors, arrays and matrices there was really only one way to index them. With lists there are several.
Below are three different ways to index a list: by name with double brackets, by position with double brackets, and by name with the $ operator.
a[["name"]]
## [1] "Angela"
a[[1]]
## [1] "Angela"
a$name
## [1] "Angela"
All three of these return the same element of the list. Notice that two of them use double brackets. Next we look at the difference between double and single brackets.
a[1]
## $name
## [1] "Angela"
class(a[1])
## [1] "list"
With the single bracket we get back a list that contains only the name element.
a[[1]]
## [1] "Angela"
class(a[[1]])
## [1] "character"
With double brackets we extract the value itself and get a character. So the single bracket returns a list containing the element(s) you indexed, while the double bracket returns a single element with its own class. Which one you use depends on whether you want a smaller list or the value stored inside it.
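The difference is easiest to see when more than one element is selected. The following is a minimal sketch that recreates the patient list so it runs on its own; the name sub is only an illustrative choice.

```r
# Recreate the patient list so the example is self-contained
a <- list(name = "Angela", owed = "75", insurance = TRUE)

# Single brackets can select several elements and always return a list
sub <- a[c("name", "insurance")]
class(sub)    # "list"
length(sub)   # 2

# Double brackets extract exactly one element, with that element's class
class(a[["insurance"]])  # "logical"
```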
We can always add more information to a list.
a$age <- 27
a
## $name
## [1] "Angela"
##
## $owed
## [1] "75"
##
## $insurance
## [1] TRUE
##
## $age
## [1] 27
In order to delete an element from a list we set it to NULL.
a$owed <- NULL
a
## $name
## [1] "Angela"
##
## $insurance
## [1] TRUE
##
## $age
## [1] 27
To see what information a list contains, we can use the names() function:
names(a)
## [1] "name" "insurance" "age"
To get the values as a single vector, we can unlist() the list:
a.un <- unlist(a)
a.un
## name insurance age
## "Angela" "TRUE" "27"
class(a.un)
## [1] "character"
If there is character data in the original list, everything in the unlisted vector is converted to character. If your list contained only numeric elements, the class of the result would be numeric.
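To illustrate the numeric case, here is a small sketch. The lists below are hypothetical and are not part of the patient data above.

```r
# A hypothetical list holding numeric values only
num <- list(age = 27, owed = 75)
num.un <- unlist(num)
num.un
class(num.un)    # "numeric"

# Mixing in a character element coerces everything to character
mixed.un <- unlist(list(age = 27, name = "Angela"))
class(mixed.un)  # "character"
```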
Just as with arrays and matrices, we can apply a function across a list. Specifically, we have the lapply() and sapply() functions for lists. With the original apply() function we specified whether the function was applied to the rows or to the columns. With lists, both functions are applied to each element of the list.
We will create the list n below:
#Number list
n <- list(1:5, 6:37)
n
## [[1]]
## [1] 1 2 3 4 5
##
## [[2]]
## [1] 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28
## [24] 29 30 31 32 33 34 35 36 37
lapply(n, median)
## [[1]]
## [1] 3
##
## [[2]]
## [1] 21.5
The lapply() function returns a list with the median of each element of the original list.
sapply(n, median)
## [1] 3.0 21.5
The sapply() function, in contrast, returns a vector of the medians.
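The sketch below repeats the comparison with a named list and also shows a case where sapply() cannot simplify its result. The element names small and large and the object names are assumptions made for illustration.

```r
# The same numbers as above, but with named elements
n <- list(small = 1:5, large = 6:37)

# lapply() always returns a list and keeps the names
med.list <- lapply(n, median)

# sapply() simplifies to a named vector when every result has length 1
med.vec <- sapply(n, median)
med.vec

# If the results have different lengths, sapply() falls back to a list
big <- sapply(n, function(x) x[x > 4])
class(big)  # "list"
```

When the shape of the result matters for later code, it is worth checking it with class() or str() rather than assuming sapply() simplified it.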
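A list is a recursive vector, which means we can have lists within lists. For example, let us go back to our patient data. (Note that the insurance value below is entered as the character string "TRUE" rather than the logical TRUE, which is why it prints in quotes.)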
s <- list(name="Chandra", insurance="TRUE", age=36)
patients <- list(a,s)
patients
## [[1]]
## [[1]]$name
## [1] "Angela"
##
## [[1]]$insurance
## [1] TRUE
##
## [[1]]$age
## [1] 27
##
##
## [[2]]
## [[2]]$name
## [1] "Chandra"
##
## [[2]]$insurance
## [1] "TRUE"
##
## [[2]]$age
## [1] 36
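To reach a value inside a nested list, index one level at a time. This is a minimal sketch that rebuilds the two patient lists so it can be run on its own.

```r
# Rebuild the two patients and the list that holds them
a <- list(name = "Angela", insurance = TRUE, age = 27)
s <- list(name = "Chandra", insurance = "TRUE", age = 36)
patients <- list(a, s)

# The first index picks the patient, the second picks the field
patients[[2]]$name       # "Chandra"
patients[[1]][["age"]]   # 27

# Pull the same field out of every patient
sapply(patients, function(p) p$name)  # "Angela" "Chandra"
```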
It is important to remember how to access the elements of a list. Many of you will want to use R for model building and regression, and you will almost never want to present the printed output from R as it is.
For example, R does not automatically print confidence intervals with a regression. The output from most regression functions in R is actually a list. This means we can extract just the elements we want and build tables that display exactly the information we need. This is why we take the time to discuss how to find out what is in a list and how to access it.
x <- rnorm(500,10, 3)
y <- 3*x + rnorm(500, 0, 2)
fit <- lm(y~x)
fit
##
## Call:
## lm(formula = y ~ x)
##
## Coefficients:
## (Intercept) x
## -0.3911 3.0224
names(fit)
## [1] "coefficients" "residuals" "effects" "rank"
## [5] "fitted.values" "assign" "qr" "df.residual"
## [9] "xlevels" "call" "terms" "model"
summary <- summary(fit)
summary
##
## Call:
## lm(formula = y ~ x)
##
## Residuals:
## Min 1Q Median 3Q Max
## -9.0785 -1.3921 0.0552 1.4027 6.3620
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.3911 0.3169 -1.234 0.218
## x 3.0224 0.0305 99.082 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.072 on 498 degrees of freedom
## Multiple R-squared: 0.9517, Adjusted R-squared: 0.9516
## F-statistic: 9817 on 1 and 498 DF, p-value: < 2.2e-16
names(summary)
## [1] "call" "terms" "residuals" "coefficients"
## [5] "aliased" "sigma" "df" "r.squared"
## [9] "adj.r.squared" "fstatistic" "cov.unscaled"
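As one possible way to build such a table, the sketch below pulls the coefficient matrix out of the summary list and combines it with the intervals from confint(), which gives 95% intervals by default. The seed, the sample size and the names fit.sum, est, ci and tab are assumptions chosen for illustration.

```r
set.seed(1)  # arbitrary seed so the sketch is reproducible
x <- rnorm(100, 10, 3)
y <- 3 * x + rnorm(100, 0, 2)
fit <- lm(y ~ x)
fit.sum <- summary(fit)

# The coefficient table is a numeric matrix stored in the summary list
est <- fit.sum$coefficients
# confint() returns one row of interval bounds per coefficient
ci <- confint(fit)

# Combine estimate, interval bounds and p-value into one table
tab <- cbind(Estimate = est[, "Estimate"],
             Lower    = ci[, 1],
             Upper    = ci[, 2],
             P        = est[, "Pr(>|t|)"])
tab

# Single numbers can be extracted by name in the same way
fit.sum$r.squared
```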
set.seed(1234)
x = rnorm(500, 10, 3)
y = runif(1)*x + rnorm(500,0,3.4)
model = lm(y~x)
# 1. Find the summary of model and assign it as summary.
# 2. What does list summary contain?
# 3. Extract the coefficients summary and assign this as coeff.
# 4. Print coeff.
# 5. What is the type of coeff? Use typeof().
# 6. From coeff extract the column of p-values.
# 1. Find the summary of model and assign it as summary.
summary = summary(model)
# 2. What does list summary contain?
names(summary)
# 3. Extract the coefficients summary and assign this as coeff.
coeff = summary$coefficients
# 4. Print coeff.
coeff
# 5. What is the type of coeff? Use typeof().
typeof(coeff)
# 6. From coeff extract the column of p-values.
coeff[,4]
test_error()
test_correct({
test_object("summary")
}, {
test_function("summary")
})
test_function("names")
test_object("coeff")
test_function("typeof")
test_output_contains("coeff[,4]")
success_msg("Great Job")