Software Carpentry ICS proposal
These exercises were prepared by the participants in the R Consortium-funded Software Carpentry instructor training. For background and more details on the methods illustrated here, see the Instructor Training curriculum, in particular the sections Novices and Formative Assessment (for formative assessment) and Cognitive Load (for faded examples).
Multiple choice questions are a form of formative assessment: they take place during teaching and learning, and they tell both the instructor and the students what to focus on. The wrong answers are not picked at random; each is designed to highlight a specific misunderstanding that will need to be re-explained if students choose it.
Assume the following function definition:
display <- function(a = 1, b = 2, c = 3) {
result <- c(a, b, c)
names(result) <- c("a", "b", "c") # This names each element of the vector
return(result)
}
What would be the result of the following function call? (Please note that it is considered bad form to combine named and positional arguments this way.)
display(c = 77, 5)
a) (correct)
a b c
5 2 77
b) (values are assigned in order disregarding the names)
a b c
77 5 3
c) (named parameter is used correctly, then wrongly assumed that the second argument will be passed to the second parameter)
a b c
1 5 77
d) (thinking that you cannot add positional arguments after named arguments)
a b c
1 2 77
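The matching rule behind the correct answer can be checked with a short sketch, assuming the display() definition from the question: R binds named arguments first, then fills the remaining parameters with the unnamed arguments from left to right.

```r
# Same definition as in the question above.
display <- function(a = 1, b = 2, c = 3) {
  result <- c(a, b, c)
  names(result) <- c("a", "b", "c")
  return(result)
}

# c is matched by name first; the unnamed 5 then fills the first
# remaining parameter, which is a. b keeps its default value.
out <- display(c = 77, 5)
print(out)
```

Running the call rather than reasoning about it is also a quick way to rule out distractors b), c) and d).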
=
- a right answer
<
- a wrong but almost plausible answer
==
- Learner is familiar with some commands in R, but has confused the logical test with the assignment operator
average(c(1,2,4))
2
(median: possibly correct but not specified)
7/3
(average is a generic term often assumed to be the mean)
Make a prediction without running it:
NA == NA
TRUE
- The terms are equal because they are both NA (incorrect)
FALSE
- The terms are unequal because R considers each NA to be unique (incorrect)
NA
- R cannot tell whether the missing values represented by the NAs are equal or unequal (correct)
With the data frame cats below, we run the command:
rbind(cats, c('tabby', '4.0', TRUE))
What happens to the weight column?
coat weight likes_string
1 calico 2.1 TRUE
2 black 5.0 FALSE
3 tabby 3.2 TRUE
Answers
4.0
to a numeric type
In the Unix shell, if you are currently located in the folder
/home/project/experiment_1/run_1
how would you navigate to the folder
/home/project
a) cd
b) cd ../../
c) cd ../
You have been working on your code, in particular a file called analysis.r. After a while, you look at your repository’s state using the git status command. In the section “Changes not staged for commit”, it says modified: analysis.r. What is the next step you’ll want to do?
git commit
to save changes (missed move to staging area)
git add
to make git aware of your changes (correct)
Consider a data.frame object in R, named x. How would you access the values from the second column, in the 3rd and 5th rows?
x[c(3, 5), 2]
x[2, c(3, 5)]
– switching rows and columns
x[c(3, 5), ]
– forgot to specify the column
x[(3, 5), 2]
– forgot to specify the rows as a vector using the c function
What will running the following R code return?
x <- 3
get_x <- function(y ){
return(x)
}
get_x(4)
3
(correct - it will find x defined in the global environment)
4
(guessing from the value y passed to the function)
What class of object does the following R command create?
x <- sum(c(NA, 3.6, 5, TRUE))
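One way to check a prediction for this question is to inspect the intermediate vector, as in the sketch below. It relies only on base R coercion rules: TRUE is coerced to 1 when combined with numbers, and sum() propagates NA unless na.rm = TRUE is set. The names v and x_no_na are illustrative.

```r
v <- c(NA, 3.6, 5, TRUE)         # the logical values are coerced to numeric
x <- sum(v)                      # NA propagates through sum()

class(v)                         # class of the combined vector
class(x)                         # class of the result
is.na(x)                         # the value itself is missing

x_no_na <- sum(v, na.rm = TRUE)  # 3.6 + 5 + 1, ignoring the NA
```

The class and the value are separate questions here: the result has a numeric class even though its value is NA.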
Students are presented with a figure showing a Poisson distribution.
Q: What does this figure tell us about these data?
What does the r-squared coefficient represent?
For a data frame my.df with seven rows and four columns, which of the following statements is true?
length(rbind(my.df, my.df)) == 8
- switching rows and columns
nrow(my.df) == length(my.df)
- df as list of column vectors
ncol(my.df) == length(my.df)
- correct
dim(rbind(my.df, my.df)) == dim(cbind(my.df, my.df))
- very close
Given the data frame cats:
coat weight likes_to_eat
1 calico 2.1 FISH
2 black 5.0 FISH
3 pink 3.2 COW
4 green 6.6 COW
5 pink 9.0 FISH
6 tabby 6.8 MILK
Which command selects the subset of the data frame that shows what pink cats like to eat?
cats[c(3,5),]
OK but not applicable for larger data frames (you have to know all the row numbers for your selector)
cats[which(cats$coat=="pink"),]
OK but selecting all the columns (indicating potential problem with selecting rows/columns?)
cats["pink",]
incorrect selector for rows (indicating potential problem with selectors)
cats[cats$coat=="pink","likes_to_eat"]
best answer, most specific and succinct
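The four candidate answers can be compared side by side with a small sketch. It rebuilds the cats table from the question; stringsAsFactors = FALSE is an assumption made here so that the columns stay character vectors regardless of the R version.

```r
cats <- data.frame(
  coat = c("calico", "black", "pink", "green", "pink", "tabby"),
  weight = c(2.1, 5.0, 3.2, 6.6, 9.0, 6.8),
  likes_to_eat = c("FISH", "FISH", "COW", "COW", "FISH", "MILK"),
  stringsAsFactors = FALSE
)

by_position <- cats[c(3, 5), ]                      # works, but row numbers are hard-coded
by_which <- cats[which(cats$coat == "pink"), ]      # right rows, all columns
by_name <- cats["pink", ]                           # no row is named "pink": one row of NAs
best <- cats[cats$coat == "pink", "likes_to_eat"]   # right rows, one column

print(best)
```

Printing by_name is a useful follow-up with learners, because the row of NA values shows that character row selectors look up row names, not cell values.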
Q: What is the result of x + y when:
x <- c(1, 2, 3, 4)
y <- c(1, 2)
2 4 4 6
(correct)
2 3 4 5
(added 1 to positions 3 and 4 in x)
2 4 3 4
(forgot to recycle)
2 4
(did not recycle the shorter vector to the longer)
Using the download.file function example:
download.file("https://ndownloader.figshare.com/files/2292169", "data/portal_data_joined.csv")
In which directory might we find the file we have saved?
data
(Correct)
R
R/Data
~user
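The correct answer follows from how a relative destination path is interpreted: it is resolved against the current working directory. The sketch below only manipulates the path and does not download anything; the variable names are illustrative.

```r
dest <- "data/portal_data_joined.csv"    # destination used in the question

target_dir <- dirname(dest)              # directory part of the relative path
full_dir <- file.path(getwd(), target_dir)  # where that directory sits on disk

print(target_dir)
print(full_dir)
```

Note that download.file does not create the data directory, so it has to exist inside the working directory before the call.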
Say you want to exclude unhappy entries from the dataframe below.
age <- c(1, 4, 10)
color <- c("red", "blue", "red")
weight_kg <- c(5, 9, 8)
happy <- c(TRUE, FALSE, TRUE)
df <- data.frame(age, color, weight_kg, happy)
Which of the following commands achieve this?
df[df$happy == TRUE]
## Misunderstanding of row/col selection
df[df$happy == TRUE, ]
## Correct, but confused with logical
df[df$happy, ]
## Correct
df[df$happy == FALSE, ]
## Mixed up logicals
df[!df$happy, ]
## Mixed up logicals
df[df$happy != FALSE, ]
## Correct answer, but confusing
How do you find the dimensions of a data frame called x?
length(x)
- Problem with understanding that a data frame has multiple dimensions
str(x)
- Not the correct command to address the question, even though the answer can be found with that command
dim(x)
colnames(x)
- The learner did not understand the question / the learner has no idea how to address the question
dplyr and data frames
Which of the following dplyr statements will return the columns called name and phone from the students data frame?
students %>% select(c(name, phone))
students %>% select(name, phone)
students %>% select("name", "phone")
students %>% select(c("name", "phone"))
Misconceptions identified above:
From the instructor training curriculum:
According to cognitive load theory, searching for a solution strategy is an extra burden on top of applying that strategy. We can therefore accelerate learning by giving learners worked examples that show them a problem and a detailed step-by-step solution, followed by a series of faded examples. The first of these presents a nearly-complete use of the same problem-solving strategy just demonstrated with a small number of blanks for the learner
fahr_to_kelvin <- function(temp) {
kelvin <- ((temp - 32) * (5 / 9)) + 273.15
return(kelvin)
}
celsius_to_kelvin <- function(temp) {
kelvin <- ____
return(kelvin)
}
celsius_to_kelvin <- function(temp) {
____
}
Chain functions to go from Fahrenheit to Celsius
fahr_to_celsius <- function(temp) {
____
}
fahr_to_kelvin <- function(temp) {
kelvin <- ((temp - 32) * (5 / 9)) + 273.15
return(kelvin)
}
kelvin_to_celsius <- function(temp) {
celsius <- ____ - 273.15
return(______)
}
fahr_to_celsius <- function(____) {
___ <- fahr_to_kelvin(____)
result <- ___________(temp_k)
return(result)
}
Write a function which can calculate both Celsius and Kelvin given the temperature in Fahrenheit and return both results.
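One possible shape for such a function is sketched below. The name fahr_to_both and the choice of a named vector as the return value are illustrative assumptions, not part of the original exercise; a list or a data frame would work as well.

```r
fahr_to_kelvin <- function(temp) {
  kelvin <- ((temp - 32) * (5 / 9)) + 273.15
  return(kelvin)
}

kelvin_to_celsius <- function(temp) {
  celsius <- temp - 273.15
  return(celsius)
}

# Hypothetical helper: reuses the two converters and returns both results.
fahr_to_both <- function(temp) {
  temp_k <- fahr_to_kelvin(temp)
  temp_c <- kelvin_to_celsius(temp_k)
  return(c(celsius = temp_c, kelvin = temp_k))
}

both <- fahr_to_both(212)   # boiling point of water
print(both)
```

Returning a named vector keeps the two results together while still letting the caller pick one, for example both["celsius"].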
data.frame
Add a new cat to the cats data.frame, a 9 year-old 3.3 kg tortoiseshell cat, which hates string. Be careful, because one of the variables is a factor, and there are no tortoiseshell cats in the data.frame yet.
> cats
coat weight likes_string age
1 calico 2.1 TRUE 4
2 black 5.0 FALSE 5
3 tabby 3.2 TRUE 8
str(cats)
levels(cats$_____)
levels(cats$_____) <- c(levels(cats$_____), ___________)
cats <- rbind(cats, list(_____,_____,____,_____))
length(levels(cats$coat)) == 4
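A filled-in version of the steps above might look like the following sketch. It assumes that coat is the factor column and rebuilds the cats table shown in the exercise; the new level has to be registered before the row is appended, otherwise the new coat value would become NA.

```r
cats <- data.frame(
  coat = factor(c("calico", "black", "tabby")),
  weight = c(2.1, 5.0, 3.2),
  likes_string = c(TRUE, FALSE, TRUE),
  age = c(4, 5, 8)
)

# Register the new level first, keeping the existing ones.
levels(cats$coat) <- c(levels(cats$coat), "tortoiseshell")

# Append the new cat as a list so that each column keeps its own type.
cats <- rbind(cats, list("tortoiseshell", 3.3, FALSE, 9))

str(cats)
```

Passing the new row as a list rather than with c() matters: c() would coerce every value to character before rbind sees it.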
Here we have a numerical vector
x <- c(1, 4, 5, 6)
Find the sum of all elements
sum(x)
Extract the second and third element
x[c(2, 3)]
Now extract the first and fourth
x[___]
Now extract and sum any elements
f <- function(a, ind) {
y <- ____ # extract elements
return(___(y))
}
f <- function(a, ind) {
y <- a[c(ind)]
return(sum(y))
}
ggplot2
Full example
library(ggplot2)
ggplot(data = diamonds, aes(x = carat, y = price)) +
geom_point()
Adding color
ggplot(data = diamonds, aes(x = carat, y = price,____)) +
geom_point()
Solution
ggplot(data = diamonds, aes(x = carat, y = price, color = cut)) +
geom_point()
The color alone is difficult to see, so now use ggplot’s faceting features to separate by cut
ggplot(data = diamonds, aes(x = carat, y = price)) +
geom_point() + facet_wrap(______)
Solution
ggplot(data = diamonds, aes(x = carat, y = price)) +
geom_point() + facet_wrap( ~ cut)
Now add color by clarity
ggplot(data = diamonds, aes(____)) +
______ + facet_wrap( ~ cut)
Solution
ggplot(data = diamonds, aes(x = carat, y = price, color = clarity)) +
geom_point() + facet_wrap( ~ cut)
Now plot price by depth, with the color mapping to diamond color
ggplot(____) +_____
Solution
ggplot(data = diamonds, aes(x = depth, y = price, color = color)) + geom_point()
Complete example: Fahrenheit to kelvin
fahr_to_kelvin <- function(temp) {
kelvin <- ((temp - 32) * (5 / 9)) + 273.15
return(kelvin)
}
fahr_to_kelvin(212)
Missing parts
# Convert 1 yard to 0.9144 meters
meters2yards <- function(argument goes here) {
yards <- _code goes here!_
return(yards)
}
meters2yards()
Problem statement only
Write a function to convert atmospheres to pounds per square inch
Full function fahr_to_kelvin(26) => 269.82
fahr_to_kelvin <- function(temp) {
kelvin <- ((temp - 32) * (5 / 9)) + 273.15
return(kelvin)
}
Partial code for fahr_to_celsius(26) => -3.3
fahr_to_celsius <-function(temp) {
celsius <- (BLANK - 32)/1.8
return(celsius)
}
How does the output change if you input different temperatures? Why?
Function celsius_to_kelvin(-3.3) => 269.82
BLANK <- function(temp) {
kelvin <- BLANK
BLANK # What happens if you leave this line blank? Does the function still work?
}
Write your own kelvin_to_fahr(269.82) => 26
kelvin_to_fahr <- BLANK
subset result by the vector to re-label
input <- c("2", "2", "3")
to c("two", "two", "three")
numbers <- c("1", "2", "3")
words <- c("one", "two", "three")
names(words) <- numbers
words[input]
input <- c("three", "two", "three")
to c("tres", "dos", "tres")
english <- c("one", "two", "three")
spanish <- c("uno",____________)
names(spanish) <- ________
______[input]
input <- c("jack", "king")
to c(11, 13)
cards <- ____________________
scores <- ____________________
____(scores) <- _______________
______ [ ________ ]
Diagnostic question: (determine if student understands how character subsetting works in R) What does this return?
c(A = 1, B = 2, C = 3) ["B"]
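The diagnostic can be checked directly. The sketch below shows that subsetting by a character vector looks elements up by name and that the names travel with the result; the variable names are illustrative.

```r
v <- c(A = 1, B = 2, C = 3)

picked <- v["B"]               # lookup by name, not by position
print(picked)

repeated <- v[c("B", "B", "A")]  # names may be repeated and reordered
print(repeated)
```

The second lookup is the same mechanism that the re-labelling exercises above rely on: the input vector is used as a vector of names.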
ggplot faded example
ggplot(surveys_complete) + geom_point(mapping = aes(x = weight, y = hindfoot_length))
Create a plot of GDP per capita vs life expectancy using the Gapminder dataset
ggplot(______) + geom_point(mapping=aes(x=______, y=______))
Subset your graph by continent using color as an additional aesthetic.
ggplot(______) + geom_point(mapping=aes(x=______, y=______, ________))
Reduce overplotting of all groups by setting alpha parameter equal to 0.5
ggplot(______) + geom_point(_____________________)
Add a layer with a linear model (smooth function) using the same aesthetics
ggplot(______) + geom_point(_____________________) + ___________
Instead of plotting a smooth line per continent, how would you plot a single smooth line for the entire dataset?
ggplot(______) + ______________ + ______________
Assumes learner knows about &, |, == and !=.
x <- c("d", "a", "b", "c", "c", "d")
Get values that are equal to "d"
x[x == "d"]
Get values that are equal to either "d" or "a"
x[x == "d" | _____ ]
Return a vector with all elements except "c"
x[ ______ ]
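One way to fill in the two blanks is sketched below. Other solutions exist (for example with %in%), so this is only one possible answer; the result names are illustrative.

```r
x <- c("d", "a", "b", "c", "c", "d")

# Combine two comparisons with |, which works element by element.
d_or_a <- x[x == "d" | x == "a"]

# Keep everything that is not "c".
not_c <- x[x != "c"]

print(d_or_a)
print(not_c)
```

Each comparison produces a logical vector as long as x, and the subset keeps the positions where that vector is TRUE.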
y <- read.csv("https://ndownloader.figshare.com/files/2292169")
Which days were recorded in December?
dec_days <- y[y$month == 12, "day"]
barplot(table(dec_days))
Which species were recorded in December for female animals?
y[ ______ , ]
Answer
spp_dec <- y[y$month == 12 & y$sex == "F", "species"]
barplot(table(spp_dec))
Bonus question: Why does the barplot include all of the species?
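A likely explanation for the bonus question can be reproduced on a toy vector. The sketch assumes the species column is stored as a factor, which was the default for read.csv before R 4.0.0 (stringsAsFactors = TRUE); with character columns the effect does not appear. The species codes and months below are made up for illustration.

```r
species <- factor(c("DM", "DO", "PP", "DM", "PP"))
month <- c(12, 3, 7, 12, 5)

spp_dec <- species[month == 12]             # only "DM" is left in the data

counts_all <- table(spp_dec)                # unused levels are still counted, as 0
counts_used <- table(droplevels(spp_dec))   # unused levels removed

print(counts_all)
print(counts_used)
```

Subsetting a factor keeps its full set of levels, so table() and barplot() still show every species unless droplevels() is applied first.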
Linear conversion
miles_to_kilometers <- function(miles) {
return(miles * 1.609344) # 1 mile = 1.609344 km
}
Example 1: Complete the functions
feet_to_miles <- function(feet) {
return(____ / 5280)
}
kilometers_to_parsecs <- function(kilometers) {
return(____)
}
Example 2: Nested function
feet_to_kilometers <- function(feet) {
miles <- feet_to_miles(feet)
return(miles_to_kilometers(miles))
}
feet_to_parsecs <- function(feet) {
miles <- feet_to_miles(feet)
____
return(____(kilometers))
}
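A nested conversion is easy to sanity-check with a value whose answer is known in advance. The sketch below assumes the standard factors 1 mile = 5280 feet and 1 mile = 1.609344 km, and restates the helper functions so that it runs on its own.

```r
miles_to_kilometers <- function(miles) {
  return(miles * 1.609344)
}

feet_to_miles <- function(feet) {
  return(feet / 5280)
}

feet_to_kilometers <- function(feet) {
  miles <- feet_to_miles(feet)
  return(miles_to_kilometers(miles))
}

# 5280 feet is exactly one mile, so the result should be 1.609344 km.
one_mile_km <- feet_to_kilometers(5280)
print(one_mile_km)
```

A check like this catches an inverted factor (dividing where the conversion should multiply) before it propagates into the chained functions.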
Given these data
library(dplyr)
set.seed(1000)
df1 <- tibble( # tibble() replaces the deprecated data_frame()
x = LETTERS[1:10],
y = 1:10
) %>% sample()
df2 <- tibble(
x = LETTERS[11:16],
y = 11:16
)
df2 <- bind_rows(df2, df1[1:4, ]) %>% sample()
Function returning rows in common (intersection)
df_in_common <- function(df1, df2) {
same_x <- df1$x %in% df2$x
same_y <- df1$y %in% df2$y
same_both <- same_x + same_y == 2
df1[same_both, ]
}
Function returning rows not in common (anti-join)
df_not_in_common <- function(df1, df2) {
same_x <- ___ %in% ___
same_y <- ___ %in% ___
rows_not_the_same <- ___ + ___ != ___
df1[___, ]
}
Function returning all unique rows from both (union)
df_union <- function(df1, df2) {
all_rows <- ___
dup <- which(duplicated(___))
all_rows[___, ___]
}
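For comparison, the three set operations can also be sketched in base R by comparing whole rows instead of individual columns. This is an alternative technique, not the solution to the faded example above: the helper row_key is hypothetical, and the data are rebuilt as plain data frames without shuffling.

```r
df1 <- data.frame(x = LETTERS[1:10], y = 1:10, stringsAsFactors = FALSE)
df2 <- rbind(
  data.frame(x = LETTERS[11:16], y = 11:16, stringsAsFactors = FALSE),
  df1[1:4, ]
)

# Hypothetical helper: one string per row, so rows can be compared as units.
row_key <- function(df) paste(df$x, df$y, sep = "|")

in_common <- df1[row_key(df1) %in% row_key(df2), ]          # intersection
not_in_common <- df1[!(row_key(df1) %in% row_key(df2)), ]   # anti-join

all_rows <- rbind(df1, df2)
union_rows <- all_rows[!duplicated(all_rows), ]             # union

nrow(in_common)
nrow(not_in_common)
nrow(union_rows)
```

Comparing whole rows avoids a subtle issue with testing x and y separately: a row can match on x in one row of df2 and on y in a different row without being a true duplicate.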