Problem 1

What does the select feature do?

  • A. Choose variables/columns by their names
  • B. Make a new variable in the data frame
  • C. Pick rows based on conditions about their values
  • D. Choose variables/columns by their values

Problem 2

What is a parameter?

  • A. A calculation based on one or more variables measured in the sample. Parameters are usually denoted by lower case Arabic letters with other symbols added sometimes.
  • B. The largest group in which it makes sense to make inferences about from the sample collected
  • C. A calculation based on one or more variables measured in the population. Parameters are almost always denoted symbolically using Greek letters
  • D. A smaller collection of observational units that is selected from the population.

Problem 3

What is a statistic?

  • A. A calculation based on one or more variables measured in the sample. Parameters are usually denoted by lower case Arabic letters with other symbols added sometimes.
  • B. The largest group in which it makes sense to make inferences about from the sample collected
  • C. A calculation based on one or more variables measured in the population. Parameters are almost always denoted symbolically using Greek letters
  • D. A smaller collection of observational units that is selected from the population.

Problem 4

What is sampling?

    1. The probability of observing a sample statistic as extreme or more extreme than what was observed, assuming that the null hypothesis of a by chance operation is true
  • B. Refers to the process of selecting observations from a population. There are both random and non-random ways this can be done.
  • C. Refers to the largest group in which it makes sense to make inferences about from the sample collected.
  • D. The (usually) large pool of observational units that we are interested in

Problem 5

Bias corresponds to…

  • A. Ignoring one group over another
  • B. Resampling in order to achieve desired results
  • C. Neutrality towards the groups within the population
  • D. Favoring one group over another

Problem 6

What is a sample?

  • A. Is a smaller collection of variables that are selected from the population.
  • B. Is the characteristics of observational units selected are a good approximation of the characteristics from the original population.
  • C. Is a smaller collection of observational units that is selected from the population.
  • D. Is the (usually) large pool of observational units that we are interested in.

Problem 7

What is the definition of select?

  • A. Chooses rows based off their name.
  • B. Pick rows based on conditions about their values.
  • C. Choose variables by their conditions.
  • D. Choose variables/columns by their names.

Problem 8

What does the pipe function do?

  • A.Chains together dplyr functions and ggplot functions.
  • B.Combines only dplyr functions.
  • C.Replaces the + function in ggplot
  • D.Arranges the data to be tidy

Problem 9

What has two variable the explanatory that is continuous, the response is numerical, and there are multiple response values per explanatory value?

  • A. Faceted barplot
  • B. Scatterplot
  • C. Histogram
  • D. Boxplot

Problem 10

What is the appropriate code for determining the relationship between age and sex?

  • A. ggplot(data = profiles, mapping = aes(x = sex, fill = name)) + geom_bar()
  • B. ggplot(data = profiles, mapping = aes(x = name, fill = age)) + geom_bar()
  • C. ggplot(data = profiles, mapping = aes(x = age, y = sex)) + geom_boxplot()
  • D. ggplot(data = profiles, mapping = aes(x = sex, y = age)) + geom_boxplot()

Problem 11

What are the five main verbs?

  • A. group_by, filter, summarize, mutate, arrange
  • B.Select, fliter, summaraize, mutate, arrange
  • C. Select, filler, summaraize, mutate, arrange
  • D. Select, fliter, summaraize, count, arrange

Problem 12

What is the purpose of Bootstrapping?

  • A. There is no purpose.
  • B. Bootstrapping allows us to estimate the variety of our statistic from sample to sample
  • C. Bootstrapping allows us to estimate the variability of our statistic from sample to sample.
  • D. Bootstrapping allows us to estimate the variability of our statistic from sample to statistic.

Problem 13

Generalizability:

  • A. refers to the largest group in which it makes sense to make inferences about from the sample collected. This is directly related to how the sample was selected
  • B. refers to the smallest group in which it makes sense to make inferences about from the sample collected. This is directly related to how the sample was selected.
  • C. refers to the largest group in which it makes sense to make inferences about from the population collected. This is directly related to how the sample was selected.
  • D. refers to the largest group in which it makes sense to make inferences about from the sample collected. This is indirectly related to how the sample was selected.

Problem 14

What does the inner_join function do?

  • A. helps tidy up a messy data frame by combining all the observational units into columns and variables into rows.
  • B. joins two different observational units of different data frames together
  • C. joins two different rows of different data frames together
  • D. brings two different data frames together

Problem 15

What are the five main plots?

  • A. histograms, boxplots, barplots, scatter plots, line graphs
  • B. histograms, bareplots, barplots, scatter plots, line graphs
  • C. histograms, pie charts, barplots, scatter plots, line graphs
  • D. histograms, boxplots, barplots, scatter-line plots, line graphs

Problem 16

Generalizability

  • A. corresponds to a favoring of one group in a population over another group.
  • B. is a calculation based on one or more variables measured in the population.
  • C. is when the characteristics of observational units selected are a good approximation of the characteristics from the original population.
  • D. refers to the largest group in which it makes sense to make inferences about from the sample collected, and is directly related to how the sample was selected.

Problem 17

The sample_n function

  • A. creates one resample/bootstrap sample.
  • B. gives how many entries are in a filtered data frame.
  • C. filters the data at random and is used to get a random sample.
  • D. none of the above.

Problem 18

The select function allows you to

  • A. pick rows based on conditions about their values.
  • B. sort rows based on one or more variables.
  • C. choose variables/columns by their names.
  • D. none of the above.

Problem 19

We can use the inner_join function

  • A. to chain together dplyr functions.
  • B. only in random sampling.
  • C. to bring two different data frames together.
  • D. both A and B.

Problem 20

Tasting soup is an analogy to

  • A. inference.
  • B. generalizability.
  • C. random sampling.
  • D. population.

Problem 21

Define a representative sample.

  • A. A sample from the U.S House of Representatives.
  • B. A sample that is meant to be manipulated to show certain patterns.
  • C. A sample that shows a good approximation of the original population.
  • D. A sample that is never chosen at random.

Problem 22

What are possible biases that could occur when performing a phone survey?

  • A. Non-response bias
  • B. Undercoverage
  • C. Voluntary response bias
  • D. All of the above

Problem 23

Given a one categorical explanatory variable and one numerical response variable, what plot should we use?

  • A. Faceted line graph
  • B. Barplot
  • C. Boxplot
  • D. Histogram

Problem 24

What is the function of select()?

  • A. Chooses rows based on conditions about their values
  • B. Chooses variables/columns by their names
  • C. Chooses certain measurements to list
  • D. Chooses data frames to combine

Problem 25

At its most basic, resample()

  • A. resamples the input vector without replacement
  • B. resamples the input vector to show a representative sample
  • C. resamples the input vector to find missing data
  • D. resamples the input vector with replacement

Problem 26

Which of the following is true of statistics and parameters?

  • A. Parameters are calculations based on sample(s), statistics are calculations based on the population
  • B. Statistics are calculations based on sample(s), parameters are calculations based on populations
  • C. Parameters are almost always known.
  • D. Both B and C

Problem 27

Using the Titanic data which of these pieces of code would find the number of women who died from each class? (Remember that Titanic <- as.data.frame(Titanic) is needed first.)

  • A.
Titanic %>% filter(Sex = "Female", Survived = "No")  %>%
  group_by(Class) %>% 
  summarise(sum_deadwomen = sum(Freq))
  • B.
Titanic %>% filter(Sex == Female, Survived == No)  %>%
  group_by(Class)  %>%
  summarise(sum_deadwomen = sum(Class))
  • C.
Titanic %>% filter(Sex == "Female", Survived == "No")  %>%
  group_by(Class)  %>% 
  summarise(sum_deadwomen = sum(Class))
  • D.
Titanic %>% filter(Sex == "Female", Survived == "No")  %>%
  group_by(Class)  %>% 
  summarise(sum_deadwomen = sum(Freq))

Problem 28

Using the movies data set in the ggplot2movies package, how could you plot the average yearly rating of movies?

  • A.
movies %>% group_by(rating) %>%
   summarize(mean_yearrating = mean(rating, na.rm = TRUE)) %>%
   ggplot(aes(x = year, y = mean_yearrating)) + geom_line()
  • B.
movies %>% select(year)  %>% 
  summarize(mean_yearrating = mean(rating, na.rm = TRUE)) %>%
  ggplot(aes(x = year, y = rating)) + geom_line()
  • C.
movies  %>% group_by(year)  %>% 
  summarize(mean_yearrating = mean(rating, na.rm = TRUE)) %>%
  ggplot(aes(x = year, y = mean_yearrating)) + geom_line()
  • D.
movies  %>% group_by(rating)  %>% 
  summarize(mean_yearrating = mean(rating, na.rm = TRUE)) %>%
  ggplot(aes(x = year, y = rating)) + geom_line()

Problem 29

Which could NEVER be a possible outcome of running this code resample(x = fruit, replace = FALSE)? Recall that fruit <- c("mango", "apple", "orange").

  • A. mango, mango, orange
  • B. orange, mango, apple
  • C. apple, orange, mango
  • D. mango, orange, apple

Problem 30

What symbol do you use to chain together multiple functions in themosaic package?

  • A. + the plus sign
  • B. %>% the pipe
  • C. * an asterisk
  • D. All of the above

Problem 31

Which package is most necessary if needed to create/perform multiple simulations?

  • A. ggplot package
  • B. mosaic package
  • C. dplyr package
    1. You are able to create/perform simulations on all the listed packages.

Problem 32

What are the 4 central functions of the mosaic package?

  • A. factor(), arrange(), select(), resample()
  • B. rflip(), shuffle(), do(), resample()
  • C. dim(), rename(), filter(), ggplot()
  • D. None of the above are central functions of the mosaic package.

Problem 33

Which of the following would be the result of the output?

library(okcupiddata)

  • A. By performing this output, R will check if okcupiddata *has been downloaded and installed.
  • B. By performing this output, R will allow you to view the okcupiddata dataset.
  • C. By performing this output, it will allow you to remove the missing values in the following dataset: okcupiddata.
  • D. All of the above.

Problem 34

If you would like to list many entries in a vector object, you can do so by?

  • A. entering ?c in the R console.
  • B. using the View function.
  • C. both A and B.
  • D. none of the above.

Problem 35

In order to complete the task of creating a simulation where you flip a coin in sets of 10 and record how many heads occurred in each set for 500 times, what would be the necessary functions to input?

  • A. simulations <- resample(10) * rflip(500)
  • B. simulations <- do(500) * rflip(10)
  • C. simulations <- rflip(500) * do(10)
  • D. simulations <- shuffle(10) * rflip(500)

Problem 36

The term “Faceting” is used when…

  • A. You want combine two different plots with different categorical variables.
  • B. You want to create small multiples of the same plot over a different categorical variable.
  • C. You want two different histograms side-by-side.
  • D. You want to create small multiples of the same plot over different numerical variables.

Problem 37

If you had the annual data of manufactured cars, what categorical variable would you facet wrap in order to see the amount of cars manufactured by month?

  • A. facet(year)
  • B. facet(month)
  • C. facet(every month)
  • D. facet(~ month)

Problem 38

What symbol is used when chaining dplyr functions together?

  • A. +
  • B. =
  • C. %>%
  • D. \(()\)

Problem 39

A collection of observation units collected from the population is the:

  • A. Population
  • B. Selection
  • C. Sample
  • D. Representative sample

Problem 40

Bootstrapping is:

  • A. The process of making a bootstrap distribution; without replacement.
  • B. How you make representative samples.
  • C. To sample with replacement to create new resamples of the same size.
  • D. To make different sized samples to compare.

Problem 41

What does the str function do and why would you use it?

  • A. It stands for structure and it gives a few of the first entries of each variable in a row after the variable.
  • B. It stands for structure and it makes your data set tidy if it is not.
  • C. It stands for structure and it puts all your data in to a table.
  • D. It does nothing so its useless.

Problem 42

Why are there bins in a histogram, but not a barplot?

  • A. In a histogram the bins are if the variable falls in between two numeric values.
  • B. A bar graph is for when the variable is categorical.
  • C. A and B from above.
  • D. None of the above.

Problem 43

If a problem tells to to sort a specific variable in decreasing order you should use the verb…

  • A. Arrange and desc.
  • B. Summarize and desc.
  • C. Filter and desc.
  • D. Mutate and desc.

Problem 44

In bootstrapping, the resample must be the same length as the original.

  • A. True always
  • B. False always
  • C. This is rarely true
  • D. This is rarely false

Problem 45

Selecting observations from a population, putting them back and repicking observations is an example of…

  • A. Non-random sampling
  • B. Random sampling.
  • C. Bias.
  • D. Two of the above.

Problem 46

What does “bootstrapping” do?

  • A. Samples the dataset and replaces the data each time
  • B. Samples a dataset without replacing the data each time
  • C. Estimates the variability of a statistic from a sample
  • D. It does not help you achieve the American Dream.

Problem 47

Which is true?

  • A. Parameters are always known.
  • B. The population is always larger than the sample
  • C. A statistic and a parameter come from different calculations
  • D. Sets of coin flips will always turn out to be 50/50

Problem 48

Which is true?

  • A. filter selects variables in columns
  • B. select selects variables in rows
  • C. mutate is used with mean and sd
  • D. mutate is used to create a new variable

Problem 49

Which is true?

  • A. Sampling is better when it is not random.
  • B. Sampling is better when it is random.
  • C. A representative sample does not need to have all the parts of the populations in order to represent it.
  • D. Bias is a problem of the past, and is no longer an issue for statisticians

Problem 50

What is not true about tidy data?

  • A. Each variable forms a row
  • B. Each observational unit forms a row
  • C. Each type of observation forms a table
  • D. Each variable forms a column

Problem 51

What is a component of Tidy Data?

  • A. 1-2 variables per column
  • B. 1 variable per row
  • C. values make up the table
  • D. observational units consists in columns

Problem 52

To create a side-by-side barplot what word will you need to use?

  • A. “Edge”
  • B. “Side”
  • C. “Dodge”
  • D. “Next”

Problem 53

Which of these would produce a side by side barplot of status faceted by sex with fill color determined by orientation for the profiles data frame in the okcupiddata package?

  • A.

    ggplot(data = profiles, aes(status, fill = orientation)) +
      geom_bar() + 
      facet_wrap(~sex) + 
      (position = "dodge")
    ggplot(data = profiles, aes(status, fill = orientation)) + 
      geom_bar(position = "dodge") + 
      facet_wrap(~sex) 
  • C.

    ggplot(data = profiles, aes(status, fill = orientation)) + 
      geom_bar() + 
      facet_wrap(~sex)
    ggplot(data = profiles, aes(status, fill = orientation)) +
      geom_bar(position = dodge) + 
      facet_wrap(~sex)

Problem 54

The pipe function chains commands together in the dplyr package, what does it stand for?

  • A. And then
  • B. Also
  • C. Adding
  • D. Plus

Problem 55

What does the concept of Bootstrapping do?

  • A. Calculates the mean from the original sample.
  • B. Estimates the distribution and standard deviation of a representative sample.
  • C. Calculates the distribution of the original sample.
  • D. Approximates the distribution and standard deviation from a original sample.

Problem 56

Which of these is NOT one of the first things you should do when given a data set?

  • A. Identify the observational unit
  • B. Give the types of variables you are presented with
  • C. Specify the variables
  • D. reorganize the data set to make it tidy

Problem 57

Which of the following is TRUE about pie charts?

  • A. We should always use pie charts, for every graph, they are a great representation of every data set.
  • B. Pie charts are only good for replacing histograms
  • C. They are not that great for looking at the different distributions in some data sets.
  • D. Pie charts do not actually exist

Problem 58

gapminder %>% filter(subRegion == “Southern Asia” | region == “Americas”) %>% Referring to the R code above, what does this symbol “|” refer to?

  • A. and
  • B. if
  • C. or
  • D. but

Problem 59

When thinking about sampling, what should we always be comparing it to?

  • A. Tacos
  • B. Salads
  • C. Sandwiches
  • D. Soup

Problem 60

Which of the following definitions is correct?

  • A. Population- Usually a large pool of observational units that we are interested in (LARGER than the sample).
  • B. Bias- refers to the process of selecting observations from population. There are both random and non-random ways this can be done
  • C. Statistic- If the characteristics from the original units selected are good approximation of the characteristics from the population.
  • D. P-value- Corresponds to a favoring of one group in a population of another group.

Problem 61

Which one is not one of the 5 main verbs

  • A. Select
  • B. Filter
  • C. Arrange
  • D. Change

Problem 62

What does this %>% do?

  • A.chain together dplyr function
  • B.arranges graphs into an a certain order
  • C.Both A and D
  • D.it is a short cut for the five main verbs

Problem 63

When graphing a categorical response and a categorical explanatory variables what graph should we use?

  • A.Bar plot
  • B.Box Plots
  • C. scatter plot (preferred)
  • D.pie chart

Problem 64

What is Bootstrapping?

  • A. estimate the variability of our statistic from sample to sample
  • B. displays the range of the data
  • C. adds straps to your boots
  • D. transfers a dataset online to be viewed in R

Problem 65

What is not a key distinction of types of sampling

  • A. Sampling without replacement
  • B. Sampling with replacement
  • C. These are all types of sampling
  • D. Sampling with graphs