Tips

  • Be sure to review
    • problem set solutions
    • your labs (especially my comments and your corrections)
    • slides from class
    • the textbooks
  • Write some sample problems yourself and try to solve them. You can easily do this by modifying the labs and the problem sets as needed. There is a strong positive correlation between students who show me the quizzing they did and good grades on exams.
  • If you perform poorly on the exam and we meet to talk about it, I will ask you to show me exactly how you studied for the exam. If you tell me, “I read over the notes many times,” I’m going to remind you that I told you that wasn’t an effective strategy. I want you all to succeed and improve, and the best way to do that is for you to take ownership of what you know and to quiz yourselves often.
  • Remember that you shouldn’t just be rereading the answers and the book over and over again. Once is fine, but after that you should be quizzing yourself for understanding. If you don’t know what you know when you are studying, you will be surprised when you take the exam. Don’t do that!

To perform well on Take Home Exam 1, you should be able to do the following. I won’t explicitly ask you questions about Chapters 1-4 of Getting Used to R, RStudio, and R Markdown, but you should review the content there to help you prepare. Many of the ideas there will help you as you work in the RStudio Server on the Take Home Exam.

Chapter 5 of Getting Used to R, RStudio, and R Markdown

  • Know how to read from a CSV file saved in my Public directory and store it as an object in R

  • Use the str function and explain its output for a given data frame

  • Know how to create a vector in three different ways
    • Using the c function
    • Using the seq function
    • Using the : operator
  • Explain what the factor function does in R and why it is frequently used to improve the layout of plots

  • Extract elements from a vector/variable using the [ ] operator
    • What does the - operator do inside the [ ] operator?
  • Use the ? function to obtain help on a function or data set in a package in R
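
The Chapter 5 skills above can be practiced with a short script like the one below. It is only a sketch: the file path, the `grades` data frame, and its values are made up for illustration (a temporary CSV stands in for a file in the Public directory), and base R's `read.csv` is assumed as the reading function.

```r
# Hypothetical stand-in for a CSV in the Public directory:
# write a tiny made-up file to a temporary location first.
csv_path <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:3, grade = c("B", "A", "C")),
          csv_path, row.names = FALSE)

# Read the CSV and store the result as an object (a data frame)
grades <- read.csv(csv_path)

# str() reports the number of observations and variables,
# then each variable's name, type, and first few values
str(grades)

# Three ways to create a vector
v1 <- c(2, 4, 6, 8)                  # c() combines the listed values
v2 <- seq(from = 2, to = 8, by = 2)  # seq() steps from 2 to 8 by 2
v3 <- 2:8                            # : counts by ones from 2 to 8

# factor() stores a categorical variable with an explicit level order;
# plots display the categories in that order
grade_f <- factor(grades$grade, levels = c("A", "B", "C"))

# [ ] extracts elements by position; a negative index drops them instead
v1[2]         # second element: 4
v1[-1]        # everything except the first element: 4 6 8
v3[c(1, 3)]   # first and third elements: 2 4

# ?str is shorthand for help("str"); it opens the help page
if (interactive()) help("str")
```

Try changing the `levels` argument and predicting the order of `levels(grade_f)` before you run it.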


Chapter 3 (Tidy Data)

  • Give the definition of tidy data

  • Identify whether a data set follows the tidy guidelines and explain why or why not using the definition.

  • Explain what the following functions do in R.
    • install.packages
    • library
    • data
    • View
  • Identify the observation unit, variables, and variable types (continuous, discrete, categorical) from a problem statement / data set

  • Explain how normal forms of data can be used to assist with data analysis
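
As a quick self-quiz on the four functions listed above, here is a minimal sketch. It assumes the ggplot2 package and its `mpg` data set as the example; any other installed package with a data set would work the same way.

```r
# install.packages() downloads a package from CRAN. It is needed only
# once per machine, so it is left commented out here.
# install.packages("ggplot2")

# library() loads an installed package; run it in every new R session
library(ggplot2)

# data() loads a data set that ships with a package into your workspace
data(mpg, package = "ggplot2")

# View() opens a spreadsheet-style viewer (only useful interactively)
if (interactive()) View(mpg)

# Each row of mpg is one observation (a car model) and each column is
# one variable; check the type of every variable
sapply(mpg, class)
dim(mpg)   # number of rows (observations) and columns (variables)
```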


Chapter 4 (Visualizing Data)

  • Be able to identify which plot(s) are most appropriate from a problem statement / variable types in a data set.

  • Explain the differences between the Five Named Graphs

Section 4.2

  • Explain the components of a histogram
    • What makes up the vertical axis?
    • What makes up the horizontal axis?
    • Do the bars touch?
    • How does changing the binwidth / number of bins change a histogram?
  • Create a histogram in R using ggplot2 from an appropriate variable in a data set

  • Interpret what the histogram tells you about the distribution of the variable
    • Is it symmetric or skewed?
    • Where do most values fall?
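
To see how the bin width changes a histogram, you could compare two versions of the same plot. This sketch assumes the built-in `faithful` data set and its `waiting` variable as the example; the two bin widths are arbitrary choices.

```r
library(ggplot2)

# Horizontal axis: values of the numeric variable, cut into bins.
# Vertical axis: the count of observations falling in each bin.
hist_wide <- ggplot(data = faithful, mapping = aes(x = waiting)) +
  geom_histogram(binwidth = 10, color = "white")

# Same variable with a narrower bin width: more bars, more detail
hist_narrow <- ggplot(data = faithful, mapping = aes(x = waiting)) +
  geom_histogram(binwidth = 2, color = "white")

# Print each plot and compare the shapes
hist_wide
hist_narrow
```

Before running it, predict which version will make the shape of the distribution easier to describe, and why.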

Section 4.3

  • Explain why a faceted histogram is useful in comparing the distribution of a numeric variable across the groups of a categorical variable.

  • Compare and contrast the usefulness and differences between faceted histograms and boxplots.

  • Describe the components of a boxplot

  • Use a boxplot to compare two distributions

  • Explain the components of a boxplot
    • What makes up the vertical axis?
    • What makes up the horizontal axis?
  • Create a boxplot in R using ggplot2 from appropriate variables in a data set
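
One way to practice the comparison in this section is to draw the same variables both ways. The sketch below assumes the built-in `iris` data set, with `Sepal.Length` as the numeric variable and `Species` as the categorical variable; the bin width is an arbitrary choice.

```r
library(ggplot2)

# Faceted histogram: one panel per group, sharing the same horizontal
# axis so the shapes of the distributions can be compared
facet_hist <- ggplot(data = iris, mapping = aes(x = Sepal.Length)) +
  geom_histogram(binwidth = 0.25, color = "white") +
  facet_wrap(~ Species, ncol = 1)

# Side-by-side boxplots: categorical variable on the horizontal axis,
# numeric variable on the vertical axis. Each box summarizes one group
# with its median and quartiles.
box <- ggplot(data = iris, mapping = aes(x = Species, y = Sepal.Length)) +
  geom_boxplot()

facet_hist
box
```

Ask yourself what each plot shows that the other hides: the histogram keeps the shape of each distribution, while the boxplot reduces each group to a few summary values.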

Section 4.4

  • Explain the components of a barplot
    • What makes up the vertical axis?
    • What makes up the horizontal axis?
  • Describe why pie charts should be replaced with barplots from a visual perception perspective

  • Explain the differences between stacked, side-by-side, and faceted barplots

  • Create the following in R using ggplot2 from appropriate variable(s) in a data set
    • Barplot
    • Stacked barplot
    • Side-by-side barplot
    • Faceted barplot
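
The four barplot variants can be sketched as follows. The example assumes the `mpg` data set from ggplot2, with `class` and `drv` as the two categorical variables; substitute variables from the labs as needed.

```r
library(ggplot2)

# Barplot: categories on the horizontal axis, counts on the vertical axis
bar <- ggplot(data = mpg, mapping = aes(x = class)) +
  geom_bar()

# Stacked barplot: fill splits each bar by a second categorical variable
stacked <- ggplot(data = mpg, mapping = aes(x = class, fill = drv)) +
  geom_bar()

# Side-by-side barplot: position = "dodge" places the pieces next to
# each other instead of stacking them
side_by_side <- ggplot(data = mpg, mapping = aes(x = class, fill = drv)) +
  geom_bar(position = "dodge")

# Faceted barplot: one panel per level of the second variable
faceted <- ggplot(data = mpg, mapping = aes(x = class)) +
  geom_bar() +
  facet_wrap(~ drv)

bar
stacked
side_by_side
faceted
```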

Sections 4.5 and 4.6

  • Explain how jitter and/or alpha can be applied to a plot to help the reader understand the relationship between two variables

  • Create the following in R using ggplot2 from appropriate variable(s) in a data set
    • Scatter-plot
    • Line-graph
  • Give a scenario where a line-graph is more appropriate than a scatter-plot
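
For practice with these two sections, the sketch below draws the same scatter-plot three ways and then a line-graph. It assumes the `mpg` and `economics` data sets from ggplot2; the alpha and jitter amounts are arbitrary values to experiment with.

```r
library(ggplot2)

# Plain scatter-plot: many cars share the same (cty, hwy) pair, so
# points sit on top of each other (overplotting)
scatter <- ggplot(data = mpg, mapping = aes(x = cty, y = hwy)) +
  geom_point()

# alpha makes points transparent, so darker spots mean more points
scatter_alpha <- ggplot(data = mpg, mapping = aes(x = cty, y = hwy)) +
  geom_point(alpha = 0.2)

# jitter nudges each point by a small random amount so overlapping
# points separate
scatter_jitter <- ggplot(data = mpg, mapping = aes(x = cty, y = hwy)) +
  geom_jitter(width = 0.3, height = 0.3)

# Line-graph: appropriate when the horizontal axis is ordered in time,
# as with the monthly dates in the economics data set
line <- ggplot(data = economics, mapping = aes(x = date, y = unemploy)) +
  geom_line()

scatter
scatter_alpha
scatter_jitter
line
```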

Section 4.7

  • Clarify what makes up a statistical graphic

  • Describe what aes, geom_, facet, and position correspond to in a ggplot function call
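
To connect the four pieces named above to an actual call, here is one annotated example. It is a sketch using the `mpg` data set from ggplot2; the particular variables are assumptions chosen only so that every piece appears once.

```r
library(ggplot2)

annotated <- ggplot(
  data = mpg,                        # the data set being plotted
  mapping = aes(x = displ, y = hwy)  # aes: maps variables to aesthetics
) +
  geom_point(                        # geom_: the shape drawn for each row
    position = "jitter"              # position: adjusts where geoms are placed
  ) +
  facet_wrap(~ drv)                  # facet: splits the plot into panels

annotated
```

A useful quiz is to cover the comments and name the role of each line yourself.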


Extensions

  • Explain how you could make a messy data set into a tidy data set by hand.
    • What would the variables be and what would a couple of rows in the new tidy data set look like?
  • Explain why a piece of R code produces an error and fix the code to produce the correct result.

  • Describe how one could make a faceted barplot based on three variables
    • What are the types of the variables?
    • Give an example of one such scenario where this would make sense.
  • Describe how one could make a faceted scatterplot based on three variables
    • What are the types of the variables?
    • Give an example of one such scenario where this would make sense.
  • Describe how you could use color to look at the relationship between three continuous variables.

  • Describe how one can use color, fill, size, etc. to show other multivariate (more than two variables) relationships
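
A few of the extension ideas can be tried out with a sketch like this one. It assumes the `mpg` data set from ggplot2, and the variable choices are illustrative rather than the only sensible ones.

```r
library(ggplot2)

# Faceted scatterplot on three variables:
# two numeric (displ, hwy) and one categorical (drv) for the panels
faceted_scatter <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_point() +
  facet_wrap(~ drv)

# Three numeric variables: the third (cty) is mapped to a color gradient
color_scatter <- ggplot(data = mpg,
                        mapping = aes(x = displ, y = hwy, color = cty)) +
  geom_point()

# Four variables at once: color for a categorical one, size for a numeric one
multi <- ggplot(data = mpg,
                mapping = aes(x = displ, y = hwy, color = drv, size = cty)) +
  geom_point(alpha = 0.5)

# A common error to diagnose: starting a line with +
#   ggplot(data = mpg, mapping = aes(x = displ, y = hwy))
#   + geom_point()
# R treats the first line as a complete statement, so the second line
# fails. The fix is to end the first line with + instead:
fixed <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_point()

faceted_scatter
color_scatter
multi
fixed
```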