Learning Quote of the Day

"Rereading has three strikes against it. It is time consuming. It doesn’t result in durable memory. And it often involves a kind of unwitting self-deception, as growing familiarity with the text comes to feel like mastery of the content. The hours immersed in rereading can seem like due diligence, but the amount of study time is no measure of mastery."

― Peter C. Brown, Make It Stick

Evaluation response

  • Extremely positive and, by and large, great feedback.
  • Will try my best to incorporate more "social stats-y" kinds of things in class
    • Have been trying to get you used to R while introducing it so far
  • Will post the plicker problems before class whenever possible so you can review them ahead of time
  • Lots of folks want more lecture. I think individual meetings are more beneficial to everyone, but I will try to find time to lecture and also provide you with time to practice.
  • I will try to spend more time reviewing concepts in class as well.

Further questions for you

  • What do you mean by "further clarification on what functions are used before doing practice problems"? "More upfront explanations on what we are about to do?"
  • One person said "The book can be confusing." but without any further information. What am I supposed to do with that?
  • If you are struggling and/or confused, why aren't you scheduling a time to meet with me every week? I've even met with students over the weekend. Students that have met with me have said they feel much better about the content and they have been able to quiz themselves more easily.

I simply will not accept the notion that you can't figure this out, and I strongly encourage you to meet with me if you are stuck. If you are scared of failing, I can't really help you unless we talk in person about what you aren't understanding.

Change to syllabus

Review PS6 questions

Variables negatively correlated with dep_delay

  • Temperature
  • Customer satisfaction

Why?

Why are there points to the left of (0, 0)?

How do we find the units of a variable?

?flights

How could we change the labels on the plot?

ggplot(alaska_flights, aes(x = dep_delay, y = arr_delay)) + 
  geom_point()

?labs

How could we change the labels on the plot?

ggplot(alaska_flights, aes(x = dep_delay, y = arr_delay)) + 
  geom_point() +
  labs(x = "Departure Delay (in minutes)", y = "Arrival Delay (in minutes)")

Match the plot with the appropriate variable set-up

R practice

Practice problem 1

  • Install the okcupiddata package by typing install.packages("okcupiddata") into your R console.

  • Load the package via library(okcupiddata)

  • Load the data via data(profiles)

  • Produce a barplot of the status variable

Practice problem 2

  • Produce a barplot of the sex variable in the profiles data frame

Practice problem 3

  • Produce a faceted barplot of the status variable based on sex

CHALLENGE

  • Fill the faceted barplot based on drinks

One more step

library(dplyr); library(okcupiddata); library(ggplot2)
non_straight <- filter(profiles, orientation != "straight", status != "unknown")
ggplot(data = non_straight, aes(x = drinks, fill = sex)) +
  geom_bar() + facet_wrap(orientation ~ status)

Tidy data review

Consider the following data of the price of three stocks (with names x, y, z) over 5 days. This data is not in tidy data format. How would you re-format it so that it is?

date             x       y       z
2009-01-01  -0.189  -0.652  -2.470
2009-01-02  -1.763   1.748  -0.855
2009-01-03   0.577  -1.624  -3.458
2009-01-04   3.025  -2.874  -2.399
2009-01-05  -0.368   0.148  -1.601

Solution

We want

  • Each row to represent one value, in this case one stock price
  • Each column to represent one variable of information. In our case, we have three: date, price, and the name of the stock
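
One possible way to do this reshaping in R is sketched below. It assumes the tidyr package is installed (it is not loaded elsewhere in these slides), and the names stocks_wide, stocks_tidy, stock, and price are made up for illustration. pivot_longer() needs tidyr 1.0.0 or later; older versions would use gather() instead.

```r
# Load packages (tidyr is an assumption; it is not used elsewhere in these slides)
library(dplyr)
library(tidyr)

# Re-create the stock table from the slide: one column per stock
stocks_wide <- tibble(
  date = as.Date("2009-01-01") + 0:4,
  x = c(-0.189, -1.763, 0.577, 3.025, -0.368),
  y = c(-0.652, 1.748, -1.624, -2.874, 0.148),
  z = c(-2.470, -0.855, -3.458, -2.399, -1.601)
)

# Stack the x, y, z columns so each row holds one date, one stock name, one price
stocks_tidy <- pivot_longer(
  stocks_wide,
  cols = c(x, y, z),
  names_to = "stock",
  values_to = "price"
)

stocks_tidy
```

The result has 15 rows (5 days times 3 stocks) and three columns: date, stock, and price.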

"Tidy data" format is also known as long format, unlike the original data which was in wide format.

5NG and Grammar of Graphics Review

Run the following first in your console to create the data example:

# Load packages
library(dplyr)
library(ggplot2)

# Create data frame
# (tibble() replaces the deprecated data_frame(); dplyr exports it)
simple_ex <- tibble(
  A = c(1, 2, 3, 4),
  B = c(1, 2, 3, 4),
  C = c(3, 2, 1, 2),
  D = c("a", "a", "b", "b")
)

View it

Let's view the data frame, which is in tidy format:

View(simple_ex)
A B C D
1 1 3 a
2 2 2 a
3 3 1 b
4 4 2 b

The Grammar of Graphics

  • A statistical graphic is a mapping of data variables to aes()thetic attributes of geom_etric objects.
  • A scatterplot has points as the geom_etric object
  • A linegraph has lines as the geom_etric object

1. Basic Scatterplot

  • the geom_etric objects are points
  • the aesthetic attributes are:
    • x-axis is variable A
    • y-axis is variable B
ggplot(data = simple_ex, mapping = aes(x = A, y = B)) + 
  geom_point()

2. Scatterplot with Color

  • the geom_etric objects are points
  • the aesthetic attributes are:
    • x-axis is variable A
    • y-axis is variable B
    • color is variable D
ggplot(data = simple_ex, mapping = aes(x = A, y = B, color = D)) + 
  geom_point()

3. Scatterplot with Sizes

  • the geom_etric objects are points
  • the aesthetic attributes are:
    • x-axis is variable A
    • y-axis is variable B
    • size is variable C
ggplot(data = simple_ex, mapping = aes(x = A, y = B, size = C)) + 
  geom_point()

4. Line Graph

  • the geom_etric objects are lines
  • the aesthetic attributes are:
    • x-axis is variable A
    • y-axis is variable B
ggplot(data = simple_ex, mapping = aes(x = A, y = B)) + 
  geom_line()

5. Line Graph with Color

  • the geom_etric objects are lines
  • the aesthetic attributes are:
    • x-axis is variable A
    • y-axis is variable B
    • color is variable D
ggplot(data = simple_ex, mapping = aes(x = A, y = B, color = D)) + 
  geom_line()

Work on Lab 3

To do for next time

  • Complete Lab 3 by 3 PM tomorrow
    • Email me whose lab to grade, along with a link to their project
  • Read Sections 4.7 and 4.8 of the ModernDive textbook
  • Complete PS7 (Practice Quiz for Quiz #2) by 10 AM on Wednesday

Getting PS7

Run the following in the R console where EMAIL is your Pacific University email (mine is isma5720@pacificu.edu) and LastnameFirstname is the name of the project you created (mine is IsmayChester):

file.copy(from = "/shared/isma5720@pacificu.edu/pq2.Rmd",
              to = "/home/EMAIL/LastnameFirstname/")
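
file.copy() does not print an error when a copy fails; it returns TRUE or FALSE. The sketch below shows one way to check that result. It uses throwaway folders made with tempfile() as stand-ins for the shared folder and your project folder, so src_dir, dest_dir, and the placeholder file contents are all hypothetical and not the real course paths.

```r
# Stand-in folders (hypothetical; these are NOT the real /shared or /home paths)
src_dir <- tempfile("shared_demo_")
dest_dir <- tempfile("project_demo_")
dir.create(src_dir)
dir.create(dest_dir)

# A placeholder file playing the role of pq2.Rmd
src_file <- file.path(src_dir, "pq2.Rmd")
writeLines("placeholder contents", src_file)

# Same call shape as on the slide: copy one file into an existing folder
copied <- file.copy(from = src_file, to = dest_dir)

# TRUE means the copy worked; FALSE usually means a bad path or
# that a file with the same name already exists in the destination
copied
file.exists(file.path(dest_dir, "pq2.Rmd"))
```

If the real command returns FALSE, double-check the spelling of your email and project name in the path.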

Plan for next time

  • Go over many of the questions you submitted in your PS7 in class
  • Further review for Cumulative Quiz #2
  • Quiz #2 is Monday, October 3rd

Closing connections

On a half-sheet of paper,

  • Write your name on the front and answer
    • What are the differences between a histogram and a boxplot?
    • What is the difference between a histogram and a barplot?
  • On the back of this sheet, answer the following questions:

    1. What is the definition of "observational unit"?
    2. What are the three properties of a tidy data set?
    3. What are the Five Named Graphs we explored in this chapter?