# Problem Statement

Students at Virginia Tech studied which vehicles come to a complete stop at an intersection with four-way stop signs, selecting the cars to observe at random. They looked at several factors to see which (if any) were associated with coming to a complete stop. (They defined a complete stop as “the speed of the vehicle will become zero at least for an [instant]”.) These variables included the age of the driver, the number of passengers in the vehicle, and the type of vehicle. The variable we are going to investigate is the arrival position of vehicles approaching the intersection, all traveling in the same direction. The students classified arrival position into three groups: the vehicle arrives alone, is the lead in a group of vehicles, or is a follower in a group of vehicles. They studied one specific intersection in Northern Virginia at a variety of different times. Because random assignment was not used, this is an observational study. Also note that no vehicle from one group is paired with a vehicle from another group. In other words, the three groups of vehicles are independent of one another. (Tweaked a bit from Tintle et al. 2014, pp. 8-2 to 8-13.)

# Competing Hypotheses

## In words

• Null hypothesis: There is no association between the arrival position of the vehicle and whether or not it comes to a complete stop.

• Alternative hypothesis: There is an association between the arrival position of the vehicle and whether or not it comes to a complete stop.

## Another way in words

• Null hypothesis: The long-run probability that a single vehicle will stop is the same as the long-run probability that a lead vehicle will stop, which is the same as the long-run probability that a following vehicle will stop. In other words, all three long-run probabilities are the same.

• Alternative hypothesis: At least one of these long-run probabilities (parameters) is different from the others.

## In symbols (with annotations)

• $$H_0: p_{single} = p_{lead} = p_{follow}$$, where $$p$$ represents the long-run probability a vehicle will stop.
• $$H_A$$: At least one of these parameter probabilities is different from the others.

## Set $$\alpha$$

It’s important to set the significance level before running the test on the data. Let’s set the significance level at 5% here, that is, $$\alpha = 0.05$$.

# Exploring the sample data

Observed Counts and (Conditional Probabilities)

| | Single Vehicle | Lead Vehicle | Following Vehicle | TOTAL |
|---|---|---|---|---|
| Complete Stop | 151 (0.858) | 38 (0.905) | 76 (0.776) | 265 |
| Not Complete Stop | 25 (0.142) | 4 (0.095) | 22 (0.224) | 51 |
| Total | 176 | 42 | 98 | 316 |
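
The conditional probabilities in parentheses are column proportions: each count divided by the total for its arrival position. A minimal sketch in base R, assuming the counts are typed in by hand from the table above (the object names `counts` and `cond_probs` are illustrative and not part of the original analysis):

```r
# Observed counts entered by hand from the table
# (rows: outcome, columns: arrival position)
counts <- matrix(
  c(151, 38, 76,
     25,  4, 22),
  nrow = 2, byrow = TRUE,
  dimnames = list(
    stop = c("complete", "not_complete"),
    vehicle_type = c("single", "lead", "follow")
  )
)

# Column proportions: proportion of each outcome within an arrival position
cond_probs <- prop.table(counts, margin = 2)
round(cond_probs, 3)

# Append row and column totals for comparison with the table margins
addmargins(counts)
```

Using `margin = 2` conditions on the columns, so each column of `cond_probs` sums to 1, matching the way the table reports the proportions.
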

library(ggplot2)

# One row per observed vehicle; the counts come from the table above
stop <- c(rep("complete", 265), rep("not_complete", 51))
vehicle_type <- c(rep("single", 151), rep("lead", 38), rep("follow", 76),
                  rep("single", 25), rep("lead", 4), rep("follow", 22))
df <- data.frame(stop, vehicle_type)

# Bars scaled to height 1: proportion of each outcome within each arrival position
ggplot(data = df, mapping = aes(x = vehicle_type, fill = stop)) +
  geom_bar(position = "fill", color = "black") +
  xlab("\nArrival Position of Vehicle") +
  ylab("Conditional Probability\n")
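
Before reading the plot, it can help to confirm that the long-format data reproduce the observed counts. The following is a sketch in base R rather than part of the original analysis; it rebuilds the data from the table's counts so that it runs on its own, and the names `check_df` and `tab` are illustrative.

```r
# Rebuild the long-format data from the observed counts in the table
check_df <- data.frame(
  stop = c(rep("complete", 265), rep("not_complete", 51)),
  vehicle_type = c(rep("single", 151), rep("lead", 38), rep("follow", 76),
                   rep("single", 25), rep("lead", 4), rep("follow", 22))
)

# Cross-tabulate outcome by arrival position and add the margins,
# which should agree with the totals in the table
tab <- table(check_df$stop, check_df$vehicle_type)
addmargins(tab)

# Column proportions correspond to the segment heights in the filled bar chart
round(prop.table(tab, margin = 2), 3)
```

Note that `table()` orders the levels alphabetically (`follow`, `lead`, `single`), so the columns appear in a different order than in the table of observed counts.
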