Lab 2I
Directions: Follow along with the slides, completing the questions in blue on your computer, and answering the questions in red in your journal.
mean of random shuffles also produces differences that are normally distributed.
R functions to:
Load the titanic data and calculate the mean age of people in the data but shuffle their survival status 500 times. Assign this data the name shfls.
With shfls, use mutate to add a new variable to the dataset. This new variable should have the name diff and should be the mean age of those who survived minus the mean age of those who died.
Calculate the mean and sd of the diff variable. Assign these values the names diff_mean and diff_sd.
Check that the diff variable looks approximately normally distributed.
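The setup steps above can be sketched in base R as follows. This is only an illustration under stated assumptions: the ages and survival labels are simulated stand-ins (the titanic data is not reproduced here), the helper one_shuffle is a hypothetical name, and the lab itself asks for mutate rather than this base R approach.

```r
# Stand-in data (assumption): the real lab uses the titanic data instead.
set.seed(1)
age      <- round(runif(200, min = 1, max = 70))         # hypothetical ages
survived <- sample(c("Yes", "No"), 200, replace = TRUE)  # hypothetical survival status

# One shuffle: permute the survival labels, then compute
# mean age of "Yes" minus mean age of "No".
one_shuffle <- function() {
  shuffled <- sample(survived)
  mean(age[shuffled == "Yes"]) - mean(age[shuffled == "No"])
}

# Repeat the shuffle 500 times and store the differences in a variable named diff.
shfls <- data.frame(diff = replicate(500, one_shuffle()))

# Summaries used later by the normal model.
diff_mean <- mean(shfls$diff)
diff_sd   <- sd(shfls$diff)

# A histogram is one way to check for a roughly bell-shaped distribution.
hist(shfls$diff, main = "Shuffled differences in mean age", xlab = "diff")
```

Because shuffling breaks any link between age and survival, diff_mean should land close to 0, while diff_sd describes how much the differences vary by chance alone.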
Since our diff variable appears normally distributed, we can use a normal model to estimate the probability of seeing differences that are more extreme than our actual data.
Draw a sketch of a normal curve. Label the mean age difference, based on your shuffles, and the actual age difference of survivors minus non-survivors from the actual data. Then shade in the area, under the normal curve, that is smaller than the actual difference.
Fill in the blanks to calculate the probability of an even smaller difference occurring than our actual difference using a normal model.
The probability you calculated in the previous slide is an estimate for how often we expect to see a difference smaller than the actual one we observed, by chance alone.
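As a rough sketch of a lower-tail calculation with a normal model, the example below uses pnorm with made-up numbers. The values of diff_mean, diff_sd, and actual_diff here are assumptions for illustration only and are not results from the titanic data.

```r
# Hypothetical values (assumptions); replace them with the values from your own work.
diff_mean   <- 0     # assumed mean of the shuffled differences
diff_sd     <- 1.2   # assumed standard deviation of the shuffled differences
actual_diff <- -2    # assumed observed difference in mean ages

# Probability of a difference smaller than actual_diff under the normal model.
p_smaller <- pnorm(actual_diff, mean = diff_mean, sd = diff_sd)
p_smaller
```

A small value of p_smaller would suggest that a difference this far below the mean rarely happens by chance alone.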
If we instead wanted to calculate the probability that the difference would be larger than the one observed, we could run (fill in the blanks):
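One way to get an upper-tail probability is to subtract the lower-tail probability from 1, or to ask pnorm for the upper tail directly. The sketch below uses made-up values (assumptions, not results from the lab), so it shows the shape of the calculation rather than the answer to the blanks.

```r
# Hypothetical values (assumptions); replace them with the values from your own work.
diff_mean   <- 0
diff_sd     <- 1.2
actual_diff <- -2

# Upper tail as the complement of the lower tail.
p_larger <- 1 - pnorm(actual_diff, mean = diff_mean, sd = diff_sd)

# The same probability, requested directly with lower.tail = FALSE.
p_larger_alt <- pnorm(actual_diff, mean = diff_mean, sd = diff_sd, lower.tail = FALSE)

all.equal(p_larger, p_larger_alt)
```

Both forms give the same result; the lower-tail and upper-tail probabilities always add up to 1.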
rnorm function.
mean height is 67 inches and the standard deviation is 3 inches.
histogram.
pnorm to calculate probabilities based on a specified quantity.
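A minimal sketch of how rnorm, a histogram, and pnorm fit together, using the mean of 67 inches and standard deviation of 3 inches mentioned above. The number of draws (1000) and the cutoff of 70 inches are arbitrary choices made for this illustration.

```r
set.seed(42)

# Simulate heights from a normal model; 1000 draws is an arbitrary choice.
heights <- rnorm(1000, mean = 67, sd = 3)

# Plot the simulated heights.
hist(heights, main = "Simulated heights", xlab = "Height (inches)")

# Model-based probability of a height below 70 inches (one sd above the mean).
p_below_70 <- pnorm(70, mean = 67, sd = 3)

# Proportion of simulated heights below 70 inches, for comparison.
sim_below_70 <- mean(heights < 70)

c(model = p_below_70, simulated = sim_below_70)
```

The model-based probability is about 0.841, and the simulated proportion should be close to it, though it will vary with the seed and the number of draws.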
Conduct one of the statistical investigations below:
titanic data:
cdc data:
Male in our data is taller than the average Female?