Lab 2I
Directions: Follow along with the slides, completing the questions in blue on your computer, and answering the questions in red in your journal.
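Taking the mean of random shuffles also produces differences that are normally distributed.
This lab uses R functions to simulate draws from a normal distribution and to calculate probabilities with a normal model.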
Load the titanic data and calculate the mean age of people in the data, but shuffle their survival status 500 times. Assign this data the name shfls.
With shfls, use mutate to add a new variable to the dataset. This new variable should have the name diff and should be the mean age of those who survived minus the mean age of those who died.
Calculate the mean and sd of the diff variable. Assign these values the names diff_mean and diff_sd.
Check that the diff variable looks approximately normally distributed.
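The setup steps can be sketched in base R. This is only an illustration under stated assumptions: it swaps in sample() and replicate() for whatever shuffling tools the lab itself uses, replaces mutate with a plain column assignment, and runs on simulated stand-in data, because the column names of the titanic data are not shown here. The names passengers, age, survived and one_shuffle are hypothetical.

```r
# Stand-in data: simulated values with ASSUMED column names (age, survived).
# The real lab uses the titanic data instead.
set.seed(1)
passengers <- data.frame(
  age      = round(runif(200, min = 1, max = 70)),
  survived = sample(c("Yes", "No"), size = 200, replace = TRUE)
)

# One shuffle: scramble the survival labels, then take the mean age per group.
one_shuffle <- function(data) {
  shuffled <- sample(data$survived)   # random permutation of the labels
  c(Yes = mean(data$age[shuffled == "Yes"]),
    No  = mean(data$age[shuffled == "No"]))
}

# Repeat the shuffle 500 times; each row of shfls holds one shuffle's means.
shfls <- as.data.frame(t(replicate(500, one_shuffle(passengers))))

# diff: mean age of survivors minus mean age of those who died.
shfls$diff <- shfls$Yes - shfls$No

# Summaries of the shuffled differences.
diff_mean <- mean(shfls$diff)
diff_sd   <- sd(shfls$diff)

# Visual check: the histogram should look roughly bell-shaped.
hist(shfls$diff, main = "Shuffled differences in mean age", xlab = "diff")
```

Because the labels are shuffled at random, diff_mean should land close to 0, and diff_sd describes how far a difference typically strays from it by chance alone.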
Since the distribution of our diff variable appears normally distributed, we can use a normal model to estimate the probability of seeing differences that are more extreme than our actual data.
Draw a sketch of a normal curve. Label the mean age difference, based on your shuffles, and the actual age difference of survivors minus non-survivors from the actual data. Then shade in the area, under the normal curve, that is smaller than the actual difference.
Fill in the blanks to calculate the probability of an even smaller difference occurring than our actual difference using a normal model.
The probability you calculated in the previous slide is an estimate for how often we expect to see a difference smaller than the actual one we observed, by chance alone.
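As a rough sketch of this kind of calculation, pnorm returns the area under a normal curve to the left of a given value. The numbers below are hypothetical placeholders, not results from the titanic data; in the lab, diff_mean and diff_sd come from your shuffles, and actual_diff (an assumed name) would be the difference observed in the actual data.

```r
# Hypothetical values for illustration only.
diff_mean   <- 0.1    # mean of the shuffled differences (assumed)
diff_sd     <- 1.2    # sd of the shuffled differences (assumed)
actual_diff <- -2.0   # observed survivors-minus-died difference (assumed)

# Probability of a difference smaller than actual_diff under a normal model
# with the given mean and standard deviation.
p_smaller <- pnorm(actual_diff, mean = diff_mean, sd = diff_sd)
p_smaller
```

This matches the shaded area in the sketch: everything under the curve to the left of the actual difference.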
If we instead wanted to calculate the probability that the difference would be larger than the one observed, we could run (fill in the blanks):
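One possible way to get an upper-tail probability, shown here with hypothetical placeholder values rather than results from the lab, is the lower.tail argument of pnorm. Subtracting the lower-tail probability from 1 gives the same answer. The name actual_diff is an assumption.

```r
# Hypothetical values for illustration only.
diff_mean   <- 0.1    # mean of the shuffled differences (assumed)
diff_sd     <- 1.2    # sd of the shuffled differences (assumed)
actual_diff <- -2.0   # observed difference (assumed)

# Area to the RIGHT of actual_diff under the normal model.
p_larger <- pnorm(actual_diff, mean = diff_mean, sd = diff_sd,
                  lower.tail = FALSE)

# Equivalent calculation: 1 minus the area to the left.
p_larger_alt <- 1 - pnorm(actual_diff, mean = diff_mean, sd = diff_sd)

p_larger
```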
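Random draws from a normal distribution can be simulated with the rnorm function.
Suppose the mean height is 67 inches and the standard deviation is 3 inches. Simulate heights from this distribution and plot them with a histogram.
Use pnorm to calculate probabilities based on a specified quantity.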
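A minimal sketch of how rnorm and pnorm fit together, using the mean of 67 inches and standard deviation of 3 inches from above. The sample size of 1000 and the 64-inch cut-off are assumptions chosen only for illustration.

```r
set.seed(42)

# Simulate 1000 heights (sample size is an assumption) from a normal
# distribution with mean 67 inches and standard deviation 3 inches.
heights <- rnorm(1000, mean = 67, sd = 3)

# The histogram should look roughly bell-shaped and centered near 67.
hist(heights, main = "Simulated heights", xlab = "Height (inches)")

# Proportion of simulated heights below 64 inches (cut-off is an assumption).
prop_sim <- mean(heights < 64)

# Probability of a height below 64 inches according to the normal model.
prop_model <- pnorm(64, mean = 67, sd = 3)

c(simulated = prop_sim, model = prop_model)
```

The two numbers should be close but not identical: prop_sim varies from one simulation to the next, while prop_model is the exact area under the normal curve.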
Conduct one of the statistical investigations below:
titanic data:
cdc data: ... Male in our data is taller than the average Female?