2.2 Exercises
For these exercises, we will consider a dataset containing information about different species of birds from the board game Wingspan.
Twelve rows of the dataset are shown below, but the full dataset has 357 birds in it!

common_name | scientific_name | predator | wingspan | forest | grassland | wetland |
---|---|---|---|---|---|---|
Killdeer | Charadrius vociferus | FALSE | 46 | FALSE | TRUE | TRUE |
Bobolink | Dolichonyx oryzivorus | FALSE | 30 | FALSE | TRUE | FALSE |
Northern Bobwhite | Colinus virginianus | FALSE | 33 | FALSE | TRUE | FALSE |
Little Penguin | Eudyptula minor | TRUE | NA | FALSE | FALSE | TRUE |
Wilson's Snipe | Gallinago delicata | FALSE | 41 | FALSE | FALSE | TRUE |
Red-Legged Partridge | Alectoris rufa | FALSE | 48 | FALSE | TRUE | FALSE |
Tree Swallow | Tachycineta bicolor | FALSE | 38 | FALSE | FALSE | TRUE |
Eurasian Sparrowhawk | Accipiter nisus | TRUE | 65 | TRUE | FALSE | FALSE |
Western Tanager | Piranga ludoviciana | FALSE | 30 | TRUE | FALSE | FALSE |
New Holland Honeyeater | Phylidonyris novaehollandiae | FALSE | 20 | FALSE | TRUE | FALSE |
Maned Duck | Chenonetta jubata | FALSE | 79 | FALSE | FALSE | TRUE |
Peregrine Falcon | Falco peregrinus | TRUE | 104 | FALSE | TRUE | TRUE |

Exercise 2.2.1
1. Compute the mean of the variable `wingspan` from the sample of 12 birds.
2. Compute the standard deviation of the variable `wingspan` from the sample of 12 birds.

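To check your hand calculations for the first two parts, here is a minimal Python sketch (not the course's own code). It assumes the missing Little Penguin wingspan is simply dropped, leaving 11 values; that is only one option, and a later part of this exercise asks you to choose and justify your own approach. `statistics.stdev` divides by n - 1, the usual convention for a sample standard deviation.

```python
import statistics

# Wingspans of the 12 sampled birds, in table order.
# None marks the missing value for Little Penguin.
wingspan = [46, 30, 33, None, 41, 48, 38, 65, 30, 20, 79, 104]

# Assumption: drop the missing value (complete-case analysis), so n = 11.
observed = [w for w in wingspan if w is not None]
n = len(observed)

# Sample mean: sum of the observed values divided by n.
mean = statistics.mean(observed)

# Sample standard deviation: statistics.stdev divides by n - 1.
sd = statistics.stdev(observed)

print(f"n = {n}, mean = {mean:.2f}, sd = {sd:.2f}")
```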
This exercise is the only time I will ever ask you to compute the standard deviation of the sample mean “by hand”. The goal right now is to understand what it measures. In real life, and in the rest of the class, we will use computers to do the math for us!

3. Compute the standard deviation of the sample mean of the variable `wingspan` from the sample of 12 birds.
4. To do the computations above, you had to decide how to handle the missing data for the bird Little Penguin. Explain how you approached this, and justify your decision.
5. It’s generally reasonable to think that the sample mean won’t fall more than 2 standard deviations (of the sample mean) from the (unknown) true mean. What are the lower and upper ends of this range for the variable `wingspan`?
   Hint: Using the standard deviation of the sample mean from (3), find the number that is 2 standard deviations below the mean you computed in (1) and the number that is 2 standard deviations above it.
6. The mean of the `wingspan` variable across all 357 birds is 64.17. If we take this to be the true mean, how unusual was our random sample of 12? Justify your answer with a standardized score.

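The remaining calculations can be checked the same way. This sketch keeps the assumption of dropping Little Penguin (n = 11), computes the standard deviation of the sample mean with the usual formula s / sqrt(n), builds the range of 2 such standard deviations around the sample mean, and standardizes the sample mean against 64.17, treated as the true mean as the last part instructs.

```python
import math
import statistics

# The 11 observed wingspans (assumption: Little Penguin dropped).
observed = [46, 30, 33, 41, 48, 38, 65, 30, 20, 79, 104]
n = len(observed)
mean = statistics.mean(observed)
sd = statistics.stdev(observed)  # sample standard deviation (n - 1)

# Standard deviation of the sample mean: s / sqrt(n).
se = sd / math.sqrt(n)

# Range of 2 standard deviations of the sample mean on either side of it.
lower, upper = mean - 2 * se, mean + 2 * se

# Standardized score of the sample mean, taking 64.17 (the mean of all
# 357 birds) as the true mean.
true_mean = 64.17
z = (mean - true_mean) / se

print(f"se = {se:.2f}, range = ({lower:.2f}, {upper:.2f}), z = {z:.2f}")
```

The standardized score counts how many standard deviations of the sample mean separate the sample mean from the true mean, so it can be compared directly with the "2 standard deviations" rule of thumb above.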
Exercise 2.2.2
The variables `forest`, `grassland`, and `wetland` refer to the habitat(s) that a bird can live in (in the context of the board game, not necessarily in real life).
These variables are summarized below for our sample of 12 birds:

forest | n | percent |
---|---|---|
FALSE | 10 | 0.8333333 |
TRUE | 2 | 0.1666667 |

wetland | n | percent |
---|---|---|
FALSE | 6 | 0.5 |
TRUE | 6 | 0.5 |

grassland | n | percent |
---|---|---|
FALSE | 6 | 0.5 |
TRUE | 6 | 0.5 |

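As a sketch of how summaries like these can be produced, the Python snippet below recounts each habitat column from the 12 rows of the table, storing TRUE/FALSE as Python booleans. It only mimics the layout above; it is not necessarily how those tables were generated.

```python
from collections import Counter

# Habitat columns for the 12 sampled birds, in table order
# (Killdeer first, Peregrine Falcon last).
habitats = {
    "forest": [False, False, False, False, False, False,
               False, True, True, False, False, False],
    "wetland": [True, False, False, True, True, False,
                True, False, False, False, True, True],
    "grassland": [True, True, True, False, False, True,
                  False, False, False, True, False, True],
}

summary = {}
for name, values in habitats.items():
    counts = Counter(values)
    # Count and share of birds at each level, FALSE first as in the tables.
    summary[name] = {
        level: (counts[level], counts[level] / len(values))
        for level in (False, True)
    }
    print(f"{name} n percent")
    for level, (n, pct) in summary[name].items():
        print(f"{str(level).upper()} {n} {pct:.7f}")
```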
1. Are these variables dummy variables? Why or why not?
2. For each habitat, give the sample proportion of birds that can live in that region.
3. For each habitat, give the standard deviation of the sample proportion.
4. What is a reasonable range for the true proportion of all birds that can live in the wetland region?
5. What is a reasonable range for the true proportion of all birds that can live in the forest region?
6. In the full dataset, 50% of all birds can live in the forest region. The value $\pi = 0.5$ falls outside the interval you computed in (5). Why do you think this is the case?

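A minimal sketch for checking the proportion calculations, assuming the usual formula for the standard deviation of a sample proportion, sqrt(p̂(1 − p̂) / n), with n = 12 (no habitat values are missing). Python counts `True` as 1 and `False` as 0 in `sum`, so `sum(values) / n` is the share of TRUE values. The variable names are illustrative only.

```python
import math

# Habitat columns for the 12 sampled birds, in table order.
habitats = {
    "forest": [False, False, False, False, False, False,
               False, True, True, False, False, False],
    "wetland": [True, False, False, True, True, False,
                True, False, False, False, True, True],
    "grassland": [True, True, True, False, False, True,
                  False, False, False, True, False, True],
}

results = {}
for name, values in habitats.items():
    n = len(values)
    # True counts as 1 and False as 0, so this is the share of TRUE values.
    p_hat = sum(values) / n
    # Plug-in standard deviation of the sample proportion.
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    # Reasonable range: 2 standard deviations on either side of p_hat.
    lower, upper = p_hat - 2 * se, p_hat + 2 * se
    results[name] = (p_hat, se, lower, upper)
    print(f"{name}: p_hat = {p_hat:.3f}, sd = {se:.3f}, "
          f"range = ({lower:.3f}, {upper:.3f})")
```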
Exercise 2.2.3
1. Give the five-number summary for the variable `wingspan`.
2. Are there any outliers in the sample of 12 birds?
3. Do you believe this outlier should be removed from the dataset? Why or why not?
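One way to check these answers in Python, under two assumptions: Little Penguin is dropped (11 values), and outliers are flagged with the common 1.5 × IQR rule (values more than 1.5 interquartile ranges below Q1 or above Q3). Quartile conventions differ between textbooks and software; `method="inclusive"` in `statistics.quantiles` (Python 3.8+) interpolates linearly between sorted values, which matches R's default `quantile()`.

```python
import statistics

# The 11 observed wingspans (assumption: Little Penguin dropped).
observed = [46, 30, 33, 41, 48, 38, 65, 30, 20, 79, 104]

# Quartiles by linear interpolation; other methods may shift Q1 and Q3.
q1, median, q3 = statistics.quantiles(observed, n=4, method="inclusive")

# Five-number summary: minimum, Q1, median, Q3, maximum.
five_number = (min(observed), q1, median, q3, max(observed))

# 1.5 x IQR rule for flagging outliers.
iqr = q3 - q1
low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [w for w in observed if w < low_fence or w > high_fence]

print("five-number summary:", five_number)
print("outliers:", outliers)
```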