2.2 Exercises

For these exercises, we will consider a dataset containing information about different species of birds, from the board game Wingspan.
Twelve rows of the dataset are shown below, but the full dataset has 357 birds in it!

common_name             scientific_name               predator  wingspan  forest  grassland  wetland
Killdeer                Charadrius vociferus          FALSE     46        FALSE   TRUE       TRUE
Bobolink                Dolichonyx oryzivorus         FALSE     30        FALSE   TRUE       FALSE
Northern Bobwhite       Colinus virginianus           FALSE     33        FALSE   TRUE       FALSE
Little Penguin          Eudyptula minor               TRUE      NA        FALSE   FALSE      TRUE
Wilson's Snipe          Gallinago delicata            FALSE     41        FALSE   FALSE      TRUE
Red-Legged Partridge    Alectoris rufa                FALSE     48        FALSE   TRUE       FALSE
Tree Swallow            Tachycineta bicolor           FALSE     38        FALSE   FALSE      TRUE
Eurasian Sparrowhawk    Accipiter nisus               TRUE      65        TRUE    FALSE      FALSE
Western Tanager         Piranga ludoviciana           FALSE     30        TRUE    FALSE      FALSE
New Holland Honeyeater  Phylidonyris novaehollandiae  FALSE     20        FALSE   TRUE       FALSE
Maned Duck              Chenonetta jubata             FALSE     79        FALSE   FALSE      TRUE
Peregrine Falcon        Falco peregrinus              TRUE      104       FALSE   TRUE       TRUE


Exercise 2.2.1

  1. Compute the mean of the variable wingspan from the sample of 12 birds.

  2. Compute the standard deviation of the variable wingspan from the sample of 12 birds.

This exercise is the only time I will ever ask you to compute the standard deviation of the sample mean “by hand”. The goal right now is to understand what it is measuring. In real life, and in the rest of the class, we will use computers to do the math for us!

  3. Compute the standard deviation of the sample mean of the variable wingspan from the sample of 12 birds.

  4. To do the computations above, you had to decide how to handle the missing data for the bird Little Penguin. Explain how you approached this, and justify your decision.

  5. It’s generally reasonable to think that the sample mean won’t fall more than 2 standard deviations from the (unknown) true mean. What are the lower and upper parts of this range for the variable wingspan?

Hint: We are asking you to find what number is 2 sd’s below the mean you computed in (1), and the number that is 2 sd’s above it.

  6. The mean of the wingspan variable from all 357 birds is 64.17. If we take this to be the true mean, how unusual was our random sample of 12? Justify your answer with a standardized score.
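
Once the by-hand work is done, the same quantities can be checked on a computer. The following is a minimal Python sketch, not an official solution. It assumes the Little Penguin's missing wingspan is simply dropped (only one of the choices you might defend in the missing-data question above), it uses the n - 1 version of the standard deviation, and it reads "2 standard deviations" as 2 standard deviations of the sample mean. All variable names are illustrative.

```python
import math
import statistics

# Wingspans in table order; None marks the Little Penguin's missing value.
wingspans = [46, 30, 33, None, 41, 48, 38, 65, 30, 20, 79, 104]

# ASSUMPTION: handle the missing value by dropping it.
observed = [w for w in wingspans if w is not None]
n = len(observed)

mean = statistics.mean(observed)   # sample mean
sd = statistics.stdev(observed)    # sample standard deviation (n - 1 denominator)
sd_of_mean = sd / math.sqrt(n)     # standard deviation of the sample mean

# Range of 2 standard deviations (of the sample mean) around the sample mean.
lower = mean - 2 * sd_of_mean
upper = mean + 2 * sd_of_mean

# Standardized score of the sample mean, taking 64.17 as the true mean.
true_mean = 64.17
z = (mean - true_mean) / sd_of_mean

print(f"n = {n}, mean = {mean:.2f}, sd = {sd:.2f}")
print(f"sd of sample mean = {sd_of_mean:.2f}")
print(f"range = ({lower:.2f}, {upper:.2f}), z = {z:.2f}")
```

If you handled the missing value differently, change the line that builds `observed` and compare the results with your hand computations.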

Exercise 2.2.2

The variables forest, grassland, and wetland refer to the habitat(s) that a bird can live in (in the context of the board game, not necessarily real life).
These variables are summarized below for our sample of 12 birds:

 forest  n   percent
  FALSE 10 0.8333333
   TRUE  2 0.1666667

 wetland n percent
   FALSE 6     0.5
    TRUE 6     0.5

 grassland n percent
     FALSE 6     0.5
      TRUE 6     0.5

  1. Are these variables dummy variables? Why or why not?

  2. For each habitat, give the sample proportion for how many birds can live in that region.

  3. For each habitat, give the standard deviation of the sample proportion.

  4. What is a reasonable range for the true proportion of all birds that can live in the wetland region?

  5. What is a reasonable range for the true proportion of all birds that can live in the forest region?

  6. We find that in the full dataset, 50% of all birds can live in the forest region. The value $\pi = 0.5$ fell outside of the interval you computed in (5). Why do you think this is the case?
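
The proportion questions above follow the same pattern as the wingspan ones, so they can be sketched the same way. This is a hedged Python sketch under two assumptions that are mine rather than the text's: the standard deviation of a sample proportion is taken to be sqrt(p(1 - p)/n), and a "reasonable range" is taken to mean 2 standard deviations either side of the sample proportion. The function name `proportion_summary` is hypothetical.

```python
import math

n = 12  # birds in the sample

# Number of TRUE values per habitat, read off the summary tables above.
true_counts = {"forest": 2, "wetland": 6, "grassland": 6}


def proportion_summary(successes, n):
    """Return (sample proportion, its standard deviation, rough 2-sd range)."""
    p_hat = successes / n
    sd = math.sqrt(p_hat * (1 - p_hat) / n)  # ASSUMED formula: sqrt(p(1 - p)/n)
    return p_hat, sd, (p_hat - 2 * sd, p_hat + 2 * sd)


summaries = {h: proportion_summary(k, n) for h, k in true_counts.items()}

for habitat, (p_hat, sd, (low, high)) in summaries.items():
    print(f"{habitat}: p_hat = {p_hat:.3f}, sd = {sd:.3f}, "
          f"range = ({low:.3f}, {high:.3f})")
```

With only 12 birds, a range built this way is rough, and it can even extend past 0 or 1, which is worth keeping in mind when interpreting it.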

Exercise 2.2.3

  1. Give the five-number summary for the variable wingspan.

  2. Are there any outliers in the sample of 12 birds?

  3. Do you believe this outlier should be removed from the dataset? Why or why not?
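
A five-number summary and an outlier check can also be sketched in Python. Treat this as one possible approach under stated assumptions, not the definitive answer: the Little Penguin's missing wingspan is dropped, quartiles use the "inclusive" interpolation method of `statistics.quantiles`, and outliers are flagged with the common 1.5 × IQR rule. Other quartile conventions can give different quartiles and may change which points are flagged.

```python
import statistics

# Wingspans of the 11 sampled birds with a recorded value
# (ASSUMPTION: the Little Penguin's missing value is omitted).
wingspans = sorted([46, 30, 33, 41, 48, 38, 65, 30, 20, 79, 104])

# Quartiles; method="inclusive" is one of several quartile conventions.
q1, median, q3 = statistics.quantiles(wingspans, n=4, method="inclusive")
five_number = (wingspans[0], q1, median, q3, wingspans[-1])

# 1.5 * IQR rule: flag points beyond the fences as potential outliers.
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [w for w in wingspans if w < lower_fence or w > upper_fence]

print("min, Q1, median, Q3, max:", five_number)
print("fences:", (lower_fence, upper_fence))
print("flagged as outliers:", outliers)
```

Whether a flagged value should actually be removed is a judgment call about the data, not something the rule decides for you.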