3.1 Exercises

The following questions refer to the dataset from Chapter 3.1, in which a waiter collected information about bills and tips at his tables for a week. A few rows of the dataset are shown below:

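| total_bill | tip  | percent_tip | smoker | day  | time   | size |
|------------|------|-------------|--------|------|--------|------|
| 17.59      | 2.64 | 0.1500853   | No     | Sat  | Dinner | 3    |
| 25.21      | 4.29 | 0.1701706   | Yes    | Sat  | Dinner | 2    |
| 31.27      | 5.00 | 0.1598977   | No     | Sat  | Dinner | 3    |
| 20.08      | 3.15 | 0.1568725   | No     | Sat  | Dinner | 3    |
| 16.00      | 2.00 | 0.1250000   | Yes    | Thur | Lunch  | 2    |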
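The percent_tip column appears to be the tip divided by the total bill, stored as a proportion rather than as a number out of 100. The chapter does not state this formula, so treat it as an assumption. A minimal Python sketch checks it against the five rows shown above:

```python
# The five rows shown above, as (total_bill, tip, listed percent_tip).
rows = [
    (17.59, 2.64, 0.1500853),
    (25.21, 4.29, 0.1701706),
    (31.27, 5.00, 0.1598977),
    (20.08, 3.15, 0.1568725),
    (16.00, 2.00, 0.1250000),
]

# Assumption: percent_tip = tip / total_bill (a proportion, not a percentage).
for total_bill, tip, listed in rows:
    computed = tip / total_bill
    print(f"bill={total_bill:6.2f}  tip={tip:5.2f}  "
          f"computed={computed:.7f}  listed={listed:.7f}")
```

For all five rows the computed value agrees with the listed one to seven decimal places. This is why percent_tip is treated as a quantitative variable in the exercises below.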

Some summary statistics for this dataset are given below:

| smoker | total_bill_mean | total_bill_sd | total_bill_n |
|--------|-----------------|---------------|--------------|
| No     | 19.01           | 8.34          | 31           |
| Yes    | 20.29           | 9.87          | 26           |

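| day  | percent_tip_mean | percent_tip_sd | percent_tip_n |
|------|------------------|----------------|---------------|
| Fri  | 0.15             | 0.03           | 6             |
| Sat  | 0.17             | 0.06           | 23            |
| Sun  | 0.20             | 0.09           | 13            |
| Thur | 0.17             | 0.04           | 15            |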

| time   | smoker | n  |
|--------|--------|----|
| Dinner | No     | 22 |
| Dinner | Yes    | 17 |
| Lunch  | No     | 9  |
| Lunch  | Yes    | 9  |
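Counts like these can be turned into sample proportions by dividing each "Yes" count by the total number of tables at that time. The sketch below uses Python and a hypothetical helper named `smoker_proportion`. Neither comes from the chapter.

```python
# Counts of tables by time and smoker status, copied from the table above.
counts = {
    ("Dinner", "No"): 22,
    ("Dinner", "Yes"): 17,
    ("Lunch", "No"): 9,
    ("Lunch", "Yes"): 9,
}

def smoker_proportion(time):
    """Proportion of tables at the given time that had smokers."""
    yes = counts[(time, "Yes")]
    total = yes + counts[(time, "No")]
    return yes / total

p_dinner = smoker_proportion("Dinner")  # 17 / 39
p_lunch = smoker_proportion("Lunch")    # 9 / 18
diff = p_dinner - p_lunch

print(round(p_dinner, 3), round(p_lunch, 3), round(diff, 3))
# prints: 0.436 0.5 -0.064
```

The difference of about -0.064 matches the observed difference of proportions quoted in Exercises 3.1.2.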

Correlations:

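| variable    | total_bill | tip       | percent_tip | size       |
|-------------|------------|-----------|-------------|------------|
| total_bill  | 1.0000000  | 0.5706443 | -0.4267555  | 0.6181611  |
| tip         | 0.5706443  | 1.0000000 | 0.4040693   | 0.4565800  |
| percent_tip | -0.4267555 | 0.4040693 | 1.0000000   | -0.2095724 |
| size        | 0.6181611  | 0.4565800 | -0.2095724  | 1.0000000  |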

Exercises 3.1.1

  1. Write the following null hypotheses in symbols:

     1. The true mean tip amount is the same on Saturdays and Sundays.

     2. The probability of a table having smokers is the same at lunch and at dinner.

     3. The true mean percent tip is the same on Saturdays as on Sundays. (Careful! Although this has “percent” in the name, we are measuring a quantitative variable, not a categorical one. The variable percent_tip contains numbers; those numbers just happen to be percents.)

     4. People tend to tip the same percentage no matter how expensive their total bill is. (Hint: What variables are involved in this question, what types are they, and how do we measure their relationship?)

  2. Translate each of the following research questions into a null hypothesis:

     1. Do smoker tables have different spending habits than non-smoker tables?

     2. Do dining parties with more people tend to tip higher percentages?

     3. Are there more smoker or non-smoker groups?

Exercises 3.1.2

  1. Consider the research question,

     Is the mean percent tip higher on Saturday than on Sunday?

     The following histogram shows the statistics from 1000 datasets simulated under the null hypothesis.

     1. What appears to be the center value of these simulated statistics? Why does this make sense?

     2. These simulated statistics show some random variability. What do you think is the (approximate) standard deviation of the simulated differences of sample means? Why?

     3. In our real data, we observed a difference of sample means of

        \bar{x}_{Sat} - \bar{x}_{Sun} = 0.03

        What is the approximate p-value of our study? (You should “guesstimate” this from the plot, not count dots for an exact answer!)

     4. What do you conclude? (Give a short one-sentence answer to the research question; no need to explain your answer.)

  2. Consider the research question,

     Is a table less likely to have smokers at dinner than at lunch?

     The following histogram shows the statistics from 1000 datasets simulated under the null hypothesis.

     1. What appears to be the center value of these simulated statistics? Why does this make sense?

     2. These simulated statistics show some random variability. What do you think is the (approximate) standard deviation of the simulated differences of sample proportions? Why?

     3. In our real data, we observed a difference of sample proportions of

        \hat{p}_{D} - \hat{p}_{L} = -0.064

        What is the approximate p-value of our study?

     4. What do you conclude?

  3. Consider the research question,

     Do people tend to give lower percent tips when their total bills are higher?

     The following histogram shows the statistics from 1000 datasets simulated under the null hypothesis.

     1. What appears to be the center value of these simulated statistics? Why does this make sense?

     2. These simulated statistics show some random variability. What do you think is the (approximate) standard deviation of the simulated sample correlations? Why?

     3. In our real data, we observed a sample correlation between total_bill and percent_tip of -0.43.

        What is the approximate p-value of our study?

     4. What do you conclude?
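
The chapter does not say how the histograms in these exercises were produced. One common way to simulate statistics under a null hypothesis of "no difference between groups" is to shuffle the group labels many times and recompute the statistic each time. The Python sketch below illustrates that idea under stated assumptions: the data are made up (not the waiter's dataset), and the names `group_a`, `group_b`, and `mean_difference` are hypothetical.

```python
import random
import statistics

random.seed(0)

# Hypothetical data, NOT the waiter's dataset: made-up percent tips for two groups.
group_a = [random.gauss(0.17, 0.06) for _ in range(23)]
group_b = [random.gauss(0.20, 0.09) for _ in range(13)]

def mean_difference(a, b):
    """Difference of sample means, a minus b."""
    return statistics.mean(a) - statistics.mean(b)

observed = mean_difference(group_a, group_b)

# Under the null hypothesis the group labels carry no information,
# so reshuffle the pooled values and re-split them 1000 times.
pooled = group_a + group_b
n_a = len(group_a)
simulated = []
for _ in range(1000):
    shuffled = random.sample(pooled, k=len(pooled))  # random reordering of the pool
    simulated.append(mean_difference(shuffled[:n_a], shuffled[n_a:]))

# One-sided p-value for the alternative "group A's mean is higher than group B's":
# the fraction of simulated differences at least as large as the observed one.
p_value = sum(d >= observed for d in simulated) / len(simulated)

print(f"center of simulated differences: {statistics.mean(simulated):.3f}")
print(f"sd of simulated differences:     {statistics.stdev(simulated):.3f}")
print(f"approximate p-value:             {p_value:.3f}")
```

The same pattern applies to the other two exercises by swapping the statistic: a difference of sample proportions, or a sample correlation computed after shuffling one of the two variables. The direction of the comparison in the p-value line should follow the research question.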