3.1 Exercises

The following questions refer to the dataset from Chapter 3.1, in which a waiter collected information about bills and tips at his tables for a week. A few rows of the dataset are shown below:

total_bill	tip	percent_tip	smoker	day	time	size
17.59	2.64	0.1500853	No	Sat	Dinner	3
25.21	4.29	0.1701706	Yes	Sat	Dinner	2
31.27	5.00	0.1598977	No	Sat	Dinner	3
20.08	3.15	0.1568725	No	Sat	Dinner	3
16.00	2.00	0.1250000	Yes	Thur	Lunch	2
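
The percent_tip column appears to be the tip divided by the total bill. A minimal Python sketch that checks this assumed definition against the five rows shown above (the function and variable names are illustrative, not taken from the chapter):

```python
# Rows copied from the table above: (total_bill, tip, percent_tip)
rows = [
    (17.59, 2.64, 0.1500853),
    (25.21, 4.29, 0.1701706),
    (31.27, 5.00, 0.1598977),
    (20.08, 3.15, 0.1568725),
    (16.00, 2.00, 0.1250000),
]

def percent_tip(total_bill, tip):
    """Tip as a fraction of the total bill (assumed definition)."""
    return tip / total_bill

# Largest gap between the recomputed and the tabulated values
max_gap = max(abs(percent_tip(b, t) - p) for b, t, p in rows)

for b, t, p in rows:
    print(f"{b:6.2f} {t:5.2f} {percent_tip(b, t):.7f} (table: {p:.7f})")
```

Note that the values are stored as fractions (0.15), not as percentages (15).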

Some summary statistics for this dataset are given below:

smoker	total_bill_mean	total_bill_sd	total_bill_n
No	19.01	8.34	31
Yes	20.29	9.87	26

day	percent_tip_mean	percent_tip_sd	percent_tip_n
Fri	0.15	0.03	6
Sat	0.17	0.06	23
Sun	0.20	0.09	13
Thur	0.17	0.04	15

time	smoker	n
Dinner	No	22
Dinner	Yes	17
Lunch	No	9
Lunch	Yes	9
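
Sample proportions can be read directly off a table of counts like the one above. A minimal Python sketch, assuming the counts are typed in by hand (the dictionary layout and function name are illustrative):

```python
# Counts copied from the time-by-smoker table above
counts = {
    ("Dinner", "No"): 22,
    ("Dinner", "Yes"): 17,
    ("Lunch", "No"): 9,
    ("Lunch", "Yes"): 9,
}

def smoker_proportion(time):
    """Proportion of tables at the given time that had smokers."""
    yes = counts[(time, "Yes")]
    total = yes + counts[(time, "No")]
    return yes / total

p_dinner = smoker_proportion("Dinner")
p_lunch = smoker_proportion("Lunch")
diff = p_dinner - p_lunch  # dinner minus lunch

print(round(p_dinner, 3), round(p_lunch, 3), round(diff, 3))
```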

Correlations:

variable	total_bill	tip	percent_tip	size
total_bill	1.0000000	0.5706443	-0.4267555	0.6181611
tip	0.5706443	1.0000000	0.4040693	0.4565800
percent_tip	-0.4267555	0.4040693	1.0000000	-0.2095724
size	0.6181611	0.4565800	-0.2095724	1.0000000

Exercises 3.1.1

Write the following null hypotheses in symbols:

The true mean tip amount is the same on Saturdays and Sundays.
The probability of a table having smokers is the same at lunch and at dinner.
The true mean percent tip amount is the same on Saturdays as on Sundays. (Careful! Although this has “percent” in the name, we are measuring a quantitative variable, not a categorical one. The variable percent_tip contains numbers; those numbers just happen to be percents.)
People tend to tip the same percentage no matter how expensive their total bill is. (Hint: What variables are involved in this question, what types are they, and how do we measure their relationship?)

Translate each of the following research questions into a null hypothesis:

Do smoker tables have different spending habits than non-smoker tables?
Do dining parties with more people tend to tip higher percentages?
Are there more smoker or non-smoker groups?

Exercises 3.1.2

Consider the research question,

Is the mean tip percent higher on Saturday than on Sunday?

The following histogram shows the results of simulating data 1000 times from the null distribution.

Of all these simulated statistics, what appears to be the center value? Why does this make sense?
These simulated statistics show some random variability. What do you think is the (approximate) standard deviation of the simulated differences of sample means? Why?
In our real data, we observed a difference of sample means of

\bar{x}_{Sat} - \bar{x}_{Sun} = -0.03

What is the approximate p-value of our study? (You should “guesstimate” this from the plot, not count dots for an exact answer!)

What do you conclude? (Give a short one-sentence answer to the research question; no need to explain your answer.)

Consider the research question,

Is a table less likely to have smokers at dinner than at lunch?

The following histogram shows the results of simulating data 1000 times from the null distribution.
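
One plausible way to produce such a null distribution is a permutation simulation: shuffle which tables count as dinner and which as lunch, then recompute the difference in smoker proportions. The sketch below is an assumption about the method, not necessarily how the chapter's histogram was generated. It rebuilds one record per table from the time-by-smoker counts table; the function name, seed, and variable names are illustrative.

```python
import random
import statistics

# One record per table, rebuilt from the counts: 1 = smokers, 0 = no smokers
dinner = [1] * 17 + [0] * 22
lunch = [1] * 9 + [0] * 9

def simulate_null_diffs(group_a, group_b, reps=1000, seed=0):
    """Shuffle the pooled outcomes and recompute p_a - p_b each time."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    n_b = len(pooled) - n_a
    diffs = []
    for _ in range(reps):
        rng.shuffle(pooled)  # break any link between time and smoking
        p_a = sum(pooled[:n_a]) / n_a
        p_b = sum(pooled[n_a:]) / n_b
        diffs.append(p_a - p_b)
    return diffs

null_diffs = simulate_null_diffs(dinner, lunch)
observed = sum(dinner) / len(dinner) - sum(lunch) / len(lunch)

# Center and spread of the simulated statistics
center = statistics.mean(null_diffs)
spread = statistics.stdev(null_diffs)

# One-sided p-value for "less likely at dinner": the share of simulated
# differences that are at least as small as the observed one
p_value = sum(d <= observed for d in null_diffs) / len(null_diffs)
```

A histogram of null_diffs plays the same role as the plot in this exercise, although the exact bars will differ from run to run and from the chapter's figure.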

Of all these simulated statistics, what appears to be the center value? Why does this make sense?
These simulated statistics show some random variability. What do you think is the (approximate) standard deviation of the simulated differences of sample proportions? Why?
In our real data, we observed a difference of proportions of

\hat{p}_{D} - \hat{p}_L = -0.064

What is the approximate p-value of our study?

What do you conclude?

Consider the research question,

Do people tend to give lower percent tips when their total bills are higher?

The following histogram shows the results of simulating data 1000 times from the null distribution.
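
For a correlation, a null distribution can be simulated by shuffling one variable so that its pairing with the other is broken, then recomputing the correlation. The sketch below is illustrative only: it assumes a shuffling approach (the chapter's actual method is not shown here) and uses just the five rows printed at the top of this section, so its results say nothing about the full dataset of 57 tables. All function names are hypothetical.

```python
import math
import random

# The five rows shown at the top of the section (NOT the full dataset)
total_bill = [17.59, 25.21, 31.27, 20.08, 16.00]
percent_tip = [0.1500853, 0.1701706, 0.1598977, 0.1568725, 0.1250000]

def pearson_r(x, y):
    """Sample Pearson correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def simulate_null_correlations(x, y, reps=1000, seed=0):
    """Shuffle y relative to x and recompute the correlation each time."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    shuffled = list(y)
    rs = []
    for _ in range(reps):
        rng.shuffle(shuffled)  # break the pairing between x and y
        rs.append(pearson_r(x, shuffled))
    return rs

null_rs = simulate_null_correlations(total_bill, percent_tip)
observed_r = pearson_r(total_bill, percent_tip)  # five-row value only
```

With the full dataset in place of the five rows, the same shuffling idea would give a null distribution comparable to the histogram in this exercise.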

Of all these simulated statistics, what appears to be the center value? Why does this make sense?
These simulated statistics show some random variability. What do you think is the (approximate) standard deviation of the simulated sample correlations? Why?
In our real data, we observed a sample correlation between total_bill and percent_tip of -0.43.

What is the approximate p-value of our study?

What do you conclude?