Unit 3 Glossary

Vocabulary Words

  • 95% confidence interval: A range of plausible values for the parameter; we are 95% confident that the true parameter is in this range, because these are the values that could reasonably have produced our observed data.

  • The Central Limit Theorem (CLT): The math fact that says that if a sample mean (or proportion or correlation) is calculated from a reasonably large number of samples, it behaves approximately as if it were generated by a Normal distribution.

  • The Standard Normal Distribution: The Bell Curve with a mean of 0 and a standard deviation of 1.

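  • confidence level: The choice of how confident we want to be in our interval, which reflects how much chance we are willing to take that we got an unlucky sample. For example, a 95% confidence level accepts a 5% chance of an unlucky sample.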

  • inconsistent: Data that would be unlikely to occur if the null hypothesis were true; i.e., the statistic results in a small p-value.

  • least-squares estimate: The sample mean is the least-squares estimate of the true mean, because it is the proposed value with the smallest sum of squared error (SSE) from the observed values.

  • null distribution: The collection of values of a statistic that could have been observed by chance if the null hypothesis were true.

  • null hypothesis: A claim about the parameters that would be true if no interesting trends were present.

  • p-value: The probability of getting a statistic at least as extreme as what we saw, in a world where the null hypothesis is true.

  • random noise: The random amount that a generated quantitative value falls from the mean.

  • reasonably large number of samples: A general rule of thumb is that more than 30 samples is sufficient, or more than 15 if your original data is not very skewed.

  • residuals: The observed value minus the true mean. Also called the ‘random noise’ or the ‘deviation from the mean’.

  • standardized: The process of subtracting a mean and dividing by a standard deviation; i.e., calculating a z-score.

  • statistical model: A process that generates data with some randomness.

  • sum of squared error (SSE): The total of the squared distances from the observed values to the proposed mean.

  • sum of squared residuals (SSR): Another name for the sum of squared error (SSE); the total of the squared distances from the observed values to the proposed mean.

  • tails: The parts of the Bell Curve at either end.
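
Several of the terms above (sum of squared error, least-squares estimate, standardized, and the Central Limit Theorem) can be tried out in a few lines of Python. This is only a minimal sketch using the standard library: the observed values are made up for illustration, and the function names sse and standardize, the exponential data source, and the choices of 40 values per sample and 2000 repetitions are all assumptions, not part of the course materials.

```python
import random
import statistics


def sse(values, proposed_mean):
    """Sum of squared error between the observed values and a proposed mean."""
    return sum((x - proposed_mean) ** 2 for x in values)


def standardize(value, mean, sd):
    """Subtract a mean and divide by a standard deviation (a z-score)."""
    return (value - mean) / sd


# Hypothetical observed values, made up for illustration only.
observed = [4.0, 7.0, 5.0, 9.0, 5.0]
sample_mean = statistics.mean(observed)

# Least squares: the sample mean gives a smaller SSE than nearby proposals.
sse_at_mean = sse(observed, sample_mean)
sse_nearby = [sse(observed, sample_mean + shift) for shift in (-1.0, -0.1, 0.1, 1.0)]

# CLT: sample means of 40 values each, drawn from a skewed (exponential) source.
rng = random.Random(0)  # fixed seed so the run is repeatable
sample_means = [
    statistics.mean([rng.expovariate(1.0) for _ in range(40)])
    for _ in range(2000)
]

# Standardize each sample mean, then see what share lands within 2 of zero.
center = statistics.mean(sample_means)
spread = statistics.stdev(sample_means)
z_scores = [standardize(m, center, spread) for m in sample_means]
share_within_2 = sum(abs(z) <= 2 for z in z_scores) / len(z_scores)

print(sse_at_mean, sse_nearby)
print(round(share_within_2, 3))
```

If the sample means really do behave like draws from a Normal distribution, share_within_2 should come out close to 0.95, even though the original exponential data is skewed.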

Key Skills and Concepts

  • State a null hypothesis in symbols or specific words.
  • Estimate how uncommon a particular observed statistic is, among values simulated from the null distribution.
  • Quickly estimate areas under the Bell Curve without the help of technology.
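
The last two skills can be practiced with a short simulation. The sketch below is one possible approach, not the course's official method: it assumes a null hypothesis that a proportion equals 0.5, a made-up observation of 32 successes in 50 trials, and the hypothetical helper names simulate_null_counts, simulated_p_value, and area_within. It also checks the common 68-95-99.7 shortcut for Bell Curve areas, which is assumed here as the mental rule of thumb.

```python
import random
from statistics import NormalDist


def simulate_null_counts(n_trials, n_sims, null_p=0.5, seed=0):
    """Simulate success counts in a world where the null hypothesis is true."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [
        sum(rng.random() < null_p for _ in range(n_trials))
        for _ in range(n_sims)
    ]


def simulated_p_value(null_counts, observed_count, expected_count):
    """Share of simulated counts at least as far from the expected count
    as the observed count is (both tails)."""
    observed_gap = abs(observed_count - expected_count)
    extreme = sum(abs(c - expected_count) >= observed_gap for c in null_counts)
    return extreme / len(null_counts)


def area_within(z):
    """Area under the Standard Normal Distribution between -z and z."""
    standard_normal = NormalDist(mu=0, sigma=1)
    return standard_normal.cdf(z) - standard_normal.cdf(-z)


# Hypothetical study: 32 successes in 50 trials, null proportion of 0.5.
# Counts (whole numbers) are compared instead of proportions to avoid
# floating-point ties at the boundary.
null_counts = simulate_null_counts(n_trials=50, n_sims=5000)
p_value = simulated_p_value(null_counts, observed_count=32, expected_count=25)
print("simulated p-value:", p_value)

# Compare exact areas with the 68-95-99.7 shortcut.
for z in (1, 2, 3):
    print(z, round(area_within(z), 4))
```

A small simulated p-value would mean the observed count sits out in the tails of the null distribution. The area_within check is only there to confirm the mental shortcut; the skill itself is to recall those areas without running any code.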