STAT100 Final Exam Study Guide: Key Concepts and Applications

Data Collection and Types of Variables

Identifying Variables in a Data Set

Understanding the types of variables in a data set is foundational in statistics. Variables can be classified as either quantitative (numerical) or qualitative (categorical).

Quantitative Variable: A variable that takes numerical values and for which arithmetic operations make sense (e.g., test scores, age).
Qualitative Variable: A variable that describes categories or groups (e.g., course section, gender).

Example: In a data set for a statistics course, 'Final Exam Score' is quantitative, while 'Course Section' is qualitative.

Organizing and Summarizing Data

Describing Distributions with Graphs

Histograms are used to visualize the distribution of quantitative data. The shape of the histogram provides information about the data's distribution.

Skewed Right: The tail on the right side is longer; the mean is typically pulled above the median (mean > median).
Skewed Left: The tail on the left side is longer; the mean is typically pulled below the median (mean < median).
Symmetric: Both sides are approximately mirror images; mean ≈ median.

Example: A histogram of bill amounts at a restaurant can show whether most bills are low, high, or centered.
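
The link between shape and the mean/median comparison can be checked numerically. The sketch below uses made-up bill amounts (not data from the course) and a hypothetical helper, describe_skew, whose 5% tolerance is an arbitrary choice for illustration.

```python
from statistics import mean, median

# Hypothetical restaurant bills in dollars (illustrative only):
# most bills are modest and one is very large, so the right tail is longer.
bills = [12, 14, 15, 15, 16, 18, 20, 22, 25, 95]

bill_mean = mean(bills)
bill_median = median(bills)


def describe_skew(avg, mid, tolerance=0.05):
    """Rough heuristic: compare the mean to the median, relative to the median."""
    if abs(avg - mid) <= tolerance * abs(mid):
        return "approximately symmetric"
    return "skewed right" if avg > mid else "skewed left"


print(bill_mean, bill_median, describe_skew(bill_mean, bill_median))
```

The single large bill pulls the mean well above the median, which is the pattern expected for a right-skewed histogram. A histogram remains the better tool for judging shape; this comparison is only a quick check.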

Numerically Summarizing Data

Measures of Center and Spread

Key numerical summaries include the mean, median, and measures of spread such as the standard deviation and interquartile range (IQR).

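Mean (μ for a population, x̄ for a sample): The sum of all data values divided by the number of values.
Median: The middle value when data are ordered (the average of the two middle values when the count is even).
Standard Deviation (σ for a population, s for a sample): Measures the typical distance of data values from the mean; it is the square root of the variance.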
Interquartile Range (IQR): The range between the first (Q1) and third (Q3) quartiles: $\text{IQR} = Q_3 - Q_1$

Example: For exam scores with Q1 = 79 and Q3 = 94, the IQR is $94 - 79 = 15$.
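
These summaries can be computed with Python's standard statistics module. The scores below are hypothetical, chosen so the quartiles come out to Q1 = 79 and Q3 = 94. Quartile conventions differ between textbooks and software, so other tools may give slightly different values for the same data.

```python
from statistics import mean, median, quantiles, stdev

# Hypothetical exam scores (illustrative only), already sorted.
scores = [70, 75, 79, 82, 85, 88, 90, 92, 94, 96, 99]

score_mean = mean(scores)
score_median = median(scores)
score_sd = stdev(scores)  # sample standard deviation (divides by n - 1)

# quantiles(..., n=4) returns the three cut points Q1, Q2, Q3.
q1, q2, q3 = quantiles(scores, n=4)
iqr = q3 - q1

print(f"mean={score_mean:.2f} median={score_median} sd={score_sd:.2f}")
print(f"Q1={q1} Q3={q3} IQR={iqr}")
```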

Probability and Discrete Probability Distributions

Basic Probability Concepts

Probability quantifies the likelihood of events. For discrete random variables, the probability distribution lists all possible values and their probabilities.

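Probability of an Event (equally likely outcomes): $P(E) = \frac{\text{number of outcomes in } E}{\text{total number of outcomes}}$
Mean of a Discrete Random Variable: $\mu = E(X) = \sum x \cdot P(x)$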

Example: For a game involving a die and a spinner, the probability distribution can be constructed by listing all possible outcomes and their probabilities.
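
A minimal sketch of building such a distribution by enumeration. The game rules are assumptions made for illustration: a fair six-sided die, a spinner with three equally likely sections labeled 1, 2, and 3, and a random variable equal to the sum of the two results. The actual exam game may be defined differently.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

die = [1, 2, 3, 4, 5, 6]
spinner = [1, 2, 3]  # hypothetical spinner sections, assumed equally likely

# Every (die, spinner) pair is equally likely, so count how often each sum occurs.
outcomes = list(product(die, spinner))
counts = Counter(d + s for d, s in outcomes)

# Probability distribution: value -> exact probability.
distribution = {
    total: Fraction(count, len(outcomes)) for total, count in sorted(counts.items())
}

# Mean of a discrete random variable: sum of x * P(x).
expected_total = sum(x * p for x, p in distribution.items())

for total, prob in distribution.items():
    print(total, prob)
print("mean:", expected_total)
```

Using Fraction keeps the probabilities exact, which makes it easy to confirm that they sum to 1.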

Empirical Rule (68-95-99.7 Rule)

For data that are approximately normally distributed:

About 68% of data fall within 1 standard deviation of the mean.
About 95% within 2 standard deviations.
About 99.7% within 3 standard deviations.
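
The three percentages can be verified against the standard normal distribution. This sketch uses statistics.NormalDist from the Python standard library; the helper name proportion_within is illustrative.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)


def proportion_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


within = {k: proportion_within(k) for k in (1, 2, 3)}

for k, proportion in within.items():
    print(f"within {k} SD: {proportion:.4f}")
```

The exact values are close to, but not exactly, 68%, 95%, and 99.7%; the rule is a rounded approximation.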

The Normal Probability Distribution

Standard Normal Distribution and Z-Scores

The normal distribution is a continuous, symmetric distribution characterized by its mean (μ) and standard deviation (σ). Z-scores standardize values:

Z-Score Formula: $z = \frac{x - \mu}{\sigma}$
Use Z-tables to find probabilities and percentiles.

Example: To find the proportion of students with ACT scores above a certain value, calculate the z-score and use the standard normal table.
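
The same calculation can be sketched in code, with statistics.NormalDist standing in for the printed table. The mean of 21, standard deviation of 5, and cutoff of 28 are hypothetical values chosen for illustration, not figures from the exam.

```python
from statistics import NormalDist

# Hypothetical ACT parameters (assumptions for illustration).
mu = 21
sigma = 5
cutoff = 28

# Standardize: z = (x - mu) / sigma.
z = (cutoff - mu) / sigma

# Area to the right of z under the standard normal curve.
proportion_above = 1 - NormalDist().cdf(z)

print(f"z = {z:.2f}, proportion above {cutoff}: {proportion_above:.4f}")
```

A Z-table gives the area to the left of z, so the proportion above the cutoff is 1 minus the table value.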

Sampling Distributions and Estimation

Sampling Distribution of the Sample Mean

The sampling distribution describes the distribution of a statistic (like the mean) over many samples from the same population.

Standard Error of the Mean: $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$
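
The standard error of the mean is the population standard deviation divided by the square root of the sample size. A small sketch, with σ = 15 and the sample sizes chosen arbitrarily for illustration:

```python
from math import sqrt


def standard_error(sigma, n):
    """Standard error of the sample mean: sigma / sqrt(n)."""
    return sigma / sqrt(n)


se_small_sample = standard_error(15, 25)
se_large_sample = standard_error(15, 100)

print(se_small_sample, se_large_sample)
```

Quadrupling the sample size halves the standard error, which is why larger samples give more precise estimates of the mean.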

Confidence Intervals

A confidence interval estimates a population parameter using sample data, providing a range of plausible values.

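Confidence Interval for Mean (σ known): $\bar{x} \pm z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}}$
Confidence Interval for Proportion: $\hat{p} \pm z_{\alpha/2} \cdot \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$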

Example: For a sample mean IQ score, use the sample mean, population standard deviation, and sample size to construct a confidence interval.
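
A minimal sketch of that construction for a 95% interval. The sample mean of 104, σ = 15, and n = 36 are hypothetical values; the critical value comes from statistics.NormalDist rather than a table.

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical inputs (assumptions for illustration).
x_bar = 104        # sample mean IQ
sigma = 15         # population standard deviation, assumed known
n = 36             # sample size
confidence = 0.95

# Critical value z_{alpha/2}: leaves alpha/2 in each tail.
alpha = 1 - confidence
z_critical = NormalDist().inv_cdf(1 - alpha / 2)

margin_of_error = z_critical * sigma / sqrt(n)
lower = x_bar - margin_of_error
upper = x_bar + margin_of_error

print(f"{confidence:.0%} CI: ({lower:.1f}, {upper:.1f})")
```

The interval is centered on the sample mean, and its half-width is the margin of error.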

Hypothesis Testing

Formulating and Testing Hypotheses

Hypothesis testing is used to make inferences about population parameters.

Null Hypothesis (H0): The statement being tested, usually a statement of no effect or no difference.
Alternative Hypothesis (Ha): The statement we are seeking evidence for.
Test Statistic: Measures how far the sample statistic is from the null hypothesis value, in standard error units.
P-value: The probability of observing a test statistic as extreme as, or more extreme than, the observed value under H0.
Decision Rule: If p-value < significance level (α), reject H0; otherwise, fail to reject H0.

Example: Testing whether the mean SAT-Math score for a sample is greater than the population mean.
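
The steps can be sketched as a right-tailed test of H0: μ = μ0 against Ha: μ > μ0. To stay within the Python standard library, this sketch assumes the population standard deviation is known and uses a z statistic; with an unknown σ, a t statistic with n - 1 degrees of freedom would be used instead. All numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical inputs (assumptions for illustration).
mu_0 = 515     # population mean under H0
sigma = 114    # population standard deviation, assumed known
n = 64         # sample size
x_bar = 540    # observed sample mean
alpha = 0.05   # significance level

# Test statistic: distance from the null value in standard error units.
standard_error = sigma / sqrt(n)
z = (x_bar - mu_0) / standard_error

# Right-tailed p-value: probability of a z at least this large under H0.
p_value = 1 - NormalDist().cdf(z)

reject_h0 = p_value < alpha
print(f"z = {z:.3f}, p-value = {p_value:.4f}, reject H0: {reject_h0}")
```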

Inference on Two Population Parameters

Comparing Two Means

When comparing means from two independent samples, use a two-sample t-test.

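Test Statistic for Two Means: $t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$, where $\mu_1 - \mu_2 = 0$ under the usual null hypothesis of no difference.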

Example: Comparing mean quiz scores between two different teaching programs.
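
A sketch of the unpooled (Welch) two-sample t statistic computed from summary statistics. The means, standard deviations, and sample sizes are hypothetical. The degrees of freedom use the Welch-Satterthwaite approximation, which is an assumption here; some courses use the smaller of n1 - 1 and n2 - 1 instead. The p-value is left out because the standard library has no t distribution.

```python
from math import sqrt

# Hypothetical summary statistics for two teaching programs.
mean_a, sd_a, n_a = 82.0, 8.0, 30
mean_b, sd_b, n_b = 78.0, 10.0, 35

# Variance of each sample mean.
var_a = sd_a**2 / n_a
var_b = sd_b**2 / n_b

# Test statistic under H0: mu_a - mu_b = 0.
standard_error = sqrt(var_a + var_b)
t_statistic = (mean_a - mean_b) / standard_error

# Welch-Satterthwaite approximate degrees of freedom.
welch_df = (var_a + var_b) ** 2 / (var_a**2 / (n_a - 1) + var_b**2 / (n_b - 1))

# Conservative alternative used in some textbooks.
conservative_df = min(n_a, n_b) - 1

print(f"t = {t_statistic:.3f}, Welch df = {welch_df:.1f}, conservative df = {conservative_df}")
```

The resulting t statistic would then be compared with a t table or software output at the chosen degrees of freedom.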

Inference on Categorical Data

Estimating Population Proportions

Sample proportions can be used to estimate population proportions and construct confidence intervals.

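Sample Proportion: $\hat{p} = \frac{x}{n}$, where $x$ is the number of individuals in the sample with the characteristic and $n$ is the sample size.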
Confidence Interval for Proportion: (see above)
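
A minimal sketch of a 95% confidence interval for a proportion, using the usual normal-approximation interval. The counts (120 successes out of 400) are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical sample (assumption for illustration).
successes = 120
n = 400
confidence = 0.95

# Sample proportion: p_hat = x / n.
p_hat = successes / n

# Critical value and standard error for the normal-approximation interval.
z_critical = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
standard_error = sqrt(p_hat * (1 - p_hat) / n)

margin_of_error = z_critical * standard_error
lower = p_hat - margin_of_error
upper = p_hat + margin_of_error

print(f"p_hat = {p_hat:.3f}, {confidence:.0%} CI: ({lower:.4f}, {upper:.4f})")
```

This approximation is generally considered reasonable only when the sample contains enough successes and failures; check the specific condition your course uses.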

Probability Tables and Expected Value

Using Probability Tables

Probability tables summarize the likelihood of different outcomes in a random experiment or game.

Prize Amount	Probability
$50.00	0.025
$5.00	0.075
$1.00	0.125
$0.50	0.225
$0.00	?

Main Purpose: A probability table makes it easy to find the probability of winning at least a certain amount: sum the probabilities for all outcomes at or above that amount.

Example: The probability of winning at least $5.00 is the sum of the probabilities for the $5.00 and $50.00 prizes: 0.075 + 0.025 = 0.100.
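
The table can also be worked through in code. This sketch assumes the five rows are the only possible outcomes, so the probability marked "?" is whatever remains after the listed probabilities are subtracted from 1. It also computes the expected prize using the sum of x times P(x). Fraction keeps the arithmetic exact.

```python
from fractions import Fraction

# Prize amount (dollars) -> probability, copied from the table.
known = {
    Fraction("50"): Fraction("0.025"),
    Fraction("5"): Fraction("0.075"),
    Fraction("1"): Fraction("0.125"),
    Fraction("0.5"): Fraction("0.225"),
}

# Assumption: the table lists every outcome, so probabilities must sum to 1.
p_zero = 1 - sum(known.values())
table = {**known, Fraction("0"): p_zero}

# P(win at least $5.00): add the probabilities of every prize >= 5.
p_at_least_5 = sum(p for prize, p in table.items() if prize >= 5)

# Expected value: sum of prize * probability.
expected_prize = sum(prize * p for prize, p in table.items())

print(float(p_zero), float(p_at_least_5), float(expected_prize))
```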

Identifying Outliers

1.5*IQR Rule for Outliers

Outliers are values that fall far from the rest of the data. The 1.5*IQR rule is commonly used to identify them.

Lower Fence: $Q_1 - 1.5 \times \text{IQR}$
Upper Fence: $Q_3 + 1.5 \times \text{IQR}$
Any value below the lower fence or above the upper fence is considered an outlier.

Example: For Q1 = 79, Q3 = 94, IQR = 15, the lower fence is $79 - 1.5(15) = 56.5$ and the upper fence is $94 + 1.5(15) = 116.5$.
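
The rule can be sketched as a small helper. The quartiles match the example; the list of scores and the function name outlier_fences are hypothetical.

```python
def outlier_fences(q1, q3, multiplier=1.5):
    """Return (lower_fence, upper_fence) for the 1.5*IQR rule."""
    iqr = q3 - q1
    return q1 - multiplier * iqr, q3 + multiplier * iqr


lower_fence, upper_fence = outlier_fences(79, 94)

# Hypothetical exam scores to screen (illustrative only).
scores = [45, 60, 79, 85, 94, 100]
outliers = [s for s in scores if s < lower_fence or s > upper_fence]

print(lower_fence, upper_fence, outliers)
```

When scores are capped at 100, the upper fence of 116.5 can never be exceeded, so only unusually low scores can be flagged.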

Summary Table: Key Statistical Formulas

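Concept	Formula (LaTeX)
Mean	$\bar{x} = \frac{\sum x_i}{n}$
Standard Deviation	$s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}$
Z-Score	$z = \frac{x - \mu}{\sigma}$
Confidence Interval (mean, σ known)	$\bar{x} \pm z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}}$
Confidence Interval (proportion)	$\hat{p} \pm z_{\alpha/2} \cdot \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$
Test Statistic (one mean)	$t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}$ (use $z$ with $\sigma$ in place of $s$ when $\sigma$ is known)
Test Statistic (two means)	$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$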

Additional info: This guide covers core concepts from data collection and summarization to probability, normal distributions, estimation, and hypothesis testing, as reflected in the exam questions. Students should be familiar with interpreting tables, calculating probabilities, and applying statistical inference procedures.