Statistics Midterm 2 Review: Probability, Sampling, Regression, and Applications
Probability and Contingency Tables
Analyzing Categorical Data with Contingency Tables
Contingency tables are used to summarize the relationship between two categorical variables. They display the frequency distribution of variables and help in calculating probabilities and testing independence.
Definition: A contingency table is a matrix that displays the frequency of different combinations of two categorical variables.
Example: The table below shows AP Statistics students grouped by whether they ate breakfast and their gender.
| | Left | Right | Total |
|---|---|---|---|
| Guy | 14 | 6 | 20 |
| Girl | 10 | 14 | 24 |
| Total | 24 | 20 | 44 |
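Calculating Probabilities: To find the probability that a randomly selected student is a girl who ate breakfast, divide the count in the matching cell by the grand total: $P(\text{girl} \cap \text{breakfast}) = \frac{\text{number of girls who ate breakfast}}{44}$.
Independence: Two variables are independent if $P(A \cap B) = P(A) \cdot P(B)$ for every row category $A$ and column category $B$.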
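As a rough sketch of these calculations, the Python snippet below computes joint and marginal probabilities from the table above and compares each joint probability with the product of its marginals. The dictionary layout and function names are illustrative choices, not part of the original notes; the counts and the `Left`/`Right` column labels are copied from the table as printed.

```python
# Counts copied from the contingency table above (rows: Guy/Girl, columns: Left/Right).
counts = {
    ("Guy", "Left"): 14, ("Guy", "Right"): 6,
    ("Girl", "Left"): 10, ("Girl", "Right"): 14,
}
total = sum(counts.values())  # grand total, 44


def joint(row, col):
    """P(row and col): cell count divided by the grand total."""
    return counts[(row, col)] / total


def marginal_row(row):
    """P(row): row total divided by the grand total."""
    return sum(n for (r, _), n in counts.items() if r == row) / total


def marginal_col(col):
    """P(col): column total divided by the grand total."""
    return sum(n for (_, c), n in counts.items() if c == col) / total


def looks_independent(tol=1e-9):
    """True if every cell satisfies P(row and col) = P(row) * P(col)."""
    return all(
        abs(joint(r, c) - marginal_row(r) * marginal_col(c)) <= tol
        for (r, c) in counts
    )


print(joint("Girl", "Left"))                        # 10/44
print(marginal_row("Girl") * marginal_col("Left"))  # (24/44) * (24/44)
print(looks_independent())
```

For the Girl/Left cell, 10/44 ≈ 0.227 while (24/44)(24/44) ≈ 0.298, so the counts in this table do not satisfy the independence condition exactly.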
Probability Rules and Binomial Probability
Basic Probability Concepts
Probability quantifies the likelihood of events occurring. The sum of probabilities for all possible outcomes is 1.
Key Formula: For independent events $A$ and $B$, $P(A \cap B) = P(A) \cdot P(B)$.
Binomial Probability: Used when there is a fixed number $n$ of independent trials, each with the same probability of success $p$.
Binomial Probability Formula: $P(X = k) = \binom{n}{k} p^k (1 - p)^{n - k}$
Example: Probability that at least one of two students did not eat breakfast: $1 - P(\text{both ate breakfast})$.
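A minimal sketch of the binomial formula and the "at least one" complement rule follows. The function names and the values `n = 5`, `p = 0.3` are hypothetical, chosen only to show the arithmetic.

```python
from math import comb


def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def at_least_one(n, p):
    """P(X >= 1) via the complement rule: 1 - P(X = 0)."""
    return 1 - binomial_pmf(0, n, p)


# Hypothetical values for illustration only: 5 trials, success probability 0.3.
n, p = 5, 0.3
print(binomial_pmf(2, n, p))  # probability of exactly 2 successes
print(at_least_one(n, p))     # probability of at least 1 success
```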
Sampling Distributions and Central Limit Theorem
Sampling and Standard Deviation
Sampling distributions describe the distribution of a statistic (like the mean) from repeated samples. The Central Limit Theorem (CLT) states that the sampling distribution of the sample mean approaches a normal distribution as the sample size increases.
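Standard Deviation of the Sum: For independent random variables $X$ and $Y$, $\operatorname{Var}(X + Y) = \operatorname{Var}(X) + \operatorname{Var}(Y)$, so $\sigma_{X+Y} = \sqrt{\sigma_X^2 + \sigma_Y^2}$.
CLT Formula: If $X_1, \dots, X_n$ are independent and identically distributed with mean $\mu$ and standard deviation $\sigma$, then the sample mean $\bar{X}$ has mean $\mu$ and standard deviation $\sigma / \sqrt{n}$.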
Application: Used to estimate probabilities about sample means and sums.
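One way to see the CLT result is a small simulation. The sketch below assumes a hypothetical population (rolls of a fair six-sided die) and compares the simulated standard deviation of the sample mean with the predicted value, the population standard deviation divided by the square root of the sample size. The sample size and number of repetitions are arbitrary choices.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

# Hypothetical population: rolls of a fair six-sided die (not normally distributed).
population_mean = 3.5
population_sd = (35 / 12) ** 0.5  # standard deviation of one fair die roll

n = 30              # sample size
num_samples = 5000  # number of repeated samples

# Draw many samples of size n and record each sample mean.
sample_means = [
    statistics.fmean(random.randint(1, 6) for _ in range(n))
    for _ in range(num_samples)
]

simulated_mean = statistics.fmean(sample_means)
simulated_sd = statistics.stdev(sample_means)
predicted_sd = population_sd / n ** 0.5  # sigma / sqrt(n)

print(simulated_mean, population_mean)
print(simulated_sd, predicted_sd)
```

The two printed pairs should be close to each other, and a histogram of `sample_means` would look roughly bell-shaped even though a single die roll is uniform.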
Linear Regression
Regression Analysis and Model Fitting
Linear regression models the relationship between a dependent variable and one or more independent variables. The fitted regression equation predicts the value of the dependent variable.
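Regression Equation: $\hat{y} = b_0 + b_1 x$, where $b_0$ is the intercept and $b_1$ is the slope.
Example: Predicting salary from college GPA: $\widehat{\text{salary}} = b_0 + b_1 \cdot \text{GPA}$
Interpretation: The slope $b_1$ represents the predicted change in salary for each one-unit increase in GPA.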
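The sketch below fits a least-squares line, with intercept `b0` and slope `b1`, to a handful of made-up (GPA, salary) pairs. The data are hypothetical and are not the values behind the example in these notes; the function name `fit_line` is also an illustrative choice.

```python
# Hypothetical (GPA, salary in $1000s) pairs, made up purely for illustration.
gpa = [2.5, 3.0, 3.2, 3.6, 3.9]
salary = [40.0, 45.0, 47.0, 52.0, 56.0]


def fit_line(x, y):
    """Return (b0, b1) for the least-squares line y-hat = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx           # slope
    b0 = y_bar - b1 * x_bar  # intercept: the line passes through (x_bar, y_bar)
    return b0, b1


b0, b1 = fit_line(gpa, salary)
predicted = b0 + b1 * 3.4  # predicted salary for a GPA of 3.4
print(b0, b1, predicted)
```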
Conditional Probability and Independence
Conditional Probability
Conditional probability is the probability of an event given that another event has occurred.
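Formula: $P(A \mid B) = \frac{P(A \cap B)}{P(B)}$, provided $P(B) > 0$.
Independence: Events $A$ and $B$ are independent if $P(A \mid B) = P(A)$, or equivalently $P(A \cap B) = P(A) \cdot P(B)$.
Example: Probability that an accident involved alcohol but not speeding: $P(A \cap S^c) = P(A) - P(A \cap S)$, where $A$ is "alcohol was involved" and $S$ is "speeding was involved".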
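To make the arithmetic concrete, here is a small sketch using hypothetical probabilities for the accident example. None of these numbers come from the notes.

```python
# Hypothetical probabilities for illustration (not taken from the notes).
p_alcohol = 0.30               # P(A): accident involved alcohol
p_speeding = 0.40              # P(S): accident involved speeding
p_alcohol_and_speeding = 0.15  # P(A and S)

# Conditional probability: P(A | S) = P(A and S) / P(S)
p_alcohol_given_speeding = p_alcohol_and_speeding / p_speeding

# "Alcohol but not speeding": P(A and not S) = P(A) - P(A and S)
p_alcohol_not_speeding = p_alcohol - p_alcohol_and_speeding

# Independence check: does P(A | S) equal P(A)?
independent = abs(p_alcohol_given_speeding - p_alcohol) < 1e-9

print(p_alcohol_given_speeding)
print(p_alcohol_not_speeding)
print(independent)
```

With these made-up inputs, knowing that speeding was involved raises the probability of alcohol from 0.30 to 0.375, so the two events would not be independent.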
Expected Value and Variance
Calculating Expected Value and Variance
Expected value is the mean of a random variable's probability distribution. Variance measures the spread.
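Expected Value: $E(X) = \mu = \sum_x x \, P(X = x)$
Variance: $\operatorname{Var}(X) = \sigma^2 = \sum_x (x - \mu)^2 \, P(X = x)$
Example: Expected number of repairs per year: $E(X) = \sum_x x \, P(X = x)$, where $X$ is the number of repairs in a year.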
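The following sketch applies the two formulas to a hypothetical probability distribution for the number of repairs in a year. The probabilities are invented for illustration and are not the ones from the original example.

```python
# Hypothetical distribution of X = number of repairs in a year (illustration only).
distribution = {0: 0.5, 1: 0.3, 2: 0.15, 3: 0.05}

assert abs(sum(distribution.values()) - 1.0) < 1e-9  # probabilities must sum to 1

# E(X) = sum of x * P(X = x)
expected_value = sum(x * p for x, p in distribution.items())

# Var(X) = sum of (x - mu)^2 * P(X = x)
variance = sum((x - expected_value) ** 2 * p for x, p in distribution.items())
standard_deviation = variance ** 0.5

print(expected_value, variance, standard_deviation)
```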
Normal Approximation to the Binomial
Using the Normal Model for Binomial Probabilities
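When the number of trials $n$ is large and $p$ is not too close to 0 or 1, the binomial distribution can be approximated by a normal distribution. A common rule of thumb is $np \ge 10$ and $n(1 - p) \ge 10$.
Normal Approximation: $X \approx N(\mu, \sigma)$ with $\mu = np$ and $\sigma = \sqrt{np(1 - p)}$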
Application: Used to estimate probabilities for large sample sizes.
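As a rough check on how well the approximation works, the sketch below compares an exact binomial probability with its normal approximation. The values of `n`, `p`, and `k` are hypothetical, and the continuity correction (evaluating the normal curve at `k + 0.5`) is a common refinement added here as an assumption, not something stated in the notes.

```python
from math import comb, sqrt
from statistics import NormalDist


def binomial_cdf(k, n, p):
    """Exact P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def normal_approx_cdf(k, n, p):
    """Approximate P(X <= k) with a normal model, using a continuity correction."""
    mu = n * p
    sigma = sqrt(n * p * (1 - p))
    return NormalDist(mu, sigma).cdf(k + 0.5)


# Hypothetical values: n = 100 trials, p = 0.4, so np = 40 and n(1 - p) = 60.
n, p, k = 100, 0.4, 45
print(binomial_cdf(k, n, p))
print(normal_approx_cdf(k, n, p))
```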
Decision Trees and Probability Models
Tree Diagrams for Sequential Events
Tree diagrams help visualize and calculate probabilities for multi-stage events.
Example: Calculating the probability of success in a strategic decision (e.g., attack vs. not attack) using a tree diagram.
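A tree diagram calculation amounts to multiplying probabilities along each branch and adding the branches that lead to the outcome of interest. The sketch below does this for a two-stage attack/not-attack tree; every probability in it is hypothetical.

```python
# Hypothetical two-stage tree: first a decision, then an outcome given that decision.
p_attack = 0.6                   # P(attack)
p_success_given_attack = 0.7     # P(success | attack)
p_success_given_no_attack = 0.2  # P(success | not attack)

# Multiply along each branch to get the probability of that path.
p_attack_and_success = p_attack * p_success_given_attack
p_no_attack_and_success = (1 - p_attack) * p_success_given_no_attack

# Add the paths that end in success (law of total probability).
p_success = p_attack_and_success + p_no_attack_and_success

# Reverse the tree: P(attack | success), by the conditional probability formula.
p_attack_given_success = p_attack_and_success / p_success

print(p_success)
print(p_attack_given_success)
```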
Summary Table: Key Probability Formulas
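| Concept | Formula | Application |
|---|---|---|
| Binomial Probability | $P(X = k) = \binom{n}{k} p^k (1 - p)^{n - k}$ | Probability of $k$ successes in $n$ trials |
| Expected Value | $E(X) = \sum_x x \, P(X = x)$ | Mean of a random variable |
| Variance (Sum) | $\operatorname{Var}(X + Y) = \operatorname{Var}(X) + \operatorname{Var}(Y)$ | Variance of sum of independent variables |
| Conditional Probability | $P(A \mid B) = \frac{P(A \cap B)}{P(B)}$ | Probability of $A$ given $B$ |
| Regression Equation | $\hat{y} = b_0 + b_1 x$ | Predicting $y$ from $x$ |
| CLT (Sample Mean) | $\bar{X} \approx N(\mu, \sigma / \sqrt{n})$ | Distribution of sample mean |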
Applications and Examples
Blood Drive: Calculating the probability that at least one of the first 20 donors has Type B blood using binomial and normal approximation.
Repair Service: Finding expected number of repairs and variance over multiple years.
Regression: Predicting salary from GPA and evaluating model fit.