Statistics Exam 2 Study Guide: Regression, Sampling, Probability, and Confidence Intervals

Linear Regression

Regression Equations and Interpretation

Linear regression is a statistical method used to model the relationship between two quantitative variables. The regression equation allows us to predict the value of one variable based on the value of another.

Regression Equation: The general form is $\hat{y} = b_0 + b_1 x$, where $b_0$ is the intercept and $b_1$ is the slope.
Interpretation of Slope: The slope $b_1$ represents the expected change in $y$ for a one-unit increase in $x$.
Prediction: Use the regression equation to predict $y$ values for given $x$ values.
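
As a rough illustration of fitting and using a regression equation, the sketch below computes the least-squares intercept and slope by hand and then makes one prediction. The data values and the helper names `fit_line` and `predict` are made up for this example and do not come from the course materials.

```python
# Hypothetical data (invented for illustration): hours studied and exam scores.
hours = [1, 2, 3, 4, 5]
scores = [52, 60, 71, 75, 87]

def fit_line(x, y):
    """Return (b0, b1) for the least-squares line y_hat = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Slope: co-variation of x and y divided by the variation in x.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    # Intercept: forces the line through the point (x_bar, y_bar).
    b0 = y_bar - b1 * x_bar
    return b0, b1

def predict(b0, b1, x_new):
    """Plug a new x value into the fitted equation."""
    return b0 + b1 * x_new

b0, b1 = fit_line(hours, scores)
print(f"y_hat = {b0:.2f} + {b1:.2f} x")
print(f"Predicted score for 6 hours: {predict(b0, b1, 6):.1f}")
```

With these invented numbers the fitted line is y_hat = 43.50 + 8.50 x, so each additional hour of study is associated with an expected increase of 8.5 points. Note that predicting at 6 hours extends slightly beyond the observed data, which calls for caution.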

Assessing the Fit of Regression

Evaluating how well the regression model fits the data is crucial for understanding the strength and reliability of the relationship.

Correlation Coefficient ($r$): Measures the strength and direction of the linear relationship between two quantitative variables. Values range from -1 to 1.
Coefficient of Determination ($R^2$): Represents the proportion of variability in $y$ explained by the linear model on $x$.
Residuals: The differences between observed and predicted values, $e = y - \hat{y}$. Analyzing residuals helps assess model fit and identify outliers.
Interpretation of $R^2$: Higher values indicate a better fit; for example, $R^2 = 0.80$ means 80% of the variability in $y$ is explained by $x$.
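
The fit measures above can be computed directly from their definitions. The sketch below uses made-up data and hypothetical helper names (`correlation`, `residuals`); it relies on the fact that, for a regression with a single predictor, the coefficient of determination equals the square of the correlation coefficient.

```python
import math

# Hypothetical data (invented for illustration): hours studied and exam scores.
hours = [1, 2, 3, 4, 5]
scores = [52, 60, 71, 75, 87]

def correlation(x, y):
    """Pearson correlation coefficient r."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

def residuals(x, y):
    """Observed minus predicted values for the least-squares line."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

r = correlation(hours, scores)
r_squared = r ** 2          # valid for one-predictor linear regression
res = residuals(hours, scores)

print(f"r = {r:.3f}, R^2 = {r_squared:.3f}")
print("residuals:", res)
```

For this invented data set $R^2$ is about 0.984, and the residuals sum to zero, as they always do for a least-squares line with an intercept. A residual that is far larger in magnitude than the others would flag a possible outlier.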

Example

Given data on students' study hours ($x$) and exam scores ($y$), a regression equation can predict exam scores based on study hours.

Sampling

Populations vs. Samples

Sampling is the process of selecting a subset of individuals from a population to estimate characteristics of the whole population.

Population: The entire group of interest.
Sample: A subset of the population used to make inferences.
Sampling Methods: Simple random sampling, stratified sampling, cluster sampling, etc.
Sampling Error: The difference between sample statistics and population parameters due to random variation.
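
To make sampling error concrete, the sketch below draws a simple random sample from a simulated population and compares the sample mean with the population mean. The population (10,000 invented ages), the sample size, and the helper name `simple_random_sample` are assumptions chosen only for illustration.

```python
import random

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: 10,000 ages, invented for illustration.
population = [random.randint(18, 80) for _ in range(10_000)]
population_mean = sum(population) / len(population)

def simple_random_sample(pop, n):
    """Every subset of size n is equally likely to be chosen."""
    return random.sample(pop, n)

sample = simple_random_sample(population, 100)
sample_mean = sum(sample) / len(sample)

# Sampling error: sample statistic minus population parameter.
sampling_error = sample_mean - population_mean
print(f"population mean = {population_mean:.2f}")
print(f"sample mean     = {sample_mean:.2f}")
print(f"sampling error  = {sampling_error:.2f}")
```

Drawing a different random sample would give a different sampling error; that sample-to-sample variation is what the term describes. In practice the population mean is unknown, so the error can only be estimated, not computed as it is here.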

Sampling Bias and Representativeness

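Bias: Systematic error in how a sample is selected or measured that pushes estimates consistently away from the true population value. Unlike sampling error, it does not shrink as the sample size grows.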
Representative Sample: A sample that accurately reflects the population.
Example: Surveying only morning students may not represent all students.

Observational Studies vs. Experiments

Key Differences

Understanding the distinction between observational studies and experiments is essential for interpreting research findings.

Observational Study: Researchers observe subjects without intervention.
Experiment: Researchers manipulate variables to observe effects.
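Confounding Variables: Factors that are related to both the explanatory variable and the outcome and are not controlled for, making it hard to tell which one is responsible for an observed effect.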
Randomization: Assigning subjects to groups randomly to reduce bias.

Example

Observational: Studying the link between smoking and lung cancer by observing smokers and non-smokers.
Experimental: Testing a new drug by randomly assigning patients to treatment and control groups.
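
Random assignment in an experiment can be sketched in a few lines. The patient labels, the group sizes, and the helper name `randomize` below are hypothetical; the point is only that chance, not the researcher, decides who receives the treatment.

```python
import random

def randomize(subjects, seed=None):
    """Shuffle the subjects and split them into two equal-sized groups."""
    rng = random.Random(seed)
    shuffled = subjects[:]      # copy so the original list is unchanged
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

# Hypothetical subject labels, invented for illustration.
patients = [f"patient_{i}" for i in range(1, 21)]
treatment, control = randomize(patients, seed=42)

print("treatment:", treatment)
print("control:  ", control)
```

Because assignment is random, confounding variables such as age or health status tend to be balanced across the two groups on average, which is what allows an experiment to support cause-and-effect conclusions.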

Probability

Basic Probability Concepts

Probability quantifies the likelihood of events occurring and is foundational for statistical inference.

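Law of Large Numbers: As the number of independent trials increases, the sample mean (or the observed proportion of times an event occurs) tends to get closer to the population mean (or the event's true probability).
Probability Rules: Addition Rule for mutually exclusive events, $P(A \text{ or } B) = P(A) + P(B)$; Multiplication Rule for independent events, $P(A \text{ and } B) = P(A) \cdot P(B)$.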
Mutually Exclusive Events: Events that cannot occur together.
Independent Events: The occurrence of one event does not affect the probability of another.
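
The addition and multiplication rules can be checked by listing equally likely outcomes. The sketch below does this for fair dice using exact fractions; the helper name `prob` and the particular events are chosen only for illustration.

```python
from fractions import Fraction
from itertools import product

die = range(1, 7)  # outcomes of one fair six-sided die

def prob(event, outcomes):
    """Probability of an event when all outcomes are equally likely."""
    outcomes = list(outcomes)
    favorable = sum(1 for o in outcomes if event(o))
    return Fraction(favorable, len(outcomes))

# Addition rule: rolling a 1 and rolling a 2 cannot both happen on one roll.
p_1 = prob(lambda o: o == 1, die)
p_2 = prob(lambda o: o == 2, die)
p_1_or_2 = prob(lambda o: o in (1, 2), die)

# Multiplication rule: the first die does not affect the second.
two_dice = list(product(die, repeat=2))
p_first_4 = prob(lambda o: o[0] == 4, two_dice)
p_second_4 = prob(lambda o: o[1] == 4, two_dice)
p_both_4 = prob(lambda o: o == (4, 4), two_dice)

print(p_1_or_2, "=", p_1, "+", p_2)
print(p_both_4, "=", p_first_4, "*", p_second_4)
```

The first line prints 1/3 = 1/6 + 1/6 and the second prints 1/36 = 1/6 * 1/6. The addition rule in this simple form applies only to mutually exclusive events, and the multiplication rule in this form only to independent events.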

Example

Flipping a fair coin: Probability of heads is $\frac{1}{2}$.
Rolling a fair six-sided die: Probability of rolling a 4 is $\frac{1}{6}$.
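
A quick simulation can illustrate the Law of Large Numbers with the coin example. The sketch below assumes a fair coin; the helper name `proportion_heads` and the chosen numbers of flips are arbitrary.

```python
import random

def proportion_heads(n_flips, rng):
    """Simulate n_flips of a fair coin and return the fraction of heads."""
    heads = sum(rng.random() < 0.5 for _ in range(n_flips))
    return heads / n_flips

rng = random.Random(1)  # fixed seed so the sketch is repeatable
for n in (10, 100, 10_000, 100_000):
    print(f"{n:>7} flips: proportion of heads = {proportion_heads(n, rng):.4f}")
```

With only a handful of flips the proportion can be far from 0.5, but it tends to settle near 0.5 as the number of flips grows. The law describes long-run behavior and says nothing about what the next individual flip will be.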

Sampling Distributions and Confidence Intervals for Proportions

Sampling Distribution of a Proportion

The sampling distribution describes the distribution of sample proportions over repeated samples from the same population.

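Mean of Sampling Distribution: $\mu_{\hat{p}} = p$ (population proportion).
Standard Error: $SE(\hat{p}) = \sqrt{\frac{p(1-p)}{n}}$, where $n$ is the sample size.
Normal Approximation: For large $n$, the sampling distribution of the sample proportion $\hat{p}$ is approximately normal. A common rule of thumb is to require $np \ge 10$ and $n(1-p) \ge 10$.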
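
The sampling distribution can be approximated by simulation and compared with the standard error formula. In the sketch below the values p = 0.5 and n = 75, the number of repetitions, and the helper names are assumptions chosen for illustration.

```python
import math
import random
import statistics

def standard_error(p, n):
    """Standard deviation of the sample proportion: sqrt(p(1-p)/n)."""
    return math.sqrt(p * (1 - p) / n)

def simulate_sample_proportions(p, n, reps, seed=0):
    """Draw `reps` samples of size n and return each sample's proportion."""
    rng = random.Random(seed)
    return [sum(rng.random() < p for _ in range(n)) / n for _ in range(reps)]

p, n = 0.5, 75  # hypothetical population proportion and sample size
props = simulate_sample_proportions(p, n, reps=5000)

print(f"mean of sample proportions = {statistics.mean(props):.4f} (theory: {p})")
print(f"sd of sample proportions   = {statistics.stdev(props):.4f} "
      f"(theory: {standard_error(p, n):.4f})")
```

The simulated mean should land close to p and the simulated standard deviation close to the formula value of roughly 0.058. A histogram of `props` would look approximately bell-shaped, consistent with the normal approximation.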

Calculating Probabilities Using Sampling Distributions

Use the normal model to estimate probabilities for sample proportions.
Example: Probability that more than 55 heads occur in 75 coin flips.
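
The coin-flip example can be worked with the normal model as follows. This sketch assumes a fair coin (p = 0.5) and applies no continuity correction; the helper names `normal_cdf` and `prob_proportion_above` are hypothetical.

```python
import math

def normal_cdf(z):
    """Standard normal cumulative probability P(Z <= z)."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def prob_proportion_above(p, n, threshold):
    """Normal-model estimate of P(p_hat > threshold), plus the z-score used."""
    se = math.sqrt(p * (1 - p) / n)
    z = (threshold - p) / se
    return z, 1 - normal_cdf(z)

# More than 55 heads in 75 flips means p_hat > 55/75.
z, tail = prob_proportion_above(p=0.5, n=75, threshold=55 / 75)
print(f"z = {z:.2f}, P(p_hat > 55/75) = {tail:.6f}")
```

Here the z-score is about 4.04, so the estimated probability is tiny, on the order of a few in 100,000. Getting more than 55 heads in 75 flips of a fair coin would be very surprising.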

Confidence Intervals for Proportions

Confidence intervals provide a range of plausible values for a population proportion based on sample data.

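Formula: $\hat{p} \pm z^* \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$, where $\hat{p}$ is the sample proportion and $z^*$ is the critical value for the chosen confidence level.
Margin of Error: The term $z^* \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$ quantifies the uncertainty in the estimate.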
Effect of Confidence Level and Sample Size: A higher confidence level (larger critical value) or a smaller sample size (larger standard error) widens the interval.

Example

Estimating the proportion of voters who approve of a new law based on a sample.
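
A one-proportion interval for the voter example might be computed as in the sketch below. The survey counts (520 approving out of 1,000) are invented for illustration, and `proportion_ci` is a hypothetical helper name. The critical values 1.96 and 2.576 correspond to 95% and 99% confidence.

```python
import math

def proportion_ci(successes, n, z_star=1.96):
    """Confidence interval for a proportion: p_hat +/- z* * sqrt(p_hat(1-p_hat)/n)."""
    p_hat = successes / n
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    margin = z_star * se
    return p_hat - margin, p_hat + margin

# Hypothetical survey: 520 of 1,000 sampled voters approve of the law.
low, high = proportion_ci(520, 1000)                      # 95% confidence
low99, high99 = proportion_ci(520, 1000, z_star=2.576)    # 99% confidence
low_small, high_small = proportion_ci(130, 250)           # same p_hat, smaller n

print(f"95% CI, n=1000: ({low:.3f}, {high:.3f})")
print(f"99% CI, n=1000: ({low99:.3f}, {high99:.3f})")
print(f"95% CI, n=250:  ({low_small:.3f}, {high_small:.3f})")
```

With these invented counts the 95% interval runs from about 0.489 to about 0.551. Both the 99% interval and the smaller-sample interval come out wider, matching the effect of confidence level and sample size described above.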