Study Guide: Regression, Probability, and the Normal Model in Statistics

Chapter 4 – Regression Analysis: Exploring Associations between Variables

Section 4.1 – Visualizing Variability with a Scatterplot

Scatterplots are essential tools for visualizing the relationship between two quantitative variables. They help identify patterns, trends, and possible associations.

  • Scatterplot: A graph of paired data points (x, y) for two variables.

  • Association or Trend:

    • Positive: As one variable increases, the other tends to increase.

    • Negative: As one variable increases, the other tends to decrease.

    • None: No discernible pattern.

  • Form (Shape):

    • Linear: Points roughly follow a straight line.

    • Nonlinear: Points follow a curved or other non-linear pattern.

Example: Plotting students' study hours (x) against exam scores (y) to see if more study time is associated with higher scores.

Section 4.2 – Measuring Strength of Association with Correlation

The correlation coefficient quantifies the strength and direction of a linear relationship between two variables.

  • Correlation Coefficient (r):

    • Always look at the scatterplot to interpret correlation.

    • Correlation does not mean causation.

    • r close to 0: No linear association.

    • r close to 1 or -1: Strong linear association (positive or negative).

    • Changing the order of variables does not change r.

    • Changing units (multiplying by a positive constant) does not affect r.

    • r is unitless (no units attached).

Example: If r is close to 1, there is a strong positive linear relationship between the two variables.
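
The properties of the correlation coefficient listed above can be checked numerically. The following is a minimal sketch in Python; the study-hours and exam-score values are made up for illustration, and the helper name `correlation` is an assumption, not something from the course materials.

```python
from math import sqrt

def correlation(xs, ys):
    """Pearson correlation coefficient r for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical data: study hours (x) and exam scores (y).
hours = [1, 2, 3, 4, 5, 6]
scores = [52, 60, 61, 70, 74, 83]

r = correlation(hours, scores)
r_swapped = correlation(scores, hours)   # swap the roles of x and y
minutes = [60 * h for h in hours]        # change units: hours -> minutes
r_minutes = correlation(minutes, scores)

print(round(r, 3), round(r_swapped, 3), round(r_minutes, 3))
```

All three printed values should agree, which illustrates that r is unaffected by swapping the variables or by rescaling the units.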

Section 4.3 – Modeling Linear Trends

Linear regression models the relationship between an explanatory variable and a response variable using a straight line.

  • Regression Line: The best-fit line through the data points in a scatterplot.

  • Intercept (a): The predicted value of the response variable when the explanatory variable is zero.

  • Slope (b): The change in the predicted response for each one-unit increase in the explanatory variable.

  • Explanatory Variable: Also called the predictor, independent variable, or x-value.

  • Response Variable: Also called the predicted variable, dependent variable, or y-value.

Regression Equation: $\hat{y} = a + bx$, where $\hat{y}$ is the predicted response, a is the intercept, and b is the slope.

Example: Predicting house prices (y) based on square footage (x).
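
As a rough illustration of how a regression line might be fitted, the sketch below computes a least-squares intercept and slope in Python. Here `a` denotes the intercept and `b` the slope; the house data and the helper name `fit_line` are hypothetical and chosen only to keep the arithmetic simple.

```python
def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line through the data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # The least-squares line always passes through (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical data: square footage (x) and price in thousands of dollars (y).
sqft = [1000, 1500, 2000, 2500, 3000]
price = [200, 270, 330, 410, 470]

a, b = fit_line(sqft, price)
predicted = a + b * 1800   # predicted price for an 1,800 sq ft house

print(f"predicted price = {a:.1f} + {b:.3f} * sqft")
print(f"prediction at 1800 sq ft: {predicted:.1f} thousand dollars")
```

With these made-up numbers the slope is 0.136, meaning each additional square foot is associated with about $136 more in predicted price. The intercept (64) is the predicted price at zero square feet, which is far outside the data and so has no practical meaning here.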

Section 4.4 – Evaluating the Linear Model

Evaluating a regression model involves assessing its fit and understanding its limitations.

  • Correlation is not causation: A strong correlation does not imply that one variable causes the other.

  • Coefficient of Determination ($r^2$): Measures the proportion of variance in the response variable explained by the explanatory variable. (Note: Students are instructed to skip calculation and interpretation of $r^2$ for this course.)

  • Goodness of Fit: Indicates how well the regression line represents the data.

Example: A regression model with $r^2 = 0.85$ explains 85% of the variability in the response variable. (For this course, calculation and interpretation of $r^2$ are not required.)

Chapter 5 – Modeling Variation with Probability

Section 5.1 – What is Randomness?

Probability theory models the uncertainty and variation in random phenomena.

  • Experiment: A repeatable process with uncertain outcomes.

  • Random: Outcomes are unpredictable in the short run but follow a pattern in the long run.

  • Simulations: Imitate random processes to estimate probabilities.

  • Probability:

    • Theoretical: Based on mathematical reasoning.

    • Empirical: Based on observed data.

Example: Flipping a coin is a random experiment; the probability of heads is 0.5.

Section 5.2 – Finding Theoretical Probabilities

Theoretical probability uses mathematical models to determine the likelihood of events.

  • Complement: The event that a given event A does not occur; its probability is $P(\text{not } A) = 1 - P(A)$.

  • Sample Space: The set of all possible outcomes.

  • Event: A subset of the sample space.

  • OR (Union): Probability that at least one of several events occurs.

  • Mutually Exclusive Events: Events that cannot occur together.

  • Venn Diagram: Visual tool for representing events and their relationships.

Example: Rolling a die: sample space = {1,2,3,4,5,6}; event = rolling an even number.
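
The die example can be worked through with sets and exact fractions. This is a small sketch that assumes a fair die (all outcomes equally likely); the helper `prob` and the extra events `odd` and `greater_than_4` are illustrative additions.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}   # one roll of a fair six-sided die

def prob(event):
    """Theoretical probability of an event when all outcomes are equally likely."""
    return Fraction(len(event & sample_space), len(sample_space))

even = {2, 4, 6}
odd = {1, 3, 5}
greater_than_4 = {5, 6}

p_even = prob(even)                           # 3 of 6 outcomes
p_not_even = 1 - p_even                       # complement
p_even_or_gt4 = prob(even | greater_than_4)   # union: {2, 4, 5, 6}
mutually_exclusive = len(even & odd) == 0     # no outcome is both even and odd

print(p_even, p_not_even, p_even_or_gt4, mutually_exclusive)
```

Note that `even` and `greater_than_4` share the outcome 6, so they are not mutually exclusive, and the union is counted from the combined set instead of by adding the two probabilities.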

Section 5.3 – Associations in Categorical Variables

Probability rules help analyze relationships between categorical variables.

  • Conditional Probabilities: Probability of one event given another has occurred.

  • Independent Events: Occurrence of one event does not affect the probability of the other.

  • Contingency Table: Table showing frequencies for combinations of categorical variables.

  • Multiplication Rule: For independent events A and B, $P(A \text{ and } B) = P(A) \cdot P(B)$.

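Example: Probability of drawing two aces in a row from a deck (without replacement). Here the draws are not independent, so the probability for the second draw must be conditional on the first.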
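
The two-aces example can be computed exactly, assuming a standard 52-card deck with 4 aces. The sketch below contrasts drawing without replacement (dependent draws, so a conditional probability is needed) with drawing with replacement (independent draws); the variable names are illustrative.

```python
from fractions import Fraction

# Without replacement: the second draw depends on the first.
p_first_ace = Fraction(4, 52)
p_second_ace_given_first = Fraction(3, 51)   # one ace and one card are gone
p_two_aces = p_first_ace * p_second_ace_given_first

# With replacement: the draws are independent, so P(A and B) = P(A) * P(B).
p_two_aces_with_replacement = Fraction(4, 52) * Fraction(4, 52)

print(p_two_aces)                    # 1/221
print(p_two_aces_with_replacement)   # 1/169
```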

Section 5.4 – Finding Empirical Probabilities

Empirical probability is based on observed data from experiments or simulations.

  • Law of Large Numbers: As the number of trials increases, empirical probability approaches theoretical probability.

  • Streaks: Sequences of similar outcomes (e.g., several heads in a row).

Example: Observing the proportion of heads in 1,000 coin tosses.
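
A simulation is one way to see the Law of Large Numbers in action. The following sketch simulates fair-coin tosses with Python's `random` module; the seed, the sample sizes, and the helper name `proportion_heads` are arbitrary choices made for illustration.

```python
import random

def proportion_heads(n_tosses, rng):
    """Empirical probability of heads in n_tosses simulated fair-coin tosses."""
    heads = sum(rng.random() < 0.5 for _ in range(n_tosses))
    return heads / n_tosses

rng = random.Random(42)   # fixed seed so the run is repeatable (arbitrary choice)
results = {n: proportion_heads(n, rng) for n in (10, 100, 1_000, 100_000)}

for n, p in results.items():
    print(f"{n:>7} tosses: proportion of heads = {p:.4f}")
```

For small numbers of tosses the proportion can be noticeably different from 0.5, while for large numbers it tends to settle close to the theoretical value.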

Chapter 6 – Modeling Random Events: The Normal Model

Section 6.1 – Probability Distributions are Models of Random Phenomena

Probability distributions describe how probabilities are distributed over possible outcomes.

  • Probability Model: Mathematical description of a random process.

  • Probability Distribution: Lists or functions showing probabilities for all possible outcomes.

  • Probability Distribution Function (pdf): Function that assigns probabilities to outcomes.

  • Discrete Outcomes: Outcomes that can be counted (e.g., number of heads).

  • Continuous Outcomes: Outcomes that can take any value in an interval (e.g., height).

  • Probability Density Curve: Curve representing the distribution of a continuous random variable.

  • Area Under Curves: Represents probability for continuous distributions.

Example: The probability of a randomly selected person being between 160 and 170 cm tall is the area under the normal curve between those values.
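
To make the "area under the curve" idea concrete, the sketch below approximates that area numerically and compares it with the value from the cumulative distribution. The source does not give a mean or standard deviation for this example, so the values used here (165 cm and 8 cm) are assumptions, as is the helper name `area_under_curve`.

```python
from statistics import NormalDist

# Hypothetical height model: mean 165 cm, standard deviation 8 cm (assumed values).
heights = NormalDist(mu=165, sigma=8)

def area_under_curve(dist, lo, hi, steps=10_000):
    """Approximate the area under dist's density curve on [lo, hi] (trapezoid rule)."""
    width = (hi - lo) / steps
    total = 0.5 * (dist.pdf(lo) + dist.pdf(hi))
    total += sum(dist.pdf(lo + i * width) for i in range(1, steps))
    return total * width

area = area_under_curve(heights, 160, 170)
exact = heights.cdf(170) - heights.cdf(160)   # same probability via the cumulative distribution

print(f"approximate area: {area:.4f}, from cdf: {exact:.4f}")
```

The two numbers should agree to several decimal places, which shows that a probability for a continuous outcome is simply an area under its density curve.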

Section 6.2 – The Normal Model

The normal model is a continuous probability distribution that is symmetric and bell-shaped, commonly used in statistics.

  • Normal Model: Also called the normal distribution; describes many natural phenomena.

  • Normal Curve (Distribution): The graphical representation of the normal model.

  • Mean ($\mu$): The center of the distribution.

  • Standard Deviation ($\sigma$): Measures the spread of the distribution.

  • Percentile: The value below which a given percentage of observations fall.

Normal Distribution Formula: $f(x) = \frac{1}{\sigma\sqrt{2\pi}} \, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$

Example: Heights of adult men are approximately normally distributed with mean 70 inches and standard deviation 3 inches.
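
Probabilities and percentiles for this height example can be computed with `statistics.NormalDist` from the Python standard library (available in Python 3.8 and later). The sketch below uses the mean and standard deviation from the example; the particular questions asked (within one standard deviation, taller than 76 inches, 90th percentile) are illustrative choices.

```python
from statistics import NormalDist

men = NormalDist(mu=70, sigma=3)   # mean and standard deviation from the example

p_within_one_sd = men.cdf(73) - men.cdf(67)   # P(67 < height < 73)
p_taller_than_76 = 1 - men.cdf(76)            # P(height > 76), two SDs above the mean
height_90th = men.inv_cdf(0.90)               # 90th percentile, in inches

print(f"P(67 < height < 73) = {p_within_one_sd:.4f}")
print(f"P(height > 76)      = {p_taller_than_76:.4f}")
print(f"90th percentile     = {height_90th:.2f} inches")
```

Under this model, roughly 68% of men fall within one standard deviation of the mean, and about 2% are taller than 76 inches.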
