Section 9.2 – Linear Regression: Regression Lines, Residuals, and Prediction

Linear Regression

Introduction to Linear Regression

Linear regression is a statistical method used to model the relationship between two quantitative variables by fitting a straight line (the regression line) to the observed data. The goal is to minimize the differences between the observed values and the values predicted by the line.

  • Regression line (line of best fit): A straight line that best describes how the response variable y changes as the explanatory variable x changes.

  • Equation of the regression line: ŷ = mx + b

  • ŷ (y-hat): Predicted value of y for a given x-value.

  • m: Slope of the line (rate of change of y with respect to x).

  • b: y-intercept (value of y when x = 0).

Scatterplot with regression line, observed y-values, and predicted y-values

Additional info: The regression line is determined using all of the data points, not just two points as when finding the equation of a line in algebra. It should be used for prediction only when there is a significant correlation between x and y.

Residuals

Residuals measure the vertical distance between the observed y-values and the predicted y-values on the regression line. They indicate how well the regression line fits the data.

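  • Residual (d): d = (observed y-value) − (predicted y-value) = y − ŷ

  • A residual close to 0 means the prediction for that data point is accurate; a residual far from 0, whether positive or negative, means the prediction is less accurate.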

Diagram showing fitted values and residuals on a regression line

Additional info: Residuals are used to assess the goodness of fit of the regression model. The best-fitting line minimizes the sum of the squared residuals.
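
The idea of minimizing the sum of the squared residuals can be illustrated with a short Python sketch. The data set and the comparison line below are hypothetical, chosen only for illustration, and each residual is taken as the observed y-value minus the predicted y-value.

```python
# Hypothetical data, chosen only for illustration (not from the source).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]


def residuals(xs, ys, m, b):
    """Return d = y - y_hat for each data point, where y_hat = m*x + b."""
    return [y - (m * x + b) for x, y in zip(xs, ys)]


def sum_squared_residuals(xs, ys, m, b):
    """Add up the squared residuals for the line with slope m and intercept b."""
    return sum(d ** 2 for d in residuals(xs, ys, m, b))


# Least-squares line for this particular data set: slope 0.6, intercept 2.2.
sse_best = sum_squared_residuals(xs, ys, 0.6, 2.2)

# An arbitrarily chosen nearby line, used only for comparison.
sse_other = sum_squared_residuals(xs, ys, 0.5, 2.5)

print(round(sse_best, 3), round(sse_other, 3))
```

For this data the least-squares line gives a sum of squared residuals of about 2.4, while the comparison line gives 2.5, so the comparison line fits slightly worse.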

Interpreting the Regression Line

The regression line is meaningful only when the correlation between x and y is strong or significant. The sign of the slope (m) indicates the direction of the relationship:

  • Positive slope: Indicates a positive linear correlation (as x increases, y increases).

  • Negative slope: Indicates a negative linear correlation (as x increases, y decreases).

Predictions using the regression line are only meaningful for x-values within the range of the original data (interpolation). Predictions outside this range (extrapolation) are generally unreliable.

Formulas for the Regression Line

To find the regression line, follow these steps:

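  1. Find the slope (m): m = (nΣxy − (Σx)(Σy)) / (nΣx² − (Σx)²), where n is the number of data pairs.

  2. Find the y-intercept (b): b = ȳ − m·x̄ = (Σy)/n − m·(Σx)/n

  3. Write the regression equation: ŷ = mx + b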

When substituting values for m and b, round to the thousandths place.
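
As a rough sketch of these steps in Python, the function below uses the standard least-squares formulas for the slope and y-intercept. The function name fit_line and the sample data are assumptions made for illustration; they do not come from the source.

```python
def fit_line(xs, ys):
    """Return (m, b) for the least-squares line y_hat = m*x + b."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)

    denominator = n * sum_x2 - sum_x ** 2
    if denominator == 0:
        raise ValueError("All x-values are equal, so the slope is undefined.")

    # Step 1: slope.
    m = (n * sum_xy - sum_x * sum_y) / denominator
    # Step 2: y-intercept, b = (mean of y) - m * (mean of x).
    b = sum_y / n - m * sum_x / n
    return m, b


# Hypothetical data, chosen only for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

m, b = fit_line(xs, ys)
# Step 3: write the equation, rounding m and b to the thousandths place.
print(f"m = {m:.3f}, b = {b:.3f}")
```

For this sample data the sums are Σx = 15, Σy = 20, Σxy = 66, and Σx² = 55, which gives m = 0.6 and b = 2.2.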

Using the Regression Equation for Prediction

To predict the value of y for a given x-value, substitute x into the regression equation. Ensure that the x-value is within the range of the original data for the prediction to be meaningful.

  • If x is outside the range, state: "It is not meaningful to predict the value of y for x = (value) because x = (value) is outside the range of the original data."
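
The range check can be sketched in Python as follows. The function name predict, the choice to return None for out-of-range inputs, and the sample values of m, b, and the data range are all assumptions made for illustration.

```python
def predict(x, m, b, x_min, x_max):
    """Return y_hat = m*x + b, or None when x is outside [x_min, x_max]."""
    if x < x_min or x > x_max:
        # Extrapolation: the prediction is not considered meaningful.
        return None
    return m * x + b


# Hypothetical line and data range, chosen only for illustration.
m, b = 0.6, 2.2
x_min, x_max = 1, 5

for x in (3, 10):
    y_hat = predict(x, m, b, x_min, x_max)
    if y_hat is None:
        print(f"It is not meaningful to predict the value of y for x = {x} "
              f"because x = {x} is outside the range of the original data.")
    else:
        print(f"For x = {x}, the predicted y-value is {y_hat:.3f}")
```

Returning None is only one possible design; raising an exception would make it harder to ignore an out-of-range request by accident.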

Worked Example: Regression Line Calculation

Given data: Heights (x) and number of stories (y) for nine buildings in Atlanta.

Find the slope (m):

Find the y-intercept (b):

Regression equation:

Prediction Example

  • For x = 800 feet: The prediction is meaningful, since 800 is within the data range (486 to 869); substitute x = 800 into the regression equation.

  • For x = 400 feet: Not meaningful, since 400 is outside the data range (486 to 869).

Correlation Coefficient (from Section 9.1)

The sample correlation coefficient r measures the strength and direction of the linear relationship between x and y.

Interpretation: r always lies between -1 and 1. An r close to 1 or -1 indicates a strong linear relationship; an r close to 0 indicates a weak linear relationship or none at all.
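
The formula for r is not shown in this section, so the sketch below assumes the usual computational form of the Pearson correlation coefficient, r = (nΣxy − (Σx)(Σy)) / (√(nΣx² − (Σx)²) · √(nΣy² − (Σy)²)). The function name and the sample data are hypothetical.

```python
import math


def correlation_coefficient(xs, ys):
    """Return the sample (Pearson) correlation coefficient r."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)

    numerator = n * sum_xy - sum_x * sum_y
    denominator = (math.sqrt(n * sum_x2 - sum_x ** 2)
                   * math.sqrt(n * sum_y2 - sum_y ** 2))
    if denominator == 0:
        raise ValueError("r is undefined when all x-values or all y-values are equal.")
    return numerator / denominator


# Hypothetical data, chosen only for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r = correlation_coefficient(xs, ys)
print(f"r = {r:.3f}")
```

For this sample data r is roughly 0.775, a positive value, which agrees with the positive slope of the least-squares line through the same points.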

Summary Table: Steps for Finding the Regression Line

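Step | Description | Formula
1 | Find the slope (m) | m = (nΣxy − (Σx)(Σy)) / (nΣx² − (Σx)²)
2 | Find the y-intercept (b) | b = (Σy)/n − m·(Σx)/n
3 | Write the regression equation | ŷ = mx + b
4 | Predict y for a given x | Substitute x into ŷ = mx + b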
