
Linear Regression: Regression Lines and Prediction in Statistics


Correlation and Regression

Linear Regression

Linear regression is a statistical method used to model the relationship between two quantitative variables. After establishing that a significant linear correlation exists between the variables, the next step is to determine the equation of the line that best fits the data, known as the regression line. This line can then be used to predict values of the dependent variable based on values of the independent variable.

  • Regression Line: The line that minimizes the sum of the squares of the residuals (the differences between observed and predicted values).

  • Equation of Regression Line: The regression line for variables x (independent) and y (dependent) is given by $\hat{y} = mx + b$, where $m$ is the slope and $b$ is the y-intercept.

  • Prediction: The regression equation can be used to predict the value of y for any given value of x.

Residuals

Residuals are the differences between the observed y-values and the predicted y-values for each x-value in the data set. They are used to assess the fit of the regression line.

  • Definition: For each data point $(x_i, y_i)$, the residual is $d_i = y_i - \hat{y}_i$, the observed y-value minus the predicted y-value.

  • Interpretation: Residuals can be positive, negative, or zero. The regression line is chosen to minimize the sum of the squares of these residuals.
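The residual definition above can be sketched in a few lines of Python. The data set, the function names, and the comparison line are hypothetical and chosen only for illustration; the sketch computes the residuals for two candidate lines and compares their sums of squares.

```python
# Hypothetical data, made up purely for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]


def residuals(xs, ys, m, b):
    """Return observed y minus predicted y for each data point."""
    return [y - (m * x + b) for x, y in zip(xs, ys)]


def sum_squared_residuals(xs, ys, m, b):
    """Return the sum of the squared residuals for the line y = m*x + b."""
    return sum(d ** 2 for d in residuals(xs, ys, m, b))


# Least-squares line for this particular data set (slope 0.6, intercept 2.2).
sse_fit = sum_squared_residuals(xs, ys, 0.6, 2.2)
# An arbitrary alternative line, included only for comparison.
sse_other = sum_squared_residuals(xs, ys, 0.5, 2.5)

print([round(d, 2) for d in residuals(xs, ys, 0.6, 2.2)])
print(round(sse_fit, 2), round(sse_other, 2))
```

For this made-up data the least-squares line gives a sum of squared residuals of about 2.4, while the alternative line gives about 2.5, so the alternative fits slightly worse.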

Finding the Equation of a Regression Line

The equation of the regression line is determined using the means and sums of the x and y values in the data set. The line always passes through the point $(\bar{x}, \bar{y})$, where $\bar{x}$ and $\bar{y}$ are the means of the x and y values, respectively.

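  • Slope (m): $m = \dfrac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}{n\sum x^2 - \left(\sum x\right)^2}$, where $n$ is the number of data points.

  • Y-intercept (b): $b = \bar{y} - m\bar{x} = \dfrac{\sum y}{n} - m\,\dfrac{\sum x}{n}$

  • Regression Line Equation: $\hat{y} = mx + b$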
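As a minimal sketch, the slope and intercept can be computed directly from the sums $\sum x$, $\sum y$, $\sum xy$, and $\sum x^2$ using the standard least-squares formulas. The function name `regression_line` and the sample data are hypothetical, not taken from the source material.

```python
def regression_line(xs, ys):
    """Return (m, b) for the least-squares line y-hat = m*x + b."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)

    # The denominator is zero when every x-value is the same.
    denominator = n * sum_x2 - sum_x ** 2
    if denominator == 0:
        raise ValueError("all x-values are identical; slope is undefined")

    m = (n * sum_xy - sum_x * sum_y) / denominator
    b = sum_y / n - m * (sum_x / n)  # b = y-bar - m * x-bar
    return m, b


# Hypothetical data, made up purely for illustration.
m, b = regression_line([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(round(m, 3), round(b, 3))
```

With this made-up data, $\sum x = 15$, $\sum y = 20$, $\sum xy = 66$, and $\sum x^2 = 55$, so $m = 30/50 = 0.6$ and $b = 4 - 0.6 \cdot 3 = 2.2$.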

Example: Finding the Regression Line

Suppose we have data on gross domestic products (GDP) and carbon dioxide emissions. After verifying a significant linear correlation, we can use the formulas above to calculate the slope and intercept, and thus the regression equation. For a data set of $n$ points, the regression line takes the form:

  • $\hat{y} = mx + b$, with the example values of $m$ and $b$ computed from the data.

Graphing the Regression Line

To graph the regression line:

  1. Choose two x-values within the range of the data.

  2. Calculate the corresponding y-values using the regression equation.

  3. Draw a straight line through these two points. The line will pass through $(\bar{x}, \bar{y})$.
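The graphing steps can be sketched numerically as follows. The data, the coefficients (slope 0.6, intercept 2.2), and the helper name `points_for_graph` are hypothetical; the sketch picks the smallest and largest x-values, computes the two points to plot, and checks that the point of means lies on the line.

```python
def points_for_graph(m, b, x_low, x_high):
    """Return two (x, y-hat) points on the line y-hat = m*x + b."""
    return [(x, m * x + b) for x in (x_low, x_high)]


# Hypothetical data and its least-squares coefficients.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
m, b = 0.6, 2.2

# Steps 1 and 2: two x-values within the data range and their predicted y-values.
points = points_for_graph(m, b, min(xs), max(xs))

# Step 3 check: the point of means should lie on the regression line.
x_bar = sum(xs) / len(xs)
y_bar = sum(ys) / len(ys)
mean_on_line = abs((m * x_bar + b) - y_bar) < 1e-9

print(points, mean_on_line)
```

Any two distinct x-values in the data range would work; using the minimum and maximum simply spreads the two points as far apart as possible, which makes the drawn line easier to place accurately.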

Using Technology to Find a Regression Equation

Statistical software and calculators can quickly compute the regression equation from a data set. For example, inputting geyser eruption data into a calculator or software will output the regression equation, which can then be used for prediction.

Predicting y-Values Using Regression Equations

Once the regression equation is known, it can be used to predict y-values for given x-values. For example, suppose the regression equation for GDP (x, in trillions of dollars) and carbon dioxide emissions (y, in millions of metric tons) is $\hat{y} = mx + b$, with $m$ and $b$ computed from the data.

To predict emissions for a GDP of $x_0$ trillion dollars:

  • Substitute $x = x_0$ into the equation: $\hat{y} = m x_0 + b$ million metric tons.

Repeat for other GDP values as needed.
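Prediction by substitution can be sketched as a small function. The coefficients and x-values here are hypothetical and are not the GDP and emissions figures from the example. The range check is an added assumption, reflecting the usual caution that predictions are most trustworthy for x-values inside the range of the observed data.

```python
def predict(m, b, x, x_min, x_max):
    """Return y-hat = m*x + b, refusing x-values outside [x_min, x_max]."""
    if not (x_min <= x <= x_max):
        raise ValueError(f"x = {x} is outside the data range [{x_min}, {x_max}]")
    return m * x + b


# Hypothetical coefficients and data range, for illustration only.
m, b = 0.6, 2.2
x_min, x_max = 1, 5

# Repeat the substitution for several x-values.
predictions = {x: predict(m, b, x, x_min, x_max) for x in (1.5, 3, 4.5)}
print(predictions)
```

Raising an error for out-of-range inputs is only one possible design; a gentler alternative would be to return the prediction together with a warning.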

Summary Table: Steps in Linear Regression

| Step | Description |
| --- | --- |
| 1. Verify Correlation | Check if a significant linear correlation exists between x and y. |
| 2. Calculate Means | Find $\bar{x}$ and $\bar{y}$. |
| 3. Compute Slope (m) | Use the formula for m. |
| 4. Compute Intercept (b) | Use the formula for b. |
| 5. Write Regression Equation | Form $\hat{y} = mx + b$. |
| 6. Predict Values | Substitute x-values to predict y. |


Additional info: These notes are based on the textbook 'Elementary Statistics: Picturing the World' by Ron Larson.
