
Chapter 16: Regression – Statistical Prediction and Analysis


Regression

Introduction to Regression

Regression is a statistical method that builds on correlation to predict the value of one variable based on the value of another. While correlation describes the strength and direction of a relationship, regression provides a predictive equation.

  • Regression predicts outcomes; correlation describes relationships.

Simple Linear Regression

Definition and Purpose

Simple linear regression is a statistical tool used to predict an individual's score on a dependent variable (DV) based on their score on one independent variable (IV).

  • Dependent Variable (DV): The outcome variable being predicted.

  • Independent Variable (IV): The predictor variable used for prediction.

Standardized Regression Equation

Using z Scores in Regression

The standardized regression equation predicts the z score of a dependent variable, z_Ŷ, from the z score of an independent variable, z_X. The independent variable's z score is multiplied by the Pearson correlation coefficient to obtain the predicted z score on the dependent variable.

  • Equation: z_Ŷ = (r_XY)(z_X)

Regression to the Mean

Concept and Example

Regression to the mean is the tendency for scores that are particularly high or low to drift toward the mean over time. This phenomenon is observed when extreme values become less extreme in subsequent measurements.

| z Score for the Predictor Variable, X | Predicted z Score for the Outcome Variable, Y |
| --- | --- |
| -2.0 | -1.16 |
| -1.0 | -0.58 |
| 0.0 | 0.00 |
| 1.0 | 0.58 |
| 2.0 | 1.16 |

(Each predicted z score is the z score for X multiplied by the correlation coefficient, here r = .58. Because r is between -1 and +1, predicted scores are always less extreme than the predictors' scores.)

  • If the z score for the IV is not available, convert the raw score to a z score.

  • To obtain the predicted raw score on the DV, use the formula that converts a z score to a raw score.
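The standardized prediction rule can be sketched in Python. The correlation r = .58 is an assumption read off the table above (a z score of 1.0 on X predicts a z score of 0.58 on Y):

```python
# Standardized regression: the predicted z score on Y is the
# correlation coefficient times the z score on X.
r = 0.58  # assumed Pearson correlation, implied by the table above

def predict_z_y(z_x):
    """Return the predicted z score on the outcome variable Y."""
    return r * z_x

for z_x in (-2.0, -1.0, 0.0, 1.0, 2.0):
    print(z_x, round(predict_z_y(z_x), 2))
```

Because r is always between -1 and +1, each predicted z score is less extreme than the predictor's z score, which is regression to the mean in action.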

Regression Example: Stressful Life Events and Depression

Data Table

| Individual | Number of stressful life events | Beck Depression Inventory score |
| --- | --- | --- |
| 1 | 1 | 24 |
| 2 | 2 | 20 |
| 3 | 2 | 18 |
| 4 | 1 | 10 |
| 5 | 4 | 18 |
| 6 | 6 | 30 |
| 7 | 0 | 2 |
| 8 | 5 | 15 |
| 9 | 5 | 15 |
| 10 | 1 | 20 |
| Mean | 1.900 | 15.700 |
| Standard deviation | 1.663 | 9.358 |

Determining the Regression Equation

Equation for a Line

  • The regression equation is: Ŷ = a + b(X)

  • X is the raw score on the independent variable.

  • Ŷ is the predicted raw score on the dependent variable.

  • a is the intercept; b is the slope.

Regression with z Scores

Steps for Calculation

  • Calculate the z score for the IV.

  • Multiply the z score by the correlation coefficient (r_XY).

  • Convert the resulting z score to a raw score.

Formulas:

  z_X = (X - M_X) / SD_X

  z_Ŷ = (r_XY)(z_X)

  Ŷ = z_Ŷ(SD_Y) + M_Y

Steps in Determining the Regression Equation

  1. Find the z score for an X of 0.

  2. Use the z score to calculate the predicted z score on Y.

  3. Convert the predicted z score to its raw score; this predicted value of Y when X = 0 is the intercept, a.

Calculating the Slope

  1. Find the z score for an X of 1.

  2. Use the z score to calculate the predicted z score on Y.

  3. Convert the predicted z score to its raw score.

  4. The slope, b, is the difference between this predicted score for an X of 1 and the predicted score for an X of 0.
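The steps above can be sketched in Python using the summary statistics from the data table (M_X = 1.900, SD_X = 1.663, M_Y = 15.700, SD_Y = 9.358). The correlation r = .58 is an assumption implied by the z-score table earlier in this chapter:

```python
# Determine the regression equation Y-hat = a + b(X) via z scores,
# using the summary statistics from the data table above.
m_x, sd_x = 1.900, 1.663   # number of stressful life events
m_y, sd_y = 15.700, 9.358  # Beck Depression Inventory scores
r = 0.58                   # assumed correlation, implied by the z-score table

def predict_y(x):
    """Predict a raw score on Y from a raw score on X via z scores."""
    z_x = (x - m_x) / sd_x       # step 1: z score for X
    z_y_hat = r * z_x            # step 2: multiply by the correlation
    return z_y_hat * sd_y + m_y  # step 3: convert back to a raw score

a = predict_y(0)      # intercept: predicted Y when X = 0
b = predict_y(1) - a  # slope: change in predicted Y per one-unit change in X

print(round(a, 2), round(b, 2))  # roughly 9.50 and 3.26
```

These values match the predicted scores shown later in the chapter (an X of 0 predicts a depression score near 9.50); small differences in the slope's second decimal reflect rounding in r.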

The Regression Line and Line of Best Fit

Graphical Representation

  • The regression line is the best-fitting straight line through the data points, representing the predicted values of for each value of .

  • The line of best fit minimizes the sum of squared errors between observed and predicted values.

Interpretation and Prediction

Prediction and Error

  • Predictions from regression are subject to error, which is quantified by the standard error of the estimate.

  • Standard error of the estimate: Indicates the typical distance between the regression line and the actual data points.
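A minimal sketch of the standard error of the estimate, assuming the definition that divides the sum of squared errors by N (some texts divide by N - 2 for an inferential version):

```python
import math

def standard_error_of_estimate(y, y_hat):
    """Typical vertical distance between observed scores and the regression line."""
    sq_errors = [(obs - pred) ** 2 for obs, pred in zip(y, y_hat)]
    return math.sqrt(sum(sq_errors) / len(y))

# Illustrative check (hypothetical data): with perfect predictions the error is 0.
print(standard_error_of_estimate([2, 4, 6], [2, 4, 6]))  # 0.0
```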

Proportionate Reduction in Error (Coefficient of Determination)

Definition and Calculation

  • Also called the coefficient of determination (r²).

  • Quantifies how much more accurate predictions are when using the regression line compared to using the mean.

Formula:

  r² = (SS_total - SS_error) / SS_total

where SS_total is the sum of squared errors when predicting from the mean and SS_error is the sum of squared errors when predicting from the regression equation.

  1. Determine the error using the mean as the predictor (SS_total).

  2. Determine the error using the regression equation as the predictor (SS_error).

  3. Subtract the regression error from the mean error.

  4. Divide the difference by the error using the mean.
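The four steps can be sketched as one function (the check at the end uses hypothetical data: perfect predictions give 1.0, and predicting the mean for everyone gives 0.0):

```python
def proportionate_reduction_in_error(y, y_hat):
    """r-squared: (SS_total - SS_error) / SS_total."""
    mean_y = sum(y) / len(y)
    # Step 1: error using the mean as the predictor.
    ss_total = sum((obs - mean_y) ** 2 for obs in y)
    # Step 2: error using the regression equation as the predictor.
    ss_error = sum((obs - pred) ** 2 for obs, pred in zip(y, y_hat))
    # Steps 3 and 4: subtract, then divide by the error using the mean.
    return (ss_total - ss_error) / ss_total

print(proportionate_reduction_in_error([1, 2, 3], [1, 2, 3]))  # 1.0
print(proportionate_reduction_in_error([1, 2, 3], [2, 2, 2]))  # 0.0
```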

Calculating Error

Using the Mean as Predictor

| Individual | Depression score (Y) | Mean (Ȳ) | Error (Y - Ȳ) | Squared error (Y - Ȳ)² |
| --- | --- | --- | --- | --- |
| 1 | 24 | 15.7 | 8.3 | 68.89 |
| 2 | 20 | 15.7 | 4.3 | 18.49 |
| 3 | 18 | 15.7 | 2.3 | 5.29 |
| 4 | 10 | 15.7 | -5.7 | 32.49 |
| 5 | 18 | 15.7 | 2.3 | 5.29 |
| 6 | 30 | 15.7 | 14.3 | 204.49 |
| 7 | 2 | 15.7 | -13.7 | 187.69 |
| 8 | 15 | 15.7 | -0.7 | 0.49 |
| 9 | 15 | 15.7 | -0.7 | 0.49 |
| 10 | 20 | 15.7 | 4.3 | 18.49 |

Using the Regression Equation as Predictor

| Individual | Depression score (Y) | Predicted (Ŷ) | Error (Y - Ŷ) | Squared error (Y - Ŷ)² |
| --- | --- | --- | --- | --- |
| 1 | 24 | 12.77 | 11.23 | 126.11 |
| 2 | 20 | 16.04 | 3.96 | 15.68 |
| 3 | 18 | 16.04 | 1.96 | 3.84 |
| 4 | 10 | 12.77 | -2.77 | 7.67 |
| 5 | 18 | 19.31 | -1.31 | 1.72 |
| 6 | 30 | 25.85 | 4.15 | 17.22 |
| 7 | 2 | 9.50 | -7.50 | 56.25 |
| 8 | 15 | 22.58 | -7.58 | 57.46 |
| 9 | 15 | 22.58 | -7.58 | 57.46 |
| 10 | 20 | 12.77 | 7.23 | 52.27 |

Visualizing Error

  • Error is represented graphically as the vertical distance between observed data points and the regression line or mean.

Limitations of Regression

  • Data are often not from a true experiment (lack of randomization).

  • The independent variable is usually a scale variable.

  • Limitations in predictions are similar to those found in correlation analysis.

Multiple Regression

Definition and Application

Multiple regression is a statistical technique that includes two or more predictor variables in a prediction equation, allowing for more complex and accurate predictions.

  • When calculating proportionate reduction in error for multiple regression, use the symbol R² instead of r².

  • Applications include forecasting, anticipatory shipping, and personalized recommendations in apps and websites.

Example: Multiple Regression Data Table

| Individual | Number of stressful life events | Sleep disturbance score | Beck Depression Inventory score |
| --- | --- | --- | --- |
| 1 | 1 | 12 | 24 |
| 2 | 2 | 10 | 20 |
| 3 | 2 | 9 | 18 |
| 4 | 1 | 7 | 10 |
| 5 | 4 | 9 | 18 |
| 6 | 6 | 12 | 30 |
| 7 | 0 | 3 | 2 |
| 8 | 5 | 10 | 15 |
| 9 | 5 | 12 | 15 |
| 10 | 1 | 8 | 20 |
| Mean | 1.900 | 8.400 | 15.700 |
| Standard deviation | 1.663 | 3.747 | 9.358 |
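The table above can be fit with a least-squares multiple regression in Python. This is a sketch of what statistical software does, using NumPy's least-squares solver, not the textbook's worked output:

```python
import numpy as np

# Predictor and outcome columns from the multiple regression data table above.
stress = np.array([1, 2, 2, 1, 4, 6, 0, 5, 5, 1], dtype=float)
sleep = np.array([12, 10, 9, 7, 9, 12, 3, 10, 12, 8], dtype=float)
depression = np.array([24, 20, 18, 10, 18, 30, 2, 15, 15, 20], dtype=float)

# Design matrix with a column of 1s so the model includes an intercept.
X = np.column_stack([np.ones_like(stress), stress, sleep])
coef, *_ = np.linalg.lstsq(X, depression, rcond=None)
a, b1, b2 = coef  # intercept, slope for stress, slope for sleep

# Proportionate reduction in error; capital R² because there are two predictors.
y_hat = X @ coef
ss_total = np.sum((depression - depression.mean()) ** 2)
ss_error = np.sum((depression - y_hat) ** 2)
r_squared = (ss_total - ss_error) / ss_total

print(a, b1, b2, r_squared)
```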

Software Output for Regression

  • Statistical software provides coefficients, standard errors, and significance values for each predictor in a multiple regression model.

  • Capitalization of R (as in R²) indicates the use of multiple predictors.

Multiple Regression in Everyday Life

  • Modern applications use multiple regression for forecasting, anticipatory shipping, and personalized shopping experiences.

Additional info: Regression analysis is foundational in both behavioral sciences and natural sciences, including chemistry, for modeling and predicting quantitative relationships between variables.
