Chapter 16: Regression – Statistical Prediction and Analysis
Study Guide - Smart Notes
Tailored notes based on your materials, expanded with key definitions, examples, and context.
Regression
Introduction to Regression
Regression is a statistical method that builds on correlation to predict the value of one variable based on the value of another. While correlation describes the strength and direction of a relationship, regression provides a predictive equation.
Regression predicts outcomes; correlation describes relationships.
Simple Linear Regression
Definition and Purpose
Simple linear regression is a statistical tool used to predict an individual's score on a dependent variable (DV) based on their score on one independent variable (IV).
Dependent Variable (DV): The outcome variable being predicted.
Independent Variable (IV): The predictor variable used for prediction.
Standardized Regression Equation
Using z Scores in Regression
The standardized regression equation predicts the z score of a dependent variable, z_Ŷ, from the z score of an independent variable, z_X. The independent variable's z score is multiplied by the Pearson correlation coefficient to obtain the predicted z score on the dependent variable.
Equation: z_Ŷ = (r_XY)(z_X)
Regression to the Mean
Concept and Example
Regression to the mean is the tendency for scores that are particularly high or low to drift toward the mean over time. This phenomenon is observed when extreme values become less extreme in subsequent measurements.
| z Score for the Predictor Variable, X | Predicted z Score for the Outcome Variable, Y |
|---|---|
| -2.0 | -1.16 |
| -1.0 | -0.58 |
| 0.0 | 0.00 |
| 1.0 | 0.58 |
| 2.0 | 1.16 |

In this example the correlation is r = 0.58, so each predicted z score (0.58 times the predictor's z score) is closer to the mean of 0 than the z score it was predicted from.
If the z score for the IV is not available, convert the raw score to a z score.
To obtain the predicted raw score on the DV, use the formula that converts a z score to a raw score.
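The full pipeline just described (raw X → z score → multiply by r → raw Ŷ) can be sketched in Python. The descriptive statistics and correlation (r = 0.58) are taken from the stressful life events example in this chapter; the function name is illustrative.

```python
# Predicting a raw DV score from a raw IV score via z scores.
# Statistics come from the chapter's stressful life events / depression
# example (r = 0.58); the function name is illustrative only.

M_X, SD_X = 1.9, 1.663    # IV: number of stressful life events
M_Y, SD_Y = 15.7, 9.358   # DV: Beck Depression Inventory score
r = 0.58                  # Pearson correlation between X and Y

def predict_raw(x):
    z_x = (x - M_X) / SD_X       # Step 1: convert raw X to a z score
    z_y_hat = r * z_x            # Step 2: multiply by r for predicted z on Y
    return z_y_hat * SD_Y + M_Y  # Step 3: convert predicted z back to raw Y

print(round(predict_raw(1), 2))  # close to the 12.77 used in the example
```

With these statistics, `predict_raw(1)` gives about 12.76, matching (up to rounding) the predicted score of 12.77 used in the example tables below.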
Regression Example: Stressful Life Events and Depression
Data Table
| Individual | Number of stressful life events | Beck Depression Inventory score |
|---|---|---|
| 1 | 1 | 24 |
| 2 | 2 | 20 |
| 3 | 2 | 18 |
| 4 | 1 | 10 |
| 5 | 4 | 18 |
| 6 | 6 | 30 |
| 7 | 0 | 2 |
| 8 | 5 | 15 |
| 9 | 5 | 15 |
| 10 | 1 | 20 |
| Mean | 1.900 | 15.700 |
| Standard deviation | 1.663 | 9.358 |
Determining the Regression Equation
Equation for a Line
The regression equation is: Ŷ = a + b(X)
X is the raw score on the independent variable.
Ŷ is the predicted raw score on the dependent variable.
a is the intercept (the predicted value of Y when X equals 0); b is the slope (the change in Ŷ for every one-unit increase in X).
Regression with z Scores
Steps for Calculation
Calculate the z score for the IV: z_X = (X − M_X)/SD_X.
Multiply the z score by the correlation coefficient (r) to obtain the predicted z score on the DV.
Convert the resulting z score to a raw score.
Formulas:
z_X = (X − M_X)/SD_X
z_Ŷ = (r_XY)(z_X)
Ŷ = z_Ŷ(SD_Y) + M_Y
Steps in Determining the Regression Equation
To find the intercept, a, predict Y for an X of 0:
Find the z score for X = 0: z_X = (0 − 1.9)/1.663 = −1.143.
Use the z score to calculate the predicted z score on Y: z_Ŷ = (0.58)(−1.143) = −0.663.
Convert the z score to its raw score: Ŷ = (−0.663)(9.358) + 15.7 = 9.50, so the intercept is a = 9.50.
Calculating the Slope
Find the z score for an X of 1: z_X = (1 − 1.9)/1.663 = −0.541.
Use the z score to calculate the predicted z score on Y: z_Ŷ = (0.58)(−0.541) = −0.314.
Convert the z score to its raw score: Ŷ = (−0.314)(9.358) + 15.7 ≈ 12.77.
The slope is the change in the predicted score per unit of X: b = 12.77 − 9.50 = 3.27. The regression equation is therefore Ŷ = 9.50 + 3.27(X), and any predicted score can be found by substituting a value of X.
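These steps can be mirrored in code: the intercept is the prediction at X = 0, and the slope is how much the prediction changes when X increases by one unit. The statistics below come from the stressful life events example (with r = 0.58 from the z score table earlier); depending on intermediate rounding, the resulting equation is approximately Ŷ = 9.50 + 3.27(X).

```python
# Building the raw-score regression equation Y-hat = a + b(X) from the
# z-score method: the intercept a is the prediction at X = 0, and the
# slope b is the change in the prediction when X increases by 1.
# Statistics come from the stressful life events example (r = 0.58).

M_X, SD_X = 1.9, 1.663
M_Y, SD_Y = 15.7, 9.358
r = 0.58

def predict(x):
    """Predict a raw Y from a raw X via standardized scores."""
    return r * ((x - M_X) / SD_X) * SD_Y + M_Y

a = predict(0)               # intercept: predicted Y when X = 0
b = predict(1) - predict(0)  # slope: change in Y-hat per unit of X

print(f"Y-hat = {a:.2f} + {b:.2f}(X)")
```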
The Regression Line and Line of Best Fit
Graphical Representation
The regression line is the best-fitting straight line through the data points, representing the predicted values of for each value of .
The line of best fit minimizes the sum of squared errors between observed and predicted values.
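A minimal sketch of how a line of best fit is actually found: the standard closed-form least-squares formulas (b = Σ(X − M_X)(Y − M_Y) / Σ(X − M_X)², a = M_Y − b·M_X) choose the line that minimizes the sum of squared errors. The data points here are hypothetical, for illustration only.

```python
# Least-squares line of best fit: chooses a and b to minimize the sum of
# squared vertical distances between observed Y values and the line.
# The data points below are hypothetical, for illustration only.

def best_fit_line(xs, ys):
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Closed-form least-squares estimates of slope and intercept
    b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) \
        / sum((x - mean_x) ** 2 for x in xs)
    a = mean_y - b * mean_x
    return a, b

xs = [0, 1, 2, 3]
ys = [1.1, 2.9, 5.2, 6.8]  # roughly y = 1 + 2x, with some noise
a, b = best_fit_line(xs, ys)
print(a, b)
```

When the points fall exactly on a line, the formulas recover that line; with noisy data they return the line whose squared errors are smallest.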
Interpretation and Prediction
Prediction and Error
Predictions from regression are subject to error, which is quantified by the standard error of the estimate.
Standard error of the estimate: Indicates the typical distance between the regression line and the actual data points.
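The standard error of the estimate can be sketched as follows. Note one assumption: the descriptive form below divides the summed squared error by N, whereas inferential formulas in some texts divide by N − 2; the data here are hypothetical.

```python
from math import sqrt

def standard_error_of_estimate(ys, y_hats):
    """Typical vertical distance between observations and the regression line.
    Uses the descriptive form sqrt(sum((Y - Y_hat)^2) / N); some texts
    divide by N - 2 instead when making inferences."""
    n = len(ys)
    ss_error = sum((y - yh) ** 2 for y, yh in zip(ys, y_hats))
    return sqrt(ss_error / n)

# A perfect fit leaves no error, so the standard error is 0:
print(standard_error_of_estimate([1, 2, 3], [1, 2, 3]))  # 0.0
```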
Proportionate Reduction in Error (Coefficient of Determination)
Definition and Calculation
Also called the coefficient of determination (r²).
Quantifies how much more accurate predictions are when using the regression line compared to using the mean.
Formula: r² = (SS_total − SS_error) / SS_total, where SS_total = Σ(Y − Ȳ)² is the error from predicting with the mean and SS_error = Σ(Y − Ŷ)² is the error from predicting with the regression equation.
Determine the error using the mean as the predictor.
Determine the error using the regression equation as the predictor.
Subtract the regression error from the mean error.
Divide the difference by the error using the mean.
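The four steps above can be sketched directly in Python; the observed scores and regression predictions below are hypothetical, chosen only to show the calculation.

```python
# Proportionate reduction in error (r^2): how much prediction error shrinks
# when the regression line replaces the mean as the predictor.

def proportionate_reduction_in_error(ys, y_hats):
    mean_y = sum(ys) / len(ys)
    # Step 1: error when predicting the mean for everyone (SS_total)
    ss_total = sum((y - mean_y) ** 2 for y in ys)
    # Step 2: error when using the regression predictions (SS_error)
    ss_error = sum((y - yh) ** 2 for y, yh in zip(ys, y_hats))
    # Steps 3-4: subtract the regression error, then divide by SS_total
    return (ss_total - ss_error) / ss_total

# Hypothetical observed scores and regression predictions:
ys = [10, 14, 18, 22]
y_hats = [11, 13, 19, 21]
print(proportionate_reduction_in_error(ys, y_hats))
```

A value of 1.0 means the regression line predicts perfectly; 0.0 means it does no better than simply predicting the mean.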
Calculating Error
Using the Mean as Predictor
| Individual | Depression score (Y) | Mean (Ȳ) | Error (Y − Ȳ) | Squared error (Y − Ȳ)² |
|---|---|---|---|---|
| 1 | 24 | 15.7 | 8.3 | 68.89 |
| 2 | 20 | 15.7 | 4.3 | 18.49 |
| 3 | 18 | 15.7 | 2.3 | 5.29 |
| 4 | 10 | 15.7 | -5.7 | 32.49 |
| 5 | 18 | 15.7 | 2.3 | 5.29 |
| 6 | 30 | 15.7 | 14.3 | 204.49 |
| 7 | 2 | 15.7 | -13.7 | 187.69 |
| 8 | 15 | 15.7 | -0.7 | 0.49 |
| 9 | 15 | 15.7 | -0.7 | 0.49 |
| 10 | 20 | 15.7 | 4.3 | 18.49 |
Using the Regression Equation as Predictor
| Individual | Depression score (Y) | Predicted (Ŷ) | Error (Y − Ŷ) | Squared error (Y − Ŷ)² |
|---|---|---|---|---|
| 1 | 24 | 12.77 | 11.33 | 128.33 |
| 2 | 20 | 16.04 | 3.96 | 15.68 |
| 3 | 18 | 16.04 | 1.96 | 3.84 |
| 4 | 10 | 12.77 | -2.77 | 7.67 |
| 5 | 18 | 19.31 | -1.31 | 1.72 |
| 6 | 30 | 25.85 | 4.15 | 17.22 |
| 7 | 2 | 9.50 | -7.50 | 56.25 |
| 8 | 15 | 22.58 | -7.58 | 57.50 |
| 9 | 15 | 22.58 | -7.58 | 57.50 |
| 10 | 20 | 12.77 | 7.23 | 52.27 |
Visualizing Error
Error is represented graphically as the vertical distance between observed data points and the regression line or mean.
Limitations of Regression
Data are often not from a true experiment (lack of randomization).
The independent variable is usually a scale variable.
Limitations in predictions are similar to those found in correlation analysis.
Multiple Regression
Definition and Application
Multiple regression is a statistical technique that includes two or more predictor variables in a prediction equation, allowing for more complex and accurate predictions.
When calculating proportionate reduction in error for multiple regression, use the symbol R² instead of r².
Applications include forecasting, anticipatory shipping, and personalized recommendations in apps and websites.
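A minimal sketch of how a two-predictor equation Ŷ = a + b1(X1) + b2(X2) can be fit by least squares. The small solver and the tiny dataset below are illustrative assumptions, not the chapter's worked example; in practice, statistical software does this fitting and also reports standard errors and significance tests.

```python
# Multiple regression with two predictors, fit by solving the least-squares
# normal equations (X'X)beta = X'y with a small Gaussian-elimination solver.
# The dataset is hypothetical, chosen so it lies exactly on a plane.

def solve_linear_system(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def fit_two_predictors(x1s, x2s, ys):
    """Return (a, b1, b2) for Y-hat = a + b1*X1 + b2*X2."""
    rows = [[1.0, x1, x2] for x1, x2 in zip(x1s, x2s)]
    # Normal equations: (X'X) beta = X'y
    XtX = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    Xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(3)]
    return solve_linear_system(XtX, Xty)

x1s = [0, 1, 0, 1, 2]
x2s = [0, 0, 1, 1, 3]
ys = [2.0, 5.0, 2.5, 5.5, 9.5]  # exactly y = 2 + 3*x1 + 0.5*x2
a, b1, b2 = fit_two_predictors(x1s, x2s, ys)
print(round(a, 2), round(b1, 2), round(b2, 2))
```

Because the hypothetical data lie exactly on a plane, the fit recovers the generating coefficients; with real data the same procedure returns the coefficients that minimize the summed squared error.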
Example: Multiple Regression Data Table
| Individual | Number of stressful life events | Sleep disturbance score | Beck Depression Inventory score |
|---|---|---|---|
| 1 | 1 | 12 | 24 |
| 2 | 2 | 10 | 20 |
| 3 | 2 | 9 | 18 |
| 4 | 1 | 7 | 10 |
| 5 | 4 | 9 | 18 |
| 6 | 6 | 12 | 30 |
| 7 | 0 | 3 | 2 |
| 8 | 5 | 10 | 15 |
| 9 | 5 | 12 | 15 |
| 10 | 1 | 8 | 20 |
| Mean | 1.900 | 8.400 | 15.700 |
| Standard deviation | 1.663 | 3.747 | 9.358 |
Software Output for Regression
Statistical software provides coefficients, standard errors, and significance values for each predictor in a multiple regression model.
Capitalization of the R (as in R²) indicates the use of multiple predictors.
Multiple Regression in Everyday Life
Modern applications use multiple regression for forecasting, anticipatory shipping, and personalized shopping experiences.
Additional info: Regression analysis is foundational in both behavioral sciences and natural sciences, including chemistry, for modeling and predicting quantitative relationships between variables.