Residuals and Residual Plots in Regression Analysis
Study Guide - Smart Notes
Residual Analysis
Residuals and Residual Plots
Residual analysis is a key step in regression modeling, used to assess how well a linear regression model fits the data. A residual is the signed vertical difference between an observed data point and the value predicted by the regression line: positive when the point lies above the line, negative when it lies below. A residual plot helps determine whether a linear model is appropriate for the data.
Residual: The observed value minus the predicted value from the regression line.
Residual Plot: A scatterplot of residuals on the vertical axis and the independent variable (or predicted values) on the horizontal axis.
Formula for Residual:
$\text{Residual} = y - \hat{y}$
$y$ = observed value
$\hat{y}$ = predicted value from the regression line
Example: Ice Cream Sales vs. High Temperature
Consider the following data set, where ice cream sales (in dollars) are recorded against daily high temperature (in °F):
| High Temp (°F) | Sales ($) |
|---|---|
| 62 | 180 |
| 65 | 200 |
| 68 | 220 |
| 70 | 260 |
| 72 | 300 |
| 75 | 340 |
| 78 | 360 |
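The regression equation for this data, fitted by least squares with coefficients rounded to two decimals, is approximately:
$\hat{y} = 12.37x - 599.88$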
To calculate residuals, substitute each temperature $x$ into the regression equation to get the predicted sales $\hat{y}$, then subtract $\hat{y}$ from the observed sales $y$.
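As a sketch of the procedure just described, the following Python snippet fits a line to the table above and computes each residual. It assumes the regression equation is the ordinary least-squares line; the helper name `fit_line` is illustrative and does not come from the original notes.

```python
# Data from the table above: daily high temperature (°F) and sales ($).
temps = [62, 65, 68, 70, 72, 75, 78]
sales = [180, 200, 220, 260, 300, 340, 360]


def fit_line(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares line."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return slope, intercept


slope, intercept = fit_line(temps, sales)

# Predicted value for each temperature, then residual = observed - predicted.
predicted = [intercept + slope * x for x in temps]
residuals = [y - y_hat for y, y_hat in zip(sales, predicted)]

print(f"fitted line: y_hat = {slope:.2f} * x + ({intercept:.2f})")
for x, y, y_hat, e in zip(temps, sales, predicted, residuals):
    print(f"{x:>3} | {y:>3} | {y_hat:7.2f} | {e:+7.2f}")
```

With these data the fitted line comes out at roughly $\hat{y} = 12.37x - 599.88$, and the largest residual in magnitude is about -21 at 68 °F. The residuals sum to zero up to rounding, which always holds for a least-squares line with an intercept.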
Interpreting Residual Plots
If residuals are randomly scattered around zero (no pattern), a linear model is appropriate for the data.
If residuals show a pattern (e.g., a curve, or a spread that increases or decreases), a linear model is not a good fit or its constant-variance assumption is violated.
Examples of Residual Plots
Random scatter: Indicates appropriateness of linear regression.
Patterned residuals: A U-shape suggests non-linearity, and an increasing or decreasing spread suggests heteroscedasticity (non-constant variance); linear regression may not be suitable.
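To make the U-shaped case concrete, here is a minimal sketch that fits a straight line to deliberately curved data and prints the sign of each residual in order of $x$. The data ($y = x^2$ for $x = 1, \dots, 9$) are hypothetical and chosen only for illustration, and `fit_line` is an illustrative helper rather than anything from the original notes.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares line."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean


# Hypothetical curved data: y grows quadratically, not linearly.
xs = list(range(1, 10))
ys = [x ** 2 for x in xs]

slope, intercept = fit_line(xs, ys)
curved_residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]

# One character per point, left to right along the x-axis.
signs = "".join("+" if e > 0 else "-" for e in curved_residuals)
print(signs)  # prints ++-----++
```

The residuals are positive at both ends and negative in the middle, which is the U-shape a residual plot would show. A random scatter would instead mix the signs with no systematic run.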
Practice: Identifying Appropriate Models
Given several residual plots, the one with points randomly scattered around zero (no visible pattern) suggests that a linear regression model is appropriate.
Summary Table: Residual Plot Interpretation
| Residual Plot Pattern | Model Appropriateness |
|---|---|
| Random scatter | Linear model is appropriate |
| Curved pattern | Linear model is not appropriate |
| Increasing or decreasing spread | Indicates non-constant variance; linear model may not be appropriate |
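The "increasing or decreasing spread" row can be illustrated with a rough numerical check: compare the average size of the residuals in the upper half of the $x$ range with the lower half. This is only a heuristic sketch, not a formal test for non-constant variance. The data are hypothetical (noise proportional to $x$), and `fit_line` and `spread_ratio` are illustrative names that do not come from the original notes.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares line."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean


def spread_ratio(xs, residuals):
    """Mean |residual| in the upper half of x divided by the lower half."""
    pairs = sorted(zip(xs, residuals))
    half = len(pairs) // 2
    lower = [abs(e) for _, e in pairs[:half]]
    upper = [abs(e) for _, e in pairs[-half:]]
    return (sum(upper) / len(upper)) / (sum(lower) / len(lower))


# Hypothetical data: a linear trend plus alternating noise that grows with x.
xs = list(range(1, 11))
ys = [3 * x + (0.5 * x if i % 2 == 0 else -0.5 * x) for i, x in enumerate(xs)]

slope, intercept = fit_line(xs, ys)
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]

ratio = spread_ratio(xs, residuals)
print(f"spread ratio (upper half / lower half): {ratio:.2f}")
```

Here the ratio is about 2.6, meaning the residuals fan out as $x$ increases. A ratio near 1 would be consistent with roughly constant spread, though a residual plot remains the primary diagnostic.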
Key Points
Residuals help diagnose the fit of a regression model.
Random residuals support the use of a linear model.
Patterns in residuals suggest the need for a different model or transformation.
Example: If a residual plot for ice cream sales vs. temperature shows random scatter, the linear model is appropriate. If the plot shows a curve, a nonlinear model may be needed.