Chapter 22: Regression Diagnostics – Business Statistics Study Notes

Regression Diagnostics

Introduction

Regression diagnostics are essential for evaluating the validity and reliability of regression models in business statistics. This chapter focuses on three main issues: changing variation (heteroscedasticity), outliers, and dependent errors (autocorrelation), and provides methods for detecting and addressing these problems.

Changing Variation

Understanding Changing Variation

In regression analysis, the variability of the response may change across levels of the explanatory variable. Home prices are a typical case: prices of larger homes tend to vary more than prices of smaller homes.

  • Heteroscedasticity: Errors have different amounts of variation across levels of the explanatory variable.

  • Homoscedasticity: Errors have equal amounts of variation.

  • Implications: Violating the similar variances condition affects the reliability of confidence intervals and hypothesis tests.

Example: Home Price vs. Size

Scatterplots and residual plots can reveal changing variation. In the home price example, both the mean and standard deviation of price increase with home size.

[Figures: scatterplot of price vs. square feet; regression output table for the home price example]

Detecting Changing Variation

  • Scatterplot: Shows increasing spread of prices with home size.

  • Residual Plot: Fan-shaped pattern indicates heteroscedasticity.

[Figure: fan-shaped residual plot]

  • Boxplots: Side-by-side boxplots of residuals by size range confirm increasing variance.

[Figure: boxplots of residuals by size range]
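The side-by-side boxplot check can also be done numerically: fit the line, split the residuals into ranges of the explanatory variable, and compare their standard deviations. This is a minimal Python sketch using only the standard library; the helper names (`fit_line`, `residual_sd_by_range`), the cut points, and the data are hypothetical, not the textbook's home-price data.

```python
from statistics import mean, stdev

def fit_line(x, y):
    """Least-squares intercept and slope for y = b0 + b1*x."""
    xbar, ybar = mean(x), mean(y)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1

def residual_sd_by_range(x, y, cutpoints):
    """Fit the line, then return the SD of the residuals within each x-range.

    cutpoints are the upper bounds of every range except the last; each
    range needs at least two observations for stdev().
    """
    b0, b1 = fit_line(x, y)
    groups = [[] for _ in range(len(cutpoints) + 1)]
    for xi, yi in zip(x, y):
        resid = yi - (b0 + b1 * xi)
        k = sum(xi > c for c in cutpoints)  # index of the range holding xi
        groups[k].append(resid)
    return [stdev(g) for g in groups]

# Hypothetical data: size in sq ft and price in $000 (made up for illustration).
sizes = [1000, 1000, 1500, 1500, 2000, 2000, 2500, 2500, 3000, 3000]
prices = [210, 190, 290, 260, 370, 330, 450, 400, 530, 470]

sds = residual_sd_by_range(sizes, prices, cutpoints=[1500, 2000])
print([round(s, 1) for s in sds])
```

With these made-up numbers the residual SD roughly doubles from the smallest size range to the largest, the numeric counterpart of a fan-shaped residual plot.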

Consequences of Heteroscedasticity

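  • Prediction intervals rely on a single estimate of the error standard deviation, so they are too narrow where the errors vary more and too wide where the errors vary less.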

  • Confidence intervals for slope and intercept are unreliable.

  • Hypothesis tests for coefficients may be invalid.

[Figure: prediction intervals for home price]

Fixing Changing Variation

One solution is to revise the model by transforming the variables. Dividing price by square feet, and using the reciprocal of square feet as the explanatory variable, can stabilize the variance.
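Why this transformation can work: suppose (our assumption, suggested by the fan-shaped residuals) that the standard deviation of the errors grows in proportion to square feet. Dividing both sides of the original model by SqFt then leaves errors with a constant standard deviation, and it swaps the roles of the coefficients: the fixed cost $\beta_0$ becomes the slope on $1/\text{SqFt}$, and the marginal cost $\beta_1$ becomes the intercept.

```latex
\begin{aligned}
\text{Price} &= \beta_0 + \beta_1\,\text{SqFt} + \varepsilon,
  &\quad \operatorname{SD}(\varepsilon) &= \sigma\,\text{SqFt} \\
\frac{\text{Price}}{\text{SqFt}} &= \beta_1 + \beta_0\,\frac{1}{\text{SqFt}} + \frac{\varepsilon}{\text{SqFt}},
  &\quad \operatorname{SD}\!\left(\frac{\varepsilon}{\text{SqFt}}\right) &= \sigma
\end{aligned}
```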

  • Transformed Model: Response variable becomes price per square foot; explanatory variable is reciprocal of square feet.

  • Result: Residuals exhibit similar variances (homoscedasticity).

[Figure: boxplots of residuals after transformation]
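A minimal Python sketch of fitting the transformed model and reading the costs back out. The data are made up for illustration (not the textbook's), and `fit_line` and `fixed_and_marginal_cost` are hypothetical helper names. The fixed cost is the slope on 1/SqFt and the marginal cost is the intercept.

```python
from statistics import mean

def fit_line(x, y):
    """Least-squares intercept and slope for y = b0 + b1*x."""
    xbar, ybar = mean(x), mean(y)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1

def fixed_and_marginal_cost(sqft, price):
    """Fit price/sqft = b0 + b1 * (1/sqft).

    Dividing price = fixed + marginal*sqft by sqft gives
    price/sqft = marginal + fixed*(1/sqft), so the intercept estimates the
    marginal cost ($/sq ft) and the slope estimates the fixed cost ($).
    """
    recip = [1 / s for s in sqft]
    ppsf = [p / s for p, s in zip(price, sqft)]
    intercept, slope = fit_line(recip, ppsf)
    return slope, intercept  # (fixed cost, marginal cost)

# Hypothetical homes: size in sq ft and price in dollars.
sqft = [1000, 1000, 2000, 2000, 4000, 4000]
price = [205_000, 195_000, 360_000, 340_000, 670_000, 630_000]

fixed, marginal = fixed_and_marginal_cost(sqft, price)
print(f"fixed cost about ${fixed:,.0f}, marginal cost about ${marginal:,.2f}/sq ft")
```

Because the response is now price per square foot, a prediction interval from this fit is in dollars per square foot; multiplying its endpoints by a home's size converts it to an interval for that home's price.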

Comparing Models

Although the revised model may have a lower R² (and R² values are not directly comparable when the responses differ), it provides more reliable confidence and prediction intervals.

[Tables: fixed costs, marginal costs, and prediction intervals compared for the two models]

Outliers

Identifying Outliers

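Outliers are observations that deviate markedly from the pattern of the rest of the data. In regression, an outlier with an extreme value of the explanatory variable is also a leveraged observation, and it can strongly influence the fitted line.

  • Leverage: An observation with an extreme value of the explanatory variable has high leverage; it can pull the regression line toward itself.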

  • Impact: Outliers can distort estimates of regression coefficients and prediction intervals.
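Leverage in simple regression has a standard closed form, $h_i = \frac{1}{n} + \frac{(x_i - \bar{x})^2}{\sum_j (x_j - \bar{x})^2}$, which these notes do not state; it grows with the squared distance of $x_i$ from the mean of the explanatory variable. The sketch below computes it for hypothetical project sizes in which one project sits far from the others; the function name `leverages` is ours.

```python
from statistics import mean

def leverages(x):
    """Leverage h_i of each observation in a simple regression of y on x."""
    n = len(x)
    xbar = mean(x)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    return [1 / n + (xi - xbar) ** 2 / sxx for xi in x]

sizes = [2000, 2200, 2400, 2600, 2800, 3000, 900]  # hypothetical sq ft
h = leverages(sizes)
most = max(range(len(h)), key=h.__getitem__)  # index of the largest leverage
print(f"highest leverage: {sizes[most]} sq ft, h = {h[most]:.2f}")
```

In simple regression the leverages always sum to 2 (the number of estimated coefficients), so a single point holding most of that total, as the 900 sq ft project does here, dominates the fit.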

Example: Contractor's Bid

In a dataset of contractor bids, one project at 900 square feet is an outlier and a leveraged observation.

[Figure: scatterplot of contractor data with outlier]

Consequences of Outliers

  • Including the outlier shifts the estimated fixed cost and marginal cost by more than one standard error.

  • Prediction intervals change substantially depending on whether the outlier is included.

[Figures: regression output with and without the outlier; prediction intervals with and without the outlier]
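To check how far a single leveraged point moves the estimates, refit without it and express the change in each coefficient in standard errors. This sketch uses hypothetical contractor data (the 900 sq ft row only mimics the situation in the example) and measures each shift in units of the standard errors from the fit without the outlier; that choice of yardstick, and the helper name `fit_with_se`, are our assumptions.

```python
from math import sqrt
from statistics import mean

def fit_with_se(x, y):
    """Least-squares fit of y = b0 + b1*x with the usual standard errors."""
    n = len(x)
    xbar, ybar = mean(x), mean(y)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    b0 = ybar - b1 * xbar
    sse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    s = sqrt(sse / (n - 2))  # residual standard deviation
    se_b0 = s * sqrt(1 / n + xbar ** 2 / sxx)
    se_b1 = s / sqrt(sxx)
    return (b0, b1), (se_b0, se_b1)

# Hypothetical data: size (sq ft) and cost ($000); the last row is the outlier.
sqft = [1500, 1500, 2000, 2000, 2500, 2500, 3000, 3000, 900]
cost = [175, 165, 225, 215, 275, 265, 325, 315, 200]

b_all, _ = fit_with_se(sqft, cost)
b_wo, se_wo = fit_with_se(sqft[:-1], cost[:-1])

# Intercept = fixed cost, slope = marginal cost.
for name, a, b, se in zip(["fixed cost", "marginal cost"], b_all, b_wo, se_wo):
    print(f"{name}: with={a:.3f}, without={b:.3f}, shift = {abs(a - b) / se:.1f} SE")
```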

Handling Outliers

  • Decide whether to include or exclude the outlier based on whether it represents expected future conditions.

  • Gather more information to make an informed decision.

Dependent Errors and Time Series

Detecting Dependence

In time series data, errors may be correlated across time, violating the independence assumption. This is known as autocorrelation.

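  • Durbin-Watson Statistic: Tests for correlation between adjacent residuals.

  • Null Hypothesis: Adjacent residuals are uncorrelated ($H_0: \rho_\varepsilon = 0$).

  • Interpretation: D ranges from 0 to 4. A value near 2 is consistent with uncorrelated residuals; values well below 2 indicate positive autocorrelation, and values well above 2 indicate negative autocorrelation.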

[Figures: scatterplot for time series regression; timeplot of residuals showing dependence]
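The Durbin-Watson statistic compares the squared differences between adjacent residuals with the total sum of squared residuals. A minimal sketch, assuming the residuals are already in time order; the function name and the three residual series are invented to show the three typical outcomes.

```python
def durbin_watson(resid):
    """D = sum_{t=2..n} (e_t - e_{t-1})^2 / sum_{t=1..n} e_t^2 for time-ordered residuals."""
    num = sum((resid[t] - resid[t - 1]) ** 2 for t in range(1, len(resid)))
    den = sum(e ** 2 for e in resid)
    return num / den

# Hypothetical residual series (not from the textbook):
smooth = [3, 2, 1, 0, -1, -2, -3, -2, -1, 0, 1, 2]  # drifts slowly: neighbors alike
mixed = [1, -1, -1, 1, 1, -1, -1, 1]  # no consistent lag-1 pattern
jumpy = [1, -1, 1, -1, 1, -1, 1, -1]  # flips sign every period

for name, e in [("smooth", smooth), ("mixed", mixed), ("jumpy", jumpy)]:
    print(f"{name}: D = {durbin_watson(e):.2f}")
```

These series give D values of about 0.29, 2.00, and 3.50: slowly drifting residuals (positive autocorrelation) pull D toward 0, and residuals that alternate in sign (negative autocorrelation) push it toward 4.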

Durbin-Watson Statistic

  • Use the p-value or the critical values to decide whether autocorrelation is present.

  • The table of critical values gives, for each sample size, the limits beyond which D leads to rejecting the null hypothesis.

[Table: Durbin-Watson critical values]

Consequences of Dependence

  • Positive autocorrelation leads to underestimated standard errors.

  • Estimated slope and intercept are less precise than the regression output suggests.

  • Best remedy: Incorporate dependence into the regression model (e.g., using time series models).

Key Terms and Formulas

  • Heteroscedasticity: Unequal variance of errors.

  • Homoscedasticity: Equal variance of errors.

  • Leverage: The potential of an observation with an extreme value of the explanatory variable to influence the regression line.

  • Autocorrelation: Correlation of residuals across time.

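  • Durbin-Watson Statistic: $D = \dfrac{\sum_{t=2}^{n} (e_t - e_{t-1})^2}{\sum_{t=1}^{n} e_t^2}$, where $e_t$ is the residual at time $t$.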

Summary Tables: Model Comparison

The following tables compare the models with and without variance stabilization (the Price model and the Price/Sq Ft model) and list critical values for the Durbin-Watson test:

| Response | Similar Variances? | Estimated Fixed Cost | 95% CI Lower | 95% CI Upper |
|---|---|---|---|---|
| Price | No | $50,599 | $4,000 | $105,000 |
| Price/Sq Ft | Yes | $53,887 | $19,000 | $88,000 |

| Response | Similar Variances? | Estimated Marginal Cost | 95% CI Lower | 95% CI Upper |
|---|---|---|---|---|
| Price | No | $159/Sq Ft | $135/Sq Ft | $183.5/Sq Ft |
| Price/Sq Ft | Yes | $0.159/Sq Ft | $0.137/Sq Ft | $0.179/Sq Ft |

| Size (Sq Ft) | Response | Similar Variances? | 95% PI Lower | 95% PI Upper | Length |
|---|---|---|---|---|---|
| 1,000 | Price | No | $238,000 | $382,000 | $144,000 |
| 1,000 | Price/Sq Ft | Yes | $153,000 | $206,000 | $53,000 |
| 3,000 | Price | No | $367,000 | $781,000 | $414,000 |
| 3,000 | Price/Sq Ft | Yes | $501,000 | $546,000 | $45,000 |

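Durbin-Watson critical values:

| n | Reject H₀ if D is less than | Reject H₀ if D is greater than |
|---|---|---|
| 15 | 1.36 | 2.64 |
| 20 | 1.41 | 2.59 |
| 30 | 1.49 | 2.51 |
| 40 | 1.54 | 2.46 |
| 50 | 1.59 | 2.41 |
| 75 | 1.65 | 2.35 |
| 100 | 1.69 | 2.31 |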
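A sketch of the table lookup, using the critical values listed above. The function name `dw_decision` and the rule for sample sizes that fall between tabled rows (use the largest tabled n that does not exceed the actual n) are our assumptions; because the limits in this table widen as n shrinks, that rule errs on the side of not rejecting.

```python
from bisect import bisect_right

# Critical values from the table above: n -> (reject if D < lower, reject if D > upper)
DW_TABLE = {15: (1.36, 2.64), 20: (1.41, 2.59), 30: (1.49, 2.51), 40: (1.54, 2.46),
            50: (1.59, 2.41), 75: (1.65, 2.35), 100: (1.69, 2.31)}

def dw_decision(d, n):
    """Compare a Durbin-Watson statistic d from a sample of size n with the table."""
    sizes = sorted(DW_TABLE)
    i = bisect_right(sizes, n) - 1  # position of the largest tabled n <= actual n
    if i < 0:
        raise ValueError(f"n = {n} is below the smallest tabled sample size")
    lower, upper = DW_TABLE[sizes[i]]
    if d < lower:
        return "reject H0: positive autocorrelation"
    if d > upper:
        return "reject H0: negative autocorrelation"
    return "do not reject H0"

print(dw_decision(1.20, 36))  # uses the n = 30 row
print(dw_decision(2.10, 36))
```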
