
Multiple Regression: Estimation, Interpretation, and Application

Study Guide - Smart Notes

Tailored notes based on your materials, expanded with key definitions, examples, and context.

Multiple Regression Analysis

Introduction to Multiple Regression

Multiple regression is a statistical technique used to model the relationship between a single response variable and two or more explanatory (predictor) variables. This method allows for improved predictions and a better understanding of how several factors simultaneously influence the response variable.

  • Response Variable (Y): The main variable of interest, which we aim to predict or explain (e.g., asking price of homes).

  • Explanatory Variables (X1, X2, X3): Variables used to predict the response (e.g., square footage, number of bedrooms, number of bathrooms).

  • Purpose: To account for more variation in the response variable by including multiple predictors.

Example: Predicting the asking price (in thousands of dollars) of homes in Greenville, SC, using square footage, number of bedrooms, and number of bathrooms as predictors.

Data Table: Home Prices and Features

The following table summarizes the data for 13 homes, including the response and explanatory variables:

| Home | Asking Price (Y, $1000s) | Square Footage (X1) | Bedrooms (X2) | Baths (X3) |
|------|--------------------------|---------------------|---------------|------------|
| 1    | 498 | 3800 | 4 | 3.5 |
| 2    | 449 | 2600 | 4 | 3.0 |
| 3    | 435 | 2600 | 5 | 3.5 |
| 4    | 400 | 2250 | 4 | 4.0 |
| 5    | 379 | 3300 | 4 | 3.0 |
| 6    | 375 | 2750 | 3 | 2.5 |
| 7    | 356 | 2200 | 3 | 2.5 |
| 8    | 350 | 3000 | 4 | 2.5 |
| 9    | 340 | 2300 | 3 | 2.0 |
| 10   | 332 | 2600 | 4 | 2.5 |
| 11   | 298 | 2300 | 4 | 2.0 |
| 12   | 280 | 2000 | 4 | 3.0 |
| 13   | 260 | 2200 | 3 | 2.5 |

Multiple Regression Model and Estimation

The general form of the multiple regression equation is:

  • Regression Equation: \( \hat{y} = b_0 + b_1 x_1 + b_2 x_2 + b_3 x_3 \)

  • Estimated Equation (from data): \( \hat{y} = 25.6 + 0.0719 x_1 - 0.8 x_2 + 55.3 x_3 \)

  • Interpretation of Coefficients:

    • b1 = 0.0719: For each additional square foot (holding bedrooms and baths constant), the predicted asking price increases by 0.0719 thousand dollars ($71.90).

    • b2 = -0.8: For each additional bedroom (holding square footage and baths constant), the predicted asking price decreases by 0.8 thousand dollars ($800).

    • b3 = 55.3: For each additional bathroom (holding other variables constant), the predicted asking price increases by 55.3 thousand dollars ($55,300).

    • b0 = 25.6: The predicted price when all predictors are zero (not meaningful in this context).

Example: For a home with 3800 sq ft, 4 bedrooms, and 3.5 baths: \( \hat{y} = 25.6 + 0.0719(3800) - 0.8(4) + 55.3(3.5) = 489.17 \), or about $489,170.
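The estimated coefficients can be reproduced from the data table with a short least squares computation. The sketch below uses NumPy's `lstsq` (the output in this guide came from statistical software; the variable names here are illustrative):

```python
import numpy as np

# 13 Greenville, SC homes: asking price ($1000s), square footage, bedrooms, baths
price = np.array([498, 449, 435, 400, 379, 375, 356, 350, 340, 332, 298, 280, 260], dtype=float)
sqft  = np.array([3800, 2600, 2600, 2250, 3300, 2750, 2200, 3000, 2300, 2600, 2300, 2000, 2200], dtype=float)
beds  = np.array([4, 4, 5, 4, 4, 3, 3, 4, 3, 4, 4, 4, 3], dtype=float)
baths = np.array([3.5, 3.0, 3.5, 4.0, 3.0, 2.5, 2.5, 2.5, 2.0, 2.5, 2.0, 3.0, 2.5])

# Design matrix with a leading column of 1s for the intercept
X = np.column_stack([np.ones(len(price)), sqft, beds, baths])

# Least squares coefficients: b0, b1, b2, b3
b, *_ = np.linalg.lstsq(X, price, rcond=None)
print(b)  # approximately [25.6, 0.0719, -0.8, 55.3]

# Predicted price for home 1 (3800 sq ft, 4 bedrooms, 3.5 baths)
pred_home1 = X[0] @ b
print(pred_home1)  # close to 489.17
```

The unrounded coefficients will differ slightly from the rounded values reported in the software output.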

Regression Output Summary

| Term           | Coefficient | SE Coef | T-Value | P-Value | VIF  |
|----------------|-------------|---------|---------|---------|------|
| Constant       | 25.6        | 98.2    | 0.26    | 0.800   | -    |
| Square Footage | 0.0719      | 0.0276  | 2.61    | 0.028   | 1.10 |
| Bedrooms       | -0.8        | 27.3    | -0.03   | 0.977   | 1.50 |
| Baths          | 55.3        | 27.5    | 2.02    | 0.075   | 1.50 |

  • Interpretation of P-Values: Lower p-values (typically < 0.05) indicate statistical significance. Here, only square footage is significant at the 0.05 level.
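Each t-value in the output is the coefficient divided by its standard error. A quick check from the rounded table entries (the results agree only approximately, since the software computed t from unrounded coefficients):

```python
# t-value = coefficient / standard error, from the regression output table
coef = {"Constant": 25.6, "Square Footage": 0.0719, "Bedrooms": -0.8, "Baths": 55.3}
se   = {"Constant": 98.2, "Square Footage": 0.0276, "Bedrooms": 27.3, "Baths": 27.5}

t_values = {term: coef[term] / se[term] for term in coef}
for term, t in t_values.items():
    print(f"{term}: t = {t:.2f}")
```

Only Square Footage has |t| large enough for a p-value below 0.05 (with 9 error degrees of freedom).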

  • VIF (Variance Inflation Factor): Indicates multicollinearity; values near 1 suggest low multicollinearity.

Analysis of Variance (ANOVA) Table

| Source     | DF | Seq SS | Seq MS | F-Value | P-Value |
|------------|----|--------|--------|---------|---------|
| Regression | 3  | 36675  | 12225  | 5.72    | 0.018   |
| Error      | 9  | 19247  | 2139   | -       | -       |
| Total      | 12 | 55921  | -      | -       | -       |

  • F-Value and P-Value: The overall model is significant (p = 0.018), indicating that at least one predictor is useful.

Model Summary Statistics

| S       | R-sq   | R-sq(adj) | R-sq(pred) |
|---------|--------|-----------|------------|
| 46.2441 | 65.58% | 54.11%    | 41.13%     |

  • R-squared (R2): Proportion of variance in the response explained by the predictors. Here, 65.6% of the variability in asking price is explained by the model.

  • Adjusted R-squared: Adjusts for the number of predictors; useful for comparing models with different numbers of predictors.

  • S: Standard error of the regression (estimate of the typical size of residuals).
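S, R-sq, and R-sq(adj) can all be recomputed directly from the ANOVA table, as sketched below. (R-sq(pred) requires leave-one-out PRESS residuals and cannot be recovered from the ANOVA table alone.)

```python
import math

# From the ANOVA table: SSE = 19247 on 9 df, SST = 55921 on 12 df
sse, df_error = 19247, 9
sst, df_total = 55921, 12

r_sq = 1 - sse / sst                                # proportion of variance explained
r_sq_adj = 1 - (sse / df_error) / (sst / df_total)  # penalizes extra predictors
s = math.sqrt(sse / df_error)                       # standard error of the regression

print(f"R-sq = {r_sq:.2%}, R-sq(adj) = {r_sq_adj:.2%}, S = {s:.4f}")
```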

Correlation Coefficients

  • Pearson correlation between Asking Price and Square Footage: 0.665 (p = 0.013)

  • Correlation with Bedrooms: 0.409

  • Correlation with Baths: 0.626

  • These are simple correlations, not accounting for other variables. In multiple regression, R2 reflects the combined effect.
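These simple correlations can be checked from the data table with NumPy's `corrcoef` (a sketch; the reported values are rounded to three decimals):

```python
import numpy as np

# 13-home data from the table above
price = np.array([498, 449, 435, 400, 379, 375, 356, 350, 340, 332, 298, 280, 260], dtype=float)
sqft  = np.array([3800, 2600, 2600, 2250, 3300, 2750, 2200, 3000, 2300, 2600, 2300, 2000, 2200], dtype=float)
beds  = np.array([4, 4, 5, 4, 4, 3, 3, 4, 3, 4, 4, 4, 3], dtype=float)
baths = np.array([3.5, 3.0, 3.5, 4.0, 3.0, 2.5, 2.5, 2.5, 2.0, 2.5, 2.0, 3.0, 2.5])

# Pearson correlation of asking price with each predictor separately
r_sqft  = np.corrcoef(price, sqft)[0, 1]   # ~0.665
r_beds  = np.corrcoef(price, beds)[0, 1]   # ~0.409
r_baths = np.corrcoef(price, baths)[0, 1]  # ~0.626
print(r_sqft, r_beds, r_baths)
```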

Fitted Values and Residuals

For each observation, the fitted value (prediction) and residual (difference between observed and predicted) are calculated:

  • Fitted Value (\( \hat{y} \)): The predicted value from the regression equation.

  • Residual (e): The difference between the observed and predicted value, \( e = y - \hat{y} \).

Example: For house 1, observed price = 498, predicted = 489.17, so residual = 8.83 (in thousands of dollars).

  • Residuals represent unexplained variation, possibly due to omitted variables or random noise.
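The house 1 example works out as follows, using the rounded coefficients from the estimated equation:

```python
# Fitted value and residual for home 1 (3800 sq ft, 4 bedrooms, 3.5 baths)
b0, b1, b2, b3 = 25.6, 0.0719, -0.8, 55.3
y_hat = b0 + b1 * 3800 + b2 * 4 + b3 * 3.5  # fitted value ($1000s)
residual = 498 - y_hat                      # observed minus predicted

print(round(y_hat, 2), round(residual, 2))  # 489.17 8.83
```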

The Least Squares Principle

The regression coefficients are chosen to minimize the sum of squared residuals (errors), \( \text{SSE} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 \):

  • Least Squares Solution: The set of coefficients (b0, b1, b2, b3) that minimizes SSE provides the best fit to the data.

  • For this data, SSE = 19247 (from the ANOVA table).
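A small numerical illustration of the least squares principle, assuming the 13-home data table above (a sketch, not part of the original output): starting from the fitted coefficients, nudging any one of them in either direction can only increase SSE.

```python
import numpy as np

# Same 13-home data as the table above
price = np.array([498, 449, 435, 400, 379, 375, 356, 350, 340, 332, 298, 280, 260], dtype=float)
sqft  = np.array([3800, 2600, 2600, 2250, 3300, 2750, 2200, 3000, 2300, 2600, 2300, 2000, 2200], dtype=float)
beds  = np.array([4, 4, 5, 4, 4, 3, 3, 4, 3, 4, 4, 4, 3], dtype=float)
baths = np.array([3.5, 3.0, 3.5, 4.0, 3.0, 2.5, 2.5, 2.5, 2.0, 2.5, 2.0, 3.0, 2.5])
X = np.column_stack([np.ones(len(price)), sqft, beds, baths])

def sse(coefs):
    """Sum of squared residuals for a candidate coefficient vector."""
    resid = price - X @ coefs
    return float(resid @ resid)

# Least squares solution and its SSE (matches the ANOVA error SS, ~19247)
b, *_ = np.linalg.lstsq(X, price, rcond=None)
base = sse(b)

# Any perturbation of the least squares solution has SSE at least as large
for j in range(4):
    for delta in (-0.5, 0.5):
        bumped = b.copy()
        bumped[j] += delta
        assert sse(bumped) >= base

print(base)  # ~19247
```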

Interpretation and Application

  • Prediction: The regression equation can be used to predict asking price for any combination of square footage, bedrooms, and baths within the range of the data.

  • Coefficient Interpretation: Each coefficient represents the expected change in the response variable for a one-unit increase in the predictor, holding other variables constant.

  • Intercept Interpretation: The intercept is the predicted value when all predictors are zero; often not meaningful if zero is outside the data range.

  • Comparison to Simple Regression: Coefficients in multiple regression differ from those in simple regression due to the adjustment for other variables.

Summary Table: Key Concepts in Multiple Regression

Concept

Definition/Interpretation

Regression Coefficient

Change in predicted Y for a one-unit increase in X, holding other variables constant

Intercept

Predicted Y when all X's are zero (may not be meaningful)

Residual

Observed Y minus predicted Y

R-squared

Proportion of variance in Y explained by the model

SSE

Sum of squared residuals; minimized in least squares regression

Fitted Value

Predicted value from the regression equation

Additional info: The ANOVA table and inference for individual coefficients will be discussed in a later module. The example demonstrates the process of fitting, interpreting, and applying a multiple regression model using real data and statistical software output.
