
Chapter 13: Introduction to Multiple Regression (Business Statistics)

Study Guide - Smart Notes

Tailored notes based on your materials, expanded with key definitions, examples, and context.

Multiple Regression Analysis

Overview

Multiple regression is a statistical technique used to examine the linear relationship between one dependent variable (Y) and two or more independent variables (X1, X2, ..., Xk). It extends simple linear regression by allowing for the simultaneous consideration of several predictors.

  • Purpose: To model and predict the value of a dependent variable based on multiple independent variables.

  • Applications: Widely used in business, economics, social sciences, and natural sciences for forecasting and understanding relationships.

Multiple Regression Model

The general form of the multiple regression model with k independent variables is:

  • Equation: Yi = β0 + β1X1i + β2X2i + ... + βkXki + εi

  • Where:

  • Yi: Dependent variable for observation i

  • β0: Intercept (value of Y when all X's are zero)

  • βj: Slope coefficient for independent variable Xj (effect of Xj on Y, holding other X's constant)

  • εi: Random error for observation i

Model with Two Independent Variables

For two predictors, the model simplifies to:

  • Equation: Yi = β0 + β1X1i + β2X2i + εi

  • β1: Slope of Y with respect to X1, holding X2 constant

  • β2: Slope of Y with respect to X2, holding X1 constant

Estimating the Regression Equation

The coefficients are estimated from sample data, typically by least squares. The estimated regression equation is:

  • Equation: Ŷi = b0 + b1X1i + b2X2i + ... + bkXki

  • b0: Estimated intercept

  • bj: Estimated slope coefficients
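As a sketch of how least squares estimation works in practice, the coefficients can be computed with NumPy's least squares solver. The data below is hypothetical and only illustrates the mechanics:

```python
import numpy as np

# Hypothetical data: six observations of Y with two predictors X1, X2.
X = np.array([
    [5.5, 3.3],
    [7.5, 3.3],
    [8.0, 3.0],
    [8.0, 4.5],
    [6.8, 3.0],
    [7.5, 4.0],
])
y = np.array([350.0, 460.0, 350.0, 430.0, 380.0, 440.0])

# Prepend a column of ones so the first coefficient is the intercept b0.
A = np.column_stack([np.ones(len(y)), X])
b, *_ = np.linalg.lstsq(A, y, rcond=None)

b0, b1, b2 = b          # estimated intercept and slopes
y_hat = A @ b           # fitted values Ŷi
residuals = y - y_hat   # ei = Yi - Ŷi
```

With an intercept in the model, the least squares residuals always sum to zero, which is a quick sanity check on the fit.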

Example: Pie Sales Model

A distributor of frozen dessert pies wants to evaluate factors influencing weekly pie sales. Data collected over 15 weeks includes:

  • Dependent variable: Pie sales (units per week)

  • Independent variables: Price (in $), Advertising (in $100's)

  Week | Pie Sales | Price ($) | Advertising ($100s)
  -----|-----------|-----------|--------------------
  1    | 350       | 5.50      | 3.3
  2    | 460       | 5.50      | 3.2
  ...  | ...       | ...       | ...
  15   | 430       | 7.00      | 2.0

Additional info: Table truncated for brevity; see original notes for full data.

Interpreting Regression Coefficients

  • Sales = 306.526 - 24.975(Price) + 74.131(Advertising)

  • b1 = -24.975: For each $1 increase in price, sales decrease by 24.975 pies per week, holding advertising constant.

  • b2 = 74.131: For each $100 increase in advertising, sales increase by 74.131 pies per week, holding price constant.

Making Predictions

To predict sales for a given price and advertising level, substitute the X values into the estimated equation:

  • Predicted sales = 306.526 - 24.975(5.50) + 74.131(3.5) = 428.62 pies (for price $5.50 and advertising $350; advertising is coded in $100s, so X2 = 3.5)
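The prediction arithmetic can be checked directly, using the fitted coefficients from the example:

```python
# Fitted pie sales equation from the example.
b0, b1, b2 = 306.526, -24.975, 74.131

def predict_sales(price, advertising_hundreds):
    """Predicted weekly pie sales for a given price ($) and advertising ($100s)."""
    return b0 + b1 * price + b2 * advertising_hundreds

# Advertising of $350 corresponds to X2 = 3.5 because advertising is in $100s.
pred = predict_sales(5.50, 3.5)
print(round(pred, 2))  # 428.62
```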

Confidence and Prediction Intervals

  • Confidence interval: Range for the mean value of Y, given X values

  • Prediction interval: Range for an individual Y value, given X values

Coefficient of Multiple Determination (r2)

Measures the proportion of total variation in Y explained by all X variables together.

  • Formula: r2 = SSR / SST

  • SSR: Regression sum of squares

  • SST: Total sum of squares

  • Interpretation: In the example, r2 = 0.521 (52.1% of the variation in pie sales is explained by price and advertising)
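A minimal sketch of the r2 computation from observed and fitted values, using the equivalent form 1 - SSE/SST (the numbers below are hypothetical):

```python
def r_squared(y, y_hat):
    """r2 = SSR / SST, computed here as 1 - SSE / SST."""
    y_bar = sum(y) / len(y)
    sst = sum((yi - y_bar) ** 2 for yi in y)               # total sum of squares
    sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # error sum of squares
    return 1.0 - sse / sst

# A perfect fit explains all variation (r2 = 1); predicting the mean for
# every observation explains none of it (r2 = 0).
print(r_squared([350, 460, 430], [350, 460, 430]))  # 1.0
```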

Adjusted r2

Adjusted r2 accounts for the number of predictors in the model, penalizing the use of unimportant variables:

  • Formula: adjusted r2 = 1 - [(1 - r2)(n - 1) / (n - k - 1)]

  • n: Sample size

  • k: Number of independent variables

  • Adjusted r2 is always less than r2

  • Useful for comparing models with different numbers of predictors
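The adjustment formula above can be sketched as a one-line function; plugging in the pie sales figures (r2 ≈ 0.521, n = 15 weeks, k = 2 predictors) shows the penalty at work:

```python
def adjusted_r2(r2, n, k):
    """Adjusted r2 = 1 - (1 - r2)(n - 1)/(n - k - 1): penalizes extra predictors."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)

# Pie sales example: the adjustment pulls 0.521 down noticeably
# because n = 15 is small relative to k = 2.
print(round(adjusted_r2(0.521, 15, 2), 3))  # 0.441
```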

Testing Overall Model Significance (F Test)

The F test determines if there is a significant linear relationship between the set of X variables and Y.

  • Hypotheses:

  • H0: β1 = β2 = ... = βk = 0 (no linear relationship)

  • H1: At least one βj ≠ 0 (at least one independent variable affects Y)

Test Statistic:

  • FSTAT = MSR / MSE = (SSR / k) / (SSE / (n - k - 1)), with df1 = k and df2 = n - k - 1

  • Decision rule: If FSTAT > critical value or p-value < α, reject H0

  • Example: FSTAT = 6.5386, p-value = 0.0120 < 0.05 ⇒ model is significant
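Because SSR = r2·SST and SSE = (1 - r2)·SST, the F statistic can be written purely in terms of r2. As a sketch, plugging in r2 ≈ 0.5215, n = 15, k = 2 approximately reproduces the example's FSTAT:

```python
def f_stat(r2, n, k):
    """FSTAT = MSR / MSE = (SSR/k) / (SSE/(n - k - 1)), rewritten in terms of r2."""
    return (r2 / k) / ((1.0 - r2) / (n - k - 1))

# Pie sales example: r2 ≈ 0.5215 with n = 15 and k = 2 gives FSTAT ≈ 6.54.
print(round(f_stat(0.5215, 15, 2), 2))  # 6.54
```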

Residuals in Multiple Regression

Residuals are the differences between observed and predicted values: ei = Yi - Ŷi

  • Purpose: Assess model fit and check assumptions

  • Best-fit equation: Minimizes the sum of squared errors, SSE = Σ(Yi - Ŷi)2

Assumptions of Multiple Regression

  • Errors are normally distributed

  • Errors have constant variance (homoscedasticity)

  • Errors are independent

Residual Plots

  • Residuals vs. Predicted Y

  • Residuals vs. X1

  • Residuals vs. X2

  • Residuals vs. Time (for time series data)

  • Use: Check for violations of regression assumptions (e.g., non-linearity, heteroscedasticity)

Testing Significance of Individual Variables (t Test)

  • Purpose: Test if each independent variable has a significant linear relationship with Y, controlling for other variables

  • Hypotheses:

  • H0: βj = 0 (no linear relationship between Xj and Y)

  • H1: βj ≠ 0 (linear relationship exists)

  • Test Statistic: tSTAT = (bj - 0) / Sbj, where Sbj is the standard error of bj

  • df = n - k - 1

  • Decision: If |tSTAT| > critical value or p-value < α, reject H0
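A sketch of the t statistic for one coefficient. The standard error below is an assumed, illustrative value (the original notes do not give the standard errors), paired with the example's price slope b1 = -24.975:

```python
def t_stat(bj, s_bj):
    """tSTAT = (bj - 0) / S_bj tests H0: βj = 0 for a single coefficient."""
    return bj / s_bj

# Illustrative only: price slope b1 = -24.975 with an ASSUMED standard
# error of 10.832; compare |tSTAT| to the critical t with n - k - 1 df.
t1 = t_stat(-24.975, 10.832)
print(round(t1, 3))  # -2.306
```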

Additional Topics in Multiple Regression

Dummy Variables

Dummy variables are used to include categorical independent variables in regression models.

  • Definition: A variable coded as 0 or 1 to represent two categories (e.g., yes/no, male/female)

  • For more than two categories: Use (number of categories - 1) dummy variables

  • Interpretation: The coefficient of a dummy variable represents the difference in intercept between the two groups
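The intercept-shift interpretation can be sketched with hypothetical coefficients. The holiday-week variable and all coefficient values below are invented for illustration; they are not from the pie sales example:

```python
# Hypothetical: code a two-category variable (holiday week yes/no) as 0/1.
def encode_holiday(is_holiday):
    return 1 if is_holiday else 0

# Model (hypothetical coefficients): Sales = b0 + b1*Price + b2*Holiday.
# b2 is the difference in mean sales between holiday and non-holiday
# weeks at any fixed price: two parallel lines with different intercepts.
b0, b1, b2 = 300.0, -20.0, 15.0
price = 6.0
sales_holiday     = b0 + b1 * price + b2 * encode_holiday(True)
sales_non_holiday = b0 + b1 * price + b2 * encode_holiday(False)
print(sales_holiday - sales_non_holiday)  # 15.0, i.e. exactly b2
```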

Interaction Terms

Interaction terms allow the effect of one independent variable to depend on the level of another.

  • Model: Y = β0 + β1X1 + β2X2 + β3X1X2 + ε

  • Interpretation: The effect of X1 on Y changes as X2 changes

  • Testing: If the interaction term is significant, include it in the model
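To see why the effect of X1 depends on X2 in an interaction model, note that the slope of Y with respect to X1 is b1 + b3·X2. A sketch with hypothetical coefficients:

```python
# Hypothetical coefficients for Y = b0 + b1*X1 + b2*X2 + b3*(X1*X2).
b0, b1, b2, b3 = 10.0, 2.0, 3.0, 0.5

def predict(x1, x2):
    return b0 + b1 * x1 + b2 * x2 + b3 * x1 * x2

def slope_x1(x2):
    """Effect of a one-unit change in X1 on Y: b1 + b3*X2, so it varies with X2."""
    return b1 + b3 * x2

# At X2 = 0 the slope is b1 = 2.0; at X2 = 4 it has grown to 4.0.
print(slope_x1(0.0), slope_x1(4.0))  # 2.0 4.0
```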

Summary Table: Key Multiple Regression Concepts

  Concept                           | Definition                                     | Formula
  ----------------------------------|------------------------------------------------|--------------------------------------
  Multiple Regression Model         | Linear relationship between Y and multiple X's | Yi = β0 + β1X1i + ... + βkXki + εi
  Coefficient of Determination (r2) | Proportion of variance in Y explained by X's   | r2 = SSR / SST
  Adjusted r2                       | r2 adjusted for number of predictors           | 1 - [(1 - r2)(n - 1)/(n - k - 1)]
  F Test                            | Test overall model significance                | FSTAT = MSR / MSE
  t Test                            | Test significance of individual predictors     | tSTAT = bj / Sbj
  Dummy Variable                    | Categorical variable coded as 0/1              | --
  Interaction Term                  | Product of two predictors                      | X1X2 added as a predictor

Conclusion

Multiple regression is a powerful tool for modeling and predicting outcomes based on several independent variables. Understanding how to build, interpret, and validate these models is essential for effective business decision-making and statistical analysis.
