Chapter 13: Introduction to Multiple Regression (Business Statistics)
Study Guide - Smart Notes
Multiple Regression Analysis
Overview
Multiple regression is a statistical technique used to examine the linear relationship between one dependent variable (Y) and two or more independent variables (X1, X2, ..., Xk). It extends simple linear regression by allowing for the simultaneous consideration of several predictors.
Purpose: To model and predict the value of a dependent variable based on multiple independent variables.
Applications: Widely used in business, economics, social sciences, and natural sciences for forecasting and understanding relationships.
Multiple Regression Model
The general form of the multiple regression model with k independent variables is:
Equation: Yi = β0 + β1X1i + β2X2i + ... + βkXki + εi
Where:
Yi: Dependent variable for observation i
β0: Intercept (value of Y when all X's are zero)
βj: Slope coefficient for independent variable Xj (effect of Xj on Y, holding other X's constant)
εi: Random error for observation i
Model with Two Independent Variables
For two predictors, the model simplifies to: Yi = β0 + β1X1i + β2X2i + εi
β1: Slope of Y with respect to X1, holding X2 constant
β2: Slope of Y with respect to X2, holding X1 constant
Estimating the Regression Equation
The coefficients are estimated from sample data, typically via least squares estimation. The estimated regression equation is: Ŷi = b0 + b1X1i + b2X2i + ... + bkXki
b0: Estimated intercept
bj: Estimated slope coefficients
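The least squares estimates can be computed from the normal equations (XᵀX)b = Xᵀy. A minimal pure-Python sketch on made-up data (all numbers here are illustrative; y is generated exactly from y = 2 + 3·x1 - x2 so the fit recovers the coefficients):

```python
# Sketch: least squares estimation of b0, b1, b2 via the normal
# equations (X'X)b = X'y, solved with Gaussian elimination.

def transpose(M):
    return [list(row) for row in zip(*M)]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def solve(A, rhs):
    """Gaussian elimination with partial pivoting for a small square system."""
    n = len(A)
    M = [row[:] + [rhs[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

# Hypothetical data: y generated exactly as 2 + 3*x1 - 1*x2
x1 = [1, 2, 3, 4, 5, 6]
x2 = [5, 3, 8, 1, 4, 7]
y = [2 + 3 * a - b for a, b in zip(x1, x2)]

X = [[1.0, a, b] for a, b in zip(x1, x2)]  # design matrix with intercept column
Xt = transpose(X)
XtX = matmul(Xt, X)
Xty = [sum(Xt[i][j] * y[j] for j in range(len(y))) for i in range(3)]
b0, b1, b2 = solve(XtX, Xty)
print(round(b0, 4), round(b1, 4), round(b2, 4))  # recovers 2, 3, -1 (up to float error)
```

In practice these estimates come from statistical software; the point of the sketch is that b0, b1, b2 are the solution of one small linear system.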
Example: Pie Sales Model
A distributor of frozen dessert pies wants to evaluate factors influencing weekly pie sales. Data collected over 15 weeks includes:
Dependent variable: Pie sales (units per week)
Independent variables: Price (in $), Advertising (in $100's)
| Week | Pie Sales | Price ($) | Advertising ($100s) |
|---|---|---|---|
| 1 | 350 | 5.50 | 3.3 |
| 2 | 460 | 5.50 | 3.2 |
| ... | ... | ... | ... |
| 15 | 430 | 7.00 | 2.0 |
Additional info: Table truncated for brevity; see original notes for full data.
Interpreting Regression Coefficients
Sales = 306.526 - 24.975(Price) + 74.131(Advertising)
b1 = -24.975: For each $1 increase in price, sales decrease by 24.975 pies per week, holding advertising constant.
b2 = 74.131: For each $100 increase in advertising, sales increase by 74.131 pies per week, holding price constant.
Making Predictions
To predict sales for a given price and advertising level:
Predicted sales = 306.526 - 24.975(5.50) + 74.131(3.5) = 428.62 pies (for price = $5.50 and advertising = $350, which enters the model as 3.5 since advertising is in $100s)
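The prediction arithmetic can be checked directly with the fitted equation from the notes (remember that advertising enters in $100s):

```python
# Prediction from the fitted pie-sales equation in the notes.
# Advertising is measured in $100s, so $350 of advertising enters as 3.5.
def predict_sales(price, advertising_hundreds):
    return 306.526 - 24.975 * price + 74.131 * advertising_hundreds

print(round(predict_sales(5.50, 3.5), 2))  # → 428.62
```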
Confidence and Prediction Intervals
Confidence interval: Range for the mean value of Y, given X values
Prediction interval: Range for an individual Y value, given X values
Coefficient of Multiple Determination (r2)
Measures the proportion of total variation in Y explained by all X variables together: r2 = SSR / SST
SSR: Regression sum of squares
SST: Total sum of squares
Interpretation: In the example, r2 = 0.521 (52.1% of the variation in pie sales is explained by price and advertising)
Adjusted r2
Adjusted r2 accounts for the number of predictors in the model, penalizing excessive use of unimportant variables: adjusted r2 = 1 - [(1 - r2)(n - 1) / (n - k - 1)]
n: Sample size
k: Number of independent variables
Adjusted r2 is never larger than r2
Useful for comparing models with different numbers of predictors
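Both measures follow directly from the sums of squares. A short sketch, with SSR and SST values assumed for illustration (chosen to be consistent with the 52.1% in the example, using n = 15 weeks and k = 2 predictors):

```python
# Illustrative computation of r2 and adjusted r2.
ssr, sst = 29460.0, 56493.3   # hypothetical sums of squares (give r2 ≈ 0.521)
n, k = 15, 2                  # 15 weeks of data, 2 predictors

r2 = ssr / sst
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)
print(round(r2, 3), round(adj_r2, 3))  # → 0.521 0.442
```

Note how the adjustment pulls r2 down: the penalty grows as k approaches n.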
Testing Overall Model Significance (F Test)
The F test determines if there is a significant linear relationship between the set of X variables and Y.
Hypotheses:
H0: β1 = β2 = ... = βk = 0 (no linear relationship)
H1: At least one βj ≠ 0 (at least one independent variable affects Y)
Test Statistic: FSTAT = MSR / MSE = (SSR / k) / (SSE / (n - k - 1)), with df1 = k and df2 = n - k - 1
Decision rule: If FSTAT > critical value or p-value < α, reject H0
Example: FSTAT = 6.5386, p-value = 0.0120 < 0.05 ⇒ model is significant
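The FSTAT computation can be sketched from the sums of squares. The SSR and SST below are assumed values consistent with the example (r2 ≈ 0.521, n = 15, k = 2); they reproduce an F statistic close to the 6.5386 reported in the notes:

```python
# F statistic: FSTAT = MSR / MSE = (SSR/k) / (SSE/(n-k-1)).
ssr, sst = 29460.0, 56493.3   # hypothetical sums of squares
n, k = 15, 2
sse = sst - ssr               # SST = SSR + SSE
msr = ssr / k
mse = sse / (n - k - 1)
f_stat = msr / mse
print(round(f_stat, 4))       # close to the 6.5386 reported in the notes
```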
Residuals in Multiple Regression
Residuals are the differences between observed and predicted values: ei = Yi - Ŷi
Purpose: Assess model fit and check assumptions
Best fit equation: Minimizes the sum of squared errors, SSE = Σei² = Σ(Yi - Ŷi)²
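Residuals and SSE are simple to compute once predictions are in hand; a small sketch on made-up observed and predicted values:

```python
# Residuals e_i = Y_i - Yhat_i and their sum of squares (SSE).
y = [350, 460, 350]            # observed values (hypothetical)
y_hat = [360.5, 442.9, 341.2]  # predicted values (hypothetical)

residuals = [yi - yh for yi, yh in zip(y, y_hat)]
sse = sum(e ** 2 for e in residuals)
print([round(e, 1) for e in residuals], round(sse, 2))  # → [-10.5, 17.1, 8.8] 480.1
```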
Assumptions of Multiple Regression
Errors are normally distributed
Errors have constant variance (homoscedasticity)
Errors are independent
Residual Plots
Residuals vs. Predicted Y
Residuals vs. X1
Residuals vs. X2
Residuals vs. Time (for time series data)
Use: Check for violations of regression assumptions (e.g., non-linearity, heteroscedasticity)
Testing Significance of Individual Variables (t Test)
Purpose: Test if each independent variable has a significant linear relationship with Y, controlling for other variables
Hypotheses:
H0: βj = 0 (no linear relationship between Xj and Y)
H1: βj ≠ 0 (linear relationship exists)
Test Statistic: tSTAT = bj / Sbj, where Sbj is the standard error of bj
df = n - k - 1
Decision: If |tSTAT| > critical value or p-value < α, reject H0
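The t statistic is a simple ratio; here b1 is the price coefficient from the notes, while the standard error is an assumed value for illustration:

```python
# tSTAT for an individual coefficient: tSTAT = bj / S_bj.
b1 = -24.975    # price coefficient from the fitted pie-sales equation
s_b1 = 10.832   # hypothetical standard error of b1
t_stat = b1 / s_b1
print(round(t_stat, 3))
```

With df = 15 - 2 - 1 = 12, this statistic would then be compared with the t critical value at the chosen α.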
Additional Topics in Multiple Regression
Dummy Variables
Dummy variables are used to include categorical independent variables in regression models.
Definition: A variable coded as 0 or 1 to represent two categories (e.g., yes/no, male/female)
For more than two categories: Use (number of categories - 1) dummy variables
Interpretation: The coefficient of a dummy variable estimates the difference in mean Y between the two categories, holding the other X's constant (a shift in intercept between the two groups)
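The (categories - 1) rule can be sketched with a hypothetical three-level factor; "North" is the baseline category, so only two dummies are created:

```python
# Coding a 3-category variable with (3 - 1) = 2 dummy variables.
# "region" is a hypothetical factor; "North" is the omitted baseline.
regions = ["North", "South", "West", "South", "North"]
rows = [{"south": int(r == "South"), "west": int(r == "West")} for r in regions]
print(rows[1])  # → {'south': 1, 'west': 0}
```

A "North" observation is coded (0, 0); including a third dummy for "North" would make the columns perfectly collinear with the intercept.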
Interaction Terms
Interaction terms allow the effect of one independent variable to depend on the level of another.
Model: Y = β0 + β1X1 + β2X2 + β3(X1X2) + ε
Interpretation: The effect of X1 on Y changes as X2 changes
Testing: If the interaction term is significant, include it in the model
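The "effect of X1 depends on X2" idea can be made concrete: in the model Y = b0 + b1X1 + b2X2 + b3(X1X2), the slope of X1 is (b1 + b3·X2). A sketch with made-up coefficients:

```python
# With an interaction term, the slope of X1 is (b1 + b3 * X2),
# so it changes as X2 changes. Coefficients below are illustrative.
b0, b1, b2, b3 = 10.0, 2.0, 1.5, 0.5

def slope_of_x1(x2):
    return b1 + b3 * x2

print(slope_of_x1(0), slope_of_x1(4))  # → 2.0 4.0
```

Here the effect of X1 on Y doubles as X2 moves from 0 to 4, which is exactly what a nonzero b3 captures.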
Summary Table: Key Multiple Regression Concepts
| Concept | Definition | Formula |
|---|---|---|
| Multiple Regression Model | Linear relationship between Y and multiple X's | Yi = β0 + β1X1i + ... + βkXki + εi |
| Coefficient of Determination (r2) | Proportion of variance in Y explained by X's | r2 = SSR / SST |
| Adjusted r2 | r2 adjusted for number of predictors | 1 - [(1 - r2)(n - 1) / (n - k - 1)] |
| F Test | Test overall model significance | FSTAT = MSR / MSE |
| t Test | Test significance of individual predictors | tSTAT = bj / Sbj |
| Dummy Variable | Categorical variable coded as 0/1 | -- |
| Interaction Term | Product of two predictors | X1 × X2 |
Conclusion
Multiple regression is a powerful tool for modeling and predicting outcomes based on several independent variables. Understanding how to build, interpret, and validate these models is essential for effective business decision-making and statistical analysis.