Chapter 13: Introduction to Multiple Regression (Business Statistics)
Study Guide - Smart Notes
Multiple Regression Analysis
Overview
Multiple regression is a statistical technique used to examine the linear relationship between one dependent variable (Y) and two or more independent variables (X1, X2, ..., Xk). It extends simple linear regression by allowing for the simultaneous consideration of several predictors.
Purpose: To model and predict the value of a dependent variable based on multiple independent variables.
Applications: Widely used in business, economics, social sciences, and natural sciences for forecasting and understanding relationships.
Multiple Regression Model
The general form of the multiple regression model with k independent variables is:
Equation: Yi = β0 + β1X1i + β2X2i + ... + βkXki + εi
Where:
Yi: Dependent variable for observation i
β0: Intercept (value of Y when all X's are zero)
βj: Slope coefficient for independent variable Xj (effect of Xj on Y, holding other X's constant)
εi: Random error for observation i
Model with Two Independent Variables
For two predictors, the model simplifies to: Yi = β0 + β1X1i + β2X2i + εi
β1: Slope of Y with respect to X1, holding X2 constant
β2: Slope of Y with respect to X2, holding X1 constant
Estimating the Regression Equation
The coefficients are estimated from sample data, typically via least squares estimation. The estimated regression equation is: Ŷi = b0 + b1X1i + b2X2i + ... + bkXki
b0: Estimated intercept
bj: Estimated slope coefficients
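The least squares estimates can be computed from the normal equations (XᵀX)b = Xᵀy. A minimal pure-Python sketch on made-up data (all numbers here are illustrative; y is generated exactly from y = 2 + 3·x1 - x2 so the fit recovers the coefficients):

```python
# Sketch: least squares estimation of b0, b1, b2 via the normal
# equations (X'X)b = X'y, solved with Gaussian elimination.

def transpose(M):
    return [list(row) for row in zip(*M)]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def solve(A, rhs):
    """Gaussian elimination with partial pivoting for a small square system."""
    n = len(A)
    M = [row[:] + [rhs[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

# Hypothetical data: y generated exactly as 2 + 3*x1 - 1*x2
x1 = [1, 2, 3, 4, 5, 6]
x2 = [5, 3, 8, 1, 4, 7]
y = [2 + 3 * a - b for a, b in zip(x1, x2)]

X = [[1.0, a, b] for a, b in zip(x1, x2)]  # design matrix with intercept column
Xt = transpose(X)
XtX = matmul(Xt, X)
Xty = [sum(Xt[i][j] * y[j] for j in range(len(y))) for i in range(3)]
b0, b1, b2 = solve(XtX, Xty)
print(round(b0, 4), round(b1, 4), round(b2, 4))  # recovers 2, 3, -1 (up to float error)
```

In practice these estimates come from statistical software; the point of the sketch is that b0, b1, b2 are the solution of one small linear system.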
Example: Pie Sales Model
A distributor of frozen dessert pies wants to evaluate factors influencing weekly pie sales. Data collected over 15 weeks includes:
Dependent variable: Pie sales (units per week)
Independent variables: Price (in $), Advertising (in $100's)
| Week | Pie Sales | Price ($) | Advertising ($100s) |
|---|---|---|---|
| 1 | 350 | 5.50 | 3.3 |
| 2 | 460 | 5.50 | 3.2 |
| ... | ... | ... | ... |
| 15 | 430 | 7.00 | 2.0 |
Additional info: Table truncated for brevity; see original notes for full data.
Interpreting Regression Coefficients
Sales = 306.526 - 24.975(Price) + 74.131(Advertising)
b1 = -24.975: For each $1 increase in price, sales decrease by 24.975 pies per week, holding advertising constant.
b2 = 74.131: For each $100 increase in advertising, sales increase by 74.131 pies per week, holding price constant.
Making Predictions
To predict sales for a given price and advertising level:
Predicted sales = 306.526 - 24.975(5.50) + 74.131(3.5) = 428.62 pies (for price = $5.50 and advertising = $350, which enters the model as 3.5 since advertising is in $100s)
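The prediction arithmetic can be checked directly with the fitted equation from the notes (remember that advertising enters in $100s):

```python
# Prediction from the fitted pie-sales equation in the notes.
# Advertising is measured in $100s, so $350 of advertising enters as 3.5.
def predict_sales(price, advertising_hundreds):
    return 306.526 - 24.975 * price + 74.131 * advertising_hundreds

print(round(predict_sales(5.50, 3.5), 2))  # → 428.62
```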
Confidence and Prediction Intervals
Confidence interval: Range for the mean value of Y, given X values
Prediction interval: Range for an individual Y value, given X values
Coefficient of Multiple Determination (r2)
Measures the proportion of total variation in Y explained by all X variables together: r2 = SSR / SST
SSR: Regression sum of squares
SST: Total sum of squares
Interpretation: In the example, r2 = 0.521 (52.1% of the variation in pie sales is explained by price and advertising)
Adjusted r2
Adjusted r2 accounts for the number of predictors in the model, penalizing excessive use of unimportant variables: adjusted r2 = 1 - [(1 - r2)(n - 1) / (n - k - 1)]
n: Sample size
k: Number of independent variables
Adjusted r2 is never larger than r2
Useful for comparing models with different numbers of predictors
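Both measures follow directly from the sums of squares. A short sketch, with SSR and SST values assumed for illustration (chosen to be consistent with the 52.1% in the example, using n = 15 weeks and k = 2 predictors):

```python
# Illustrative computation of r2 and adjusted r2.
ssr, sst = 29460.0, 56493.3   # hypothetical sums of squares (give r2 ≈ 0.521)
n, k = 15, 2                  # 15 weeks of data, 2 predictors

r2 = ssr / sst
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)
print(round(r2, 3), round(adj_r2, 3))  # → 0.521 0.442
```

Note how the adjustment pulls r2 down: the penalty grows as k approaches n.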
Testing Overall Model Significance (F Test)
The F test determines if there is a significant linear relationship between the set of X variables and Y.
Hypotheses:
H0: β1 = β2 = ... = βk = 0 (no linear relationship)
H1: At least one βj ≠ 0 (at least one independent variable affects Y)
Test Statistic: FSTAT = MSR / MSE = (SSR / k) / (SSE / (n - k - 1)), with df1 = k and df2 = n - k - 1
Decision rule: If FSTAT > critical value or p-value < α, reject H0
Example: FSTAT = 6.5386, p-value = 0.0120 < 0.05 ⇒ model is significant
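The FSTAT computation can be sketched from the sums of squares. The SSR and SST below are assumed values consistent with the example (r2 ≈ 0.521, n = 15, k = 2); they reproduce an F statistic close to the 6.5386 reported in the notes:

```python
# F statistic: FSTAT = MSR / MSE = (SSR/k) / (SSE/(n-k-1)).
ssr, sst = 29460.0, 56493.3   # hypothetical sums of squares
n, k = 15, 2
sse = sst - ssr               # SST = SSR + SSE
msr = ssr / k
mse = sse / (n - k - 1)
f_stat = msr / mse
print(round(f_stat, 4))       # close to the 6.5386 reported in the notes
```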
Residuals in Multiple Regression
Residuals are the differences between observed and predicted values: ei = Yi - Ŷi
Purpose: Assess model fit and check assumptions
Best fit equation: Minimizes the sum of squared errors, SSE = Σei² = Σ(Yi - Ŷi)²
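Residuals and SSE are simple to compute once predictions are in hand; a small sketch on made-up observed and predicted values:

```python
# Residuals e_i = Y_i - Yhat_i and their sum of squares (SSE).
y = [350, 460, 350]            # observed values (hypothetical)
y_hat = [360.5, 442.9, 341.2]  # predicted values (hypothetical)

residuals = [yi - yh for yi, yh in zip(y, y_hat)]
sse = sum(e ** 2 for e in residuals)
print([round(e, 1) for e in residuals], round(sse, 2))  # → [-10.5, 17.1, 8.8] 480.1
```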
Assumptions of Multiple Regression
Errors are normally distributed
Errors have constant variance (homoscedasticity)
Errors are independent
Residual Plots
Residuals vs. Predicted Y
Residuals vs. X1
Residuals vs. X2
Residuals vs. Time (for time series data)
Use: Check for violations of regression assumptions (e.g., non-linearity, heteroscedasticity)
Testing Significance of Individual Variables (t Test)
Purpose: Test if each independent variable has a significant linear relationship with Y, controlling for other variables
Hypotheses:
H0: βj = 0 (no linear relationship between Xj and Y)
H1: βj ≠ 0 (linear relationship exists)
Test Statistic: tSTAT = bj / Sbj, where Sbj is the standard error of bj
df = n - k - 1
Decision: If |tSTAT| > critical value or p-value < α, reject H0
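The t statistic is a simple ratio; here b1 is the price coefficient from the notes, while the standard error is an assumed value for illustration:

```python
# tSTAT for an individual coefficient: tSTAT = bj / S_bj.
b1 = -24.975    # price coefficient from the fitted pie-sales equation
s_b1 = 10.832   # hypothetical standard error of b1
t_stat = b1 / s_b1
print(round(t_stat, 3))
```

With df = 15 - 2 - 1 = 12, this statistic would then be compared with the t critical value at the chosen α.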
Additional Topics in Multiple Regression
Dummy Variables
Dummy variables are used to include categorical independent variables in regression models.
Definition: A variable coded as 0 or 1 to represent two categories (e.g., yes/no, male/female)
For more than two categories: Use (number of categories - 1) dummy variables
Interpretation: The coefficient of a dummy variable estimates the difference in mean Y between the two categories, holding the other X's constant (a shift in intercept between the two groups)
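The (categories - 1) rule can be sketched with a hypothetical three-level factor; "North" is the baseline category, so only two dummies are created:

```python
# Coding a 3-category variable with (3 - 1) = 2 dummy variables.
# "region" is a hypothetical factor; "North" is the omitted baseline.
regions = ["North", "South", "West", "South", "North"]
rows = [{"south": int(r == "South"), "west": int(r == "West")} for r in regions]
print(rows[1])  # → {'south': 1, 'west': 0}
```

A "North" observation is coded (0, 0); including a third dummy for "North" would make the columns perfectly collinear with the intercept.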
Interaction Terms
Interaction terms allow the effect of one independent variable to depend on the level of another.
Model: Y = β0 + β1X1 + β2X2 + β3(X1X2) + ε
Interpretation: The effect of X1 on Y changes as X2 changes
Testing: If the interaction term is significant, include it in the model
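The "effect of X1 depends on X2" idea can be made concrete: in the model Y = b0 + b1X1 + b2X2 + b3(X1X2), the slope of X1 is (b1 + b3·X2). A sketch with made-up coefficients:

```python
# With an interaction term, the slope of X1 is (b1 + b3 * X2),
# so it changes as X2 changes. Coefficients below are illustrative.
b0, b1, b2, b3 = 10.0, 2.0, 1.5, 0.5

def slope_of_x1(x2):
    return b1 + b3 * x2

print(slope_of_x1(0), slope_of_x1(4))  # → 2.0 4.0
```

Here the effect of X1 on Y doubles as X2 moves from 0 to 4, which is exactly what a nonzero b3 captures.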
Summary Table: Key Multiple Regression Concepts
| Concept | Definition | Formula |
|---|---|---|
| Multiple Regression Model | Linear relationship between Y and multiple X's | Yi = β0 + β1X1i + ... + βkXki + εi |
| Coefficient of Determination (r2) | Proportion of variance in Y explained by X's | r2 = SSR / SST |
| Adjusted r2 | r2 adjusted for number of predictors | 1 - [(1 - r2)(n - 1) / (n - k - 1)] |
| F Test | Test overall model significance | FSTAT = MSR / MSE |
| t Test | Test significance of individual predictors | tSTAT = bj / Sbj |
| Dummy Variable | Categorical variable coded as 0/1 | -- |
| Interaction Term | Product of two predictors | X1 × X2 |
Conclusion
Multiple regression is a powerful tool for modeling and predicting outcomes based on several independent variables. Understanding how to build, interpret, and validate these models is essential for effective business decision-making and statistical analysis.