Chapter 23: Multiple Regression – Business Statistics Study Notes

Multiple Regression Model

Introduction to Multiple Regression

Multiple regression is a statistical technique used to model the relationship between a response variable and two or more explanatory variables. It allows researchers to separate the effects of each explanatory variable and determine which variables significantly impact the response.

  • Definition: The multiple regression model (MRM) describes the association in the population between multiple explanatory variables and a response.

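  • Equation: The response variable Y is linearly related to k explanatory variables X1, X2, ..., Xk by the equation $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_k X_k + \varepsilon$, where $\varepsilon \sim N(0, \sigma_\varepsilon^2)$ (errors are independent, have equal variance, and are normally distributed).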

  • Comparison: Simple regression bundles all but one explanatory variable into the error term, while multiple regression includes several variables in the model.

  • Residuals: Departures from normality in residuals may indicate omitted explanatory variables.
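As a rough illustration of the model equation, the sketch below simulates responses from an MRM with two explanatory variables. The coefficient values, the error standard deviation, the input rows, and the function name `simulate_mrm` are all made up for illustration and do not come from the chapter.

```python
import random

def simulate_mrm(rows, betas, sigma, seed=0):
    """Draw y = b0 + b1*x1 + ... + bk*xk + error for each row of x values.

    Errors are independent N(0, sigma^2) draws, matching the MRM assumptions.
    """
    rng = random.Random(seed)
    b0, slopes = betas[0], betas[1:]
    ys = []
    for xs in rows:
        mean = b0 + sum(b * x for b, x in zip(slopes, xs))
        ys.append(mean + rng.gauss(0.0, sigma))
    return ys

# Hypothetical inputs: (median income in $000, number of competitors)
rows = [(50, 2), (60, 1), (70, 3), (80, 4)]
y = simulate_mrm(rows, betas=[60.0, 8.0, -24.0], sigma=10.0)
print(y)
```

Setting `sigma=0.0` removes the error term, so each simulated response equals its mean on the regression surface.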

Interpreting Multiple Regression

Example: Women’s Apparel Stores

This example models annual sales per square foot in a chain of women’s apparel stores using two explanatory variables: median household income in the area and the number of competing apparel stores in the same mall.

  • Scatterplot Matrix: Used to visualize relationships between variables before fitting the model.

  • Correlation Matrix: Quantifies the strength and direction of linear relationships between variables.

JMP menu for multivariate methods

JMP Pro variable selection for multivariate analysis

Scatterplot matrix for sales, income, competitors

Correlation matrix for sales, income, competitors
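Outside JMP, a correlation matrix can be computed directly from the definition of the Pearson correlation. The sketch below uses only the Python standard library; the six-store data set and the helper names `pearson` and `correlation_matrix` are hypothetical and are not the chapter's data.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def correlation_matrix(columns):
    """Return a nested dict of pairwise correlations, keyed by column name."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}

# Hypothetical data for six stores (not the chapter's data set)
data = {
    "sales": [420.0, 530.0, 540.0, 610.0, 700.0, 470.0],
    "income": [50.0, 60.0, 70.0, 80.0, 90.0, 65.0],
    "competitors": [2.0, 1.0, 3.0, 4.0, 2.0, 5.0],
}
corr = correlation_matrix(data)
for name, row in corr.items():
    print(name, {other: round(r, 3) for other, r in row.items()})
```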

Fitting the Model

To fit the model, select the response and explanatory variables in statistical software.

JMP Pro model specification for multiple regression
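For readers without JMP, the same least squares fit can be sketched by solving the normal equations $(X^T X) b = X^T y$. This is a minimal standard-library sketch, not how JMP computes its estimates; the function names `solve` and `fit_ols` and the data are hypothetical.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x

def fit_ols(rows, y):
    """Return [b0, b1, ..., bk] minimizing the sum of squared residuals."""
    X = [[1.0] + list(r) for r in rows]  # leading 1 gives the intercept
    p = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve(XtX, Xty)

# Hypothetical data: (median income in $000, competitors) and sales per sq ft
rows = [(50, 2), (60, 1), (70, 3), (80, 4), (90, 2), (65, 5)]
sales = [420.0, 530.0, 540.0, 610.0, 700.0, 470.0]
b0, b_income, b_competitors = fit_ols(rows, sales)
print(b0, b_income, b_competitors)
```

Solving the normal equations directly is adequate for a small, well-conditioned example; statistical software typically uses more numerically stable decompositions.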

Model Equation and Interpretation

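  • Fitted Model: $\widehat{\text{Sales}} = b_0 + b_1\,\text{Income} + b_2\,\text{Competitors}$, where $b_0$, $b_1$, and $b_2$ are the least squares estimates.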

  • R-squared ($R^2$): Indicates the proportion of variance in sales explained by the model. Here, $R^2 = 0.5947$ (59.47%).

  • Adjusted R-squared: Adjusts for sample size and the number of explanatory variables; always less than $R^2$.

  • Standard error ($s_e$): Estimates the standard deviation of the errors, which is roughly the typical distance between the observed values and the fitted values.

R-squared, adjusted R-squared, standard error, sample size
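The three summary statistics above all come from the residual and total sums of squares. The sketch below shows the standard formulas on a tiny made-up example (four observations, one explanatory variable); the data and the function name `fit_summary` are hypothetical.

```python
from math import sqrt

def fit_summary(y, fitted, k):
    """R-squared, adjusted R-squared, and s_e for a fit with k explanatory variables."""
    n = len(y)
    mean_y = sum(y) / n
    sst = sum((yi - mean_y) ** 2 for yi in y)               # total variation
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))  # residual variation
    df_resid = n - k - 1
    return {
        "r2": 1 - sse / sst,
        "adj_r2": 1 - (sse / df_resid) / (sst / (n - 1)),
        "se": sqrt(sse / df_resid),
    }

# Made-up example: fitted values are the group means of a two-group regression
y = [1.0, 2.0, 3.0, 4.0]
fitted = [1.5, 1.5, 3.5, 3.5]
summary = fit_summary(y, fitted, k=1)
print(summary)
```

Here the total sum of squares is 5 and the residual sum of squares is 1, so $R^2 = 0.8$ and adjusted $R^2 = 0.7$.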

Calibration Plot

A calibration plot compares observed sales to fitted values. The tighter the data cluster along the diagonal, the higher the $R^2$.

Calibration plot for sales vs. estimated sales
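The link between the calibration plot and $R^2$ can be checked numerically: for a least squares fit with an intercept, $R^2$ equals the squared correlation between the observed and fitted values. The sketch below uses a tiny made-up example, not the chapter's data; `pearson` is a hypothetical helper.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Made-up example: least squares fitted values from a two-group regression
observed = [1.0, 2.0, 3.0, 4.0]
fitted = [1.5, 1.5, 3.5, 3.5]

r = pearson(observed, fitted)
r_squared = r ** 2  # squared correlation in the calibration plot
print(round(r_squared, 4))
```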

Marginal and Partial Slopes

  • Partial Slope: The slope of an explanatory variable in multiple regression, controlling for other variables.

  • Marginal Slope: The slope in simple regression, not controlling for other variables.

  • Interpretation: Partial and marginal slopes only agree when explanatory variables are uncorrelated.

Partial slopes table for women's apparel stores
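The difference between marginal and partial slopes can be seen in a small constructed example. In the sketch below, sales are generated without noise from known partial slopes (8 for income, -24 for competitors), and income and competitors are positively correlated. All numbers and function names are hypothetical and chosen only so the answer is known in advance.

```python
def centered_sums(x1, x2, y):
    """Sums of squares and cross-products of deviations from the means."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
    s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))
    return s11, s22, s12, s1y, s2y

def marginal_slope(x, y):
    """Slope from the simple regression of y on x alone."""
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (c - my) for a, c in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def partial_slopes(x1, x2, y):
    """Slopes from the multiple regression of y on x1 and x2 (Cramer's rule)."""
    s11, s22, s12, s1y, s2y = centered_sums(x1, x2, y)
    det = s11 * s22 - s12 ** 2
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det

# Constructed data: more affluent areas also have more competitors
income = [50.0, 60.0, 70.0, 80.0, 90.0, 65.0]
competitors = [1.0, 3.0, 2.0, 4.0, 5.0, 3.0]
sales = [60.0 + 8.0 * i - 24.0 * c for i, c in zip(income, competitors)]

b_income, b_competitors = partial_slopes(income, competitors, sales)
m_income = marginal_slope(income, sales)
print(b_income, b_competitors, m_income)
```

The marginal slope for income comes out smaller than the partial slope because higher income is accompanied by more competitors, which pull sales down.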

Path Diagram and Collinearity

  • Path Diagram: Schematic drawing showing direct and indirect effects among variables.

  • Collinearity: High correlation among the explanatory variables inflates the standard errors of the partial slopes, making the individual estimates imprecise and hard to interpret.

Path diagram for income, competition, and sales
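The path diagram corresponds to a standard least squares identity, sketched here for two explanatory variables. The symbols are notation introduced for this note: $b_1$ and $b_2$ are the partial slopes of income and competitors, $m_1$ is the marginal slope of sales on income, and $d_{21}$ is the slope from the simple regression of competitors on income.

```latex
\underbrace{m_1}_{\text{marginal (total) effect}}
  \;=\;
\underbrace{b_1}_{\text{direct effect}}
  \;+\;
\underbrace{d_{21}\, b_2}_{\text{indirect effect}}
```

When income and competitors are uncorrelated, $d_{21} = 0$ and the marginal and partial slopes for income coincide.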

Checking Conditions for Inference

Residual Plots

Residual plots are used to check model assumptions: independence, equal variance, and normality of errors.

  • Residuals vs. Fitted Values: Used to identify outliers and check for equal variance.

  • Residuals vs. Explanatory Variables: Used to verify linear relationships.

Residual plot versus fitted sales

Saving residuals in JMP Pro

Graph builder for residual plots

Residual plot versus income

Residual plot versus competitors

Normality Check

Normal quantile plots and histograms are used to check if residuals are approximately normally distributed.

Normal quantile plot and histogram for residuals
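The coordinates of a normal quantile plot can be sketched with the standard library. Each sorted residual is paired with a theoretical standard normal quantile; the plotting positions $(i - 0.5)/n$ used here are one common convention and an assumption of this sketch, so JMP's plot may use slightly different positions. The residual values are made up.

```python
from statistics import NormalDist

def normal_quantile_points(residuals):
    """Pair each sorted residual with its theoretical standard normal quantile."""
    n = len(residuals)
    ordered = sorted(residuals)
    # Plotting positions (i - 0.5) / n: an assumed convention, others exist
    probs = [(i - 0.5) / n for i in range(1, n + 1)]
    z = [NormalDist().inv_cdf(p) for p in probs]
    return list(zip(z, ordered))

# Made-up residuals for illustration
residuals = [3.0, -1.0, 0.0, 1.0, -3.0]
points = normal_quantile_points(residuals)
for z, r in points:
    print(round(z, 3), r)
```

If the residuals are approximately normal, these points fall close to a straight line.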

Inference in Multiple Regression

F-test for Model

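The F-test evaluates the explanatory power of the model as a whole. The F-statistic is the ratio of the mean square for the regression (explained variation per explanatory variable) to the mean square of the residuals (unexplained variation per residual degree of freedom).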

  • Null Hypothesis ($H_0$): All slopes are equal to zero, $\beta_1 = \beta_2 = \dots = \beta_k = 0$.

  • Interpretation: A low p-value indicates that the model explains significant variation in the response.

Analysis of variance table for F-test
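The F-statistic can be computed either from the sums of squares in the analysis of variance table or from $R^2$; the two forms are algebraically equivalent. The sketch below checks this on a made-up example with n = 4 observations and k = 1 explanatory variable. The function names and numbers are hypothetical.

```python
def f_from_sums_of_squares(ss_regression, ss_residual, n, k):
    """F = (regression mean square) / (residual mean square)."""
    ms_regression = ss_regression / k
    ms_residual = ss_residual / (n - k - 1)
    return ms_regression / ms_residual

def f_from_r_squared(r2, n, k):
    """Equivalent form of F written in terms of R-squared."""
    return (r2 / k) / ((1 - r2) / (n - k - 1))

# Made-up example: total SS = 5, residual SS = 1, so regression SS = 4
f_anova = f_from_sums_of_squares(ss_regression=4.0, ss_residual=1.0, n=4, k=1)
f_r2 = f_from_r_squared(r2=0.8, n=4, k=1)
print(f_anova, f_r2)
```

The p-value comes from comparing F to an F distribution with k and n - k - 1 degrees of freedom, which the software reports in the table.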

Inference for Individual Coefficients

The t-test is used to test whether each slope is significantly different from zero.

  • Null Hypothesis ($H_0$): $\beta_j = 0$ for each coefficient.

  • Interpretation: A large t-statistic (in absolute value) with a small p-value indicates that the explanatory variable is significantly associated with the response after controlling for the other variables in the model.

t-test results for regression coefficients
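Each t-statistic is the estimate divided by its standard error. For a model with exactly two explanatory variables, the standard error of a partial slope has a simple closed form that shows how correlation between the explanatory variables inflates it. The sketch below uses made-up inputs; the function names are hypothetical.

```python
from math import sqrt

def slope_standard_error(s_e, sxx, r12):
    """Standard error of a partial slope in a two-variable regression.

    s_e: standard deviation of the residuals
    sxx: sum of squared deviations of this explanatory variable from its mean
    r12: correlation between the two explanatory variables
    """
    return s_e / sqrt(sxx * (1.0 - r12 ** 2))

def t_statistic(estimate, std_error, null_value=0.0):
    """Number of standard errors separating the estimate from the null value."""
    return (estimate - null_value) / std_error

# Made-up inputs for illustration
se_b1 = slope_standard_error(s_e=2.0, sxx=16.0, r12=0.6)
t_b1 = t_statistic(2.5, se_b1)
print(se_b1, t_b1)
```

With `r12=0.0` the same inputs give a standard error of 0.5 rather than 0.625, which is the sense in which collinearity makes slope estimates less precise.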

Confidence Intervals

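Confidence intervals provide a range of plausible values for each coefficient. They are consistent with the t-test results: a 95% confidence interval excludes zero exactly when the two-sided t-test rejects $H_0$ at the 0.05 level.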

Confidence intervals for regression coefficients
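The agreement between confidence intervals and t-tests can be sketched in a few lines. The critical value `t_crit` must come from a t table or software for n - k - 1 degrees of freedom; the value 2.0 below is a rounded placeholder, and the estimates and standard errors are made up.

```python
def confidence_interval(estimate, std_error, t_crit):
    """Return (lower, upper) = estimate -/+ t_crit * std_error."""
    margin = t_crit * std_error
    return estimate - margin, estimate + margin

def rejects_zero(estimate, std_error, t_crit):
    """Two-sided t-test of H0: coefficient = 0 at the matching level."""
    return abs(estimate / std_error) > t_crit

T_CRIT = 2.0  # placeholder; look up the exact value for the residual df

# Made-up coefficients: (estimate, standard error)
for estimate, std_error in [(2.5, 0.625), (1.0, 0.625)]:
    low, high = confidence_interval(estimate, std_error, T_CRIT)
    excludes_zero = not (low <= 0.0 <= high)
    print((low, high), excludes_zero, rejects_zero(estimate, std_error, T_CRIT))
```

In both made-up cases the interval excludes zero exactly when the test rejects.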

Prediction Intervals

A prediction interval estimates the range in which a new observation is likely to fall, given specific values of the explanatory variables. For example, a 95% prediction interval for sales at a location with median income of $70,000 and 3 competitors is $545.48 ± $137.29 per square foot.
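Written as endpoints, the interval quoted above runs from about $408 to $683 per square foot. The short sketch below only does that arithmetic with the two numbers given in the notes; the function name `interval_endpoints` is hypothetical.

```python
def interval_endpoints(center, margin):
    """Convert 'center plus or minus margin' into (lower, upper) endpoints."""
    return center - margin, center + margin

# Values quoted in the notes: predicted sales and the 95% margin
low, high = interval_endpoints(545.48, 137.29)
print(f"{low:.2f} to {high:.2f}")  # prints 408.19 to 682.77
```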

Steps in Fitting a Multiple Regression

Step-by-Step Process

  1. Define the Problem: Identify the response and explanatory variables relevant to the business question.

  2. Visualize Relationships: Use scatterplot matrices to check for linear relationships.

  3. Fit the Model: If relationships are linear, fit the multiple regression model; otherwise, consider transformations.

  4. Check Residuals: Obtain residuals and fitted values, and make scatterplots to check for equal variance and dependence.

  5. Check Normality: Use quantile plots and histograms to verify normality of residuals.

  6. Test Model Significance: Use the F-statistic to test if explanatory variables collectively affect the response.

  7. Interpret Partial Slopes: If the F-statistic is significant, interpret individual partial slopes and their confidence intervals.
