
Multiple Regression and Model Building: Study Notes for Business Statistics

Multiple Regression Analysis

Introduction to Multiple Regression

Multiple regression is an extension of simple linear regression that allows for the inclusion of two or more independent variables (regressors) to predict a dependent variable. This technique is widely used in business statistics to model complex relationships and control for confounding variables.

  • Estimated Coefficient Calculation: In multiple regression, the estimated coefficient for each regressor measures the expected change in the dependent variable for a one-unit increase in that regressor, holding all other regressors constant.

  • Interpretation: Each coefficient represents the unique contribution of its variable to the prediction of the dependent variable, after accounting for the effects of other variables in the model.

  • Example: In a model predicting salary based on years of experience and education level, the coefficient for years of experience shows the effect of experience on salary, controlling for education.
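
The salary example can be sketched in code. This is a minimal illustration, assuming NumPy is available; the data are hypothetical and generated exactly from salary = 30 + 2(experience) + 5(education), so the least squares fit should recover those coefficients.

```python
import numpy as np

# Hypothetical data: salary (in $1000s) is generated exactly from
# salary = 30 + 2*experience + 5*education, so the fit is easy to check.
experience = np.array([1, 3, 5, 7, 9, 2, 4, 6], dtype=float)
education = np.array([12, 16, 12, 18, 16, 14, 18, 12], dtype=float)
salary = 30 + 2 * experience + 5 * education

# Design matrix: a column of ones for the intercept, then one column per regressor.
X = np.column_stack([np.ones_like(experience), experience, education])

# Least squares estimates b = (b0, b_experience, b_education).
b, *_ = np.linalg.lstsq(X, salary, rcond=None)
b0, b_experience, b_education = b

print(f"salary-hat = {b0:.2f} + {b_experience:.2f}*experience "
      f"+ {b_education:.2f}*education")
```

The coefficient on experience is read as the expected change in salary for one more year of experience with education held fixed.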

Sum of Squares and Model Fit

Adding regressors to a model affects the partitioning of variance in the dependent variable.

  • SSE (Sum of Squared Errors): Measures the unexplained variation; it never increases when a regressor is added and usually decreases.

  • SSR (Sum of Squares due to Regression): Measures the explained variation; it never decreases when a regressor is added and usually increases.

  • SST (Total Sum of Squares): Total variation in the dependent variable; remains constant for a given dataset.

  • Relationship: $SST = SSR + SSE$

  • $R^2$ (Coefficient of Determination): Proportion of variance explained by the model, $R^2 = SSR/SST = 1 - SSE/SST$. $R^2$ always increases (or stays the same) as more regressors are added, even if they are not meaningful.

  • Adjusted $R^2$: Adjusts for the number of predictors, penalizing unnecessary variables. It can decrease if a new regressor does not improve the model sufficiently.

  • Formula for Adjusted $R^2$: $R^2_{adj} = 1 - (1 - R^2)\,\frac{n - 1}{n - k - 1}$, where $n$ is the number of observations and $k$ is the number of regressors.

  • Interpretation: Adjusted $R^2$ provides a more accurate measure of model fit when comparing models with different numbers of predictors.
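
The quantities in this section can be computed directly. The sketch below assumes NumPy and uses a small hypothetical data set; `fit_summary` and the `junk` regressor are illustrative names, not part of the course material. It fits the model with and without an unrelated regressor so the two sets of fit measures can be compared.

```python
import numpy as np

def fit_summary(X, y):
    """Fit OLS with an intercept and return sums of squares and fit measures.

    X holds one column per regressor (no column of ones); y is the dependent variable.
    """
    n, k = X.shape
    design = np.column_stack([np.ones(n), X])
    b, *_ = np.linalg.lstsq(design, y, rcond=None)
    y_hat = design @ b
    sse = np.sum((y - y_hat) ** 2)          # unexplained variation
    ssr = np.sum((y_hat - y.mean()) ** 2)   # explained variation
    sst = np.sum((y - y.mean()) ** 2)       # total variation
    r2 = ssr / sst
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)
    return {"SSE": sse, "SSR": ssr, "SST": sst, "R2": r2, "AdjR2": adj_r2}

# Hypothetical data: y depends on x1; "junk" is an unrelated column of digits.
x1 = np.arange(1, 11, dtype=float)
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1, 18.0, 19.9])
junk = np.array([3, 1, 4, 1, 5, 9, 2, 6, 5, 3], dtype=float)

small = fit_summary(x1.reshape(-1, 1), y)
big = fit_summary(np.column_stack([x1, junk]), y)

for name, s in [("x1 only", small), ("x1 + junk", big)]:
    print(f"{name}: SSE={s['SSE']:.4f} SSR={s['SSR']:.4f} SST={s['SST']:.4f} "
          f"R2={s['R2']:.5f} AdjR2={s['AdjR2']:.5f}")
```

SST is the same in both fits, SSE cannot go up, and R2 cannot go down when `junk` is added; adjusted R2 is free to move in either direction, which is why it is the better measure for comparing the two models.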

Significance Testing in Regression

Statistical tests are used to determine whether individual regressors or the overall model are significant predictors of the dependent variable.

  • Individual Significance Test (t-test): Tests whether a single coefficient is significantly different from zero.

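  • Test Statistic: $t = b_j / s_{b_j}$, where $b_j$ is the estimated coefficient and $s_{b_j}$ is its standard error; the statistic has $n - k - 1$ degrees of freedom for $n$ observations and $k$ regressors.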

  • Full F-test: Tests whether at least one regressor is significant (i.e., the model as a whole is useful).

  • F Statistic Formula: (Formula provided on exam; not required to memorize.)

  • Steps:

    1. State hypotheses (t-test null: the coefficient = 0; F-test null: all slope coefficients = 0).

    2. Calculate test statistic (t or F).

    3. Compare to critical value or use p-value.

    4. Draw conclusion about significance.
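
Step 2 can be sketched as follows, assuming NumPy and a hypothetical data set. The t statistics use the usual standard errors from MSE times the inverse of X'X, and the F statistic is computed in the standard form MSR/MSE; the exam formula sheet may write it differently. Critical values or p-values for steps 3 and 4 would still come from a table or software.

```python
import numpy as np

# Hypothetical data with two regressors and a little noise.
x1 = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], dtype=float)
x2 = np.array([5, 3, 8, 1, 9, 2, 7, 4, 10, 6], dtype=float)
noise = np.array([0.3, -0.2, 0.1, -0.4, 0.2, 0.5, -0.3, 0.1, -0.1, -0.2])
y = 4 + 1.5 * x1 + 0.8 * x2 + noise

n, k = len(y), 2                      # observations, regressors
X = np.column_stack([np.ones(n), x1, x2])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ b

sse = np.sum((y - y_hat) ** 2)
ssr = np.sum((y_hat - y.mean()) ** 2)
sst = np.sum((y - y.mean()) ** 2)
r2 = ssr / sst

mse = sse / (n - k - 1)               # error variance estimate
msr = ssr / k

# Standard errors: square roots of the diagonal of MSE * (X'X)^-1.
se = np.sqrt(np.diag(mse * np.linalg.inv(X.T @ X)))

t_stats = b / se                      # one t statistic per coefficient (incl. intercept)
f_stat = msr / mse                    # overall F statistic, df = (k, n - k - 1)

print("t statistics:", np.round(t_stats, 2))
print("F statistic:", round(float(f_stat), 2), "with df", (k, n - k - 1))
```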

Model Building: Special Terms and Issues

Quadratic Terms

A quadratic term allows the model to capture curvature (nonlinear relationships) between a regressor and the dependent variable.

  • Definition: A quadratic term is the square of a regressor (e.g., $x^2$).

  • When to Add: Add when the relationship between a predictor and the outcome is not linear (e.g., diminishing returns).

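  • Interpretation: The coefficient of the quadratic term indicates the direction and strength of curvature: a negative coefficient means the curve bends downward (the effect of the regressor weakens as it grows), and a positive coefficient means it bends upward.

  • Example: Predicting sales based on advertising spending and $(\text{advertising spending})^2$ to capture diminishing returns.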
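
A minimal sketch of the advertising example, assuming NumPy. The data are hypothetical and generated exactly from sales = 10 + 8(ad) - 0.5(ad)^2, so the fitted coefficient on the squared term should come out negative.

```python
import numpy as np

# Hypothetical data with diminishing returns to advertising.
ad = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=float)
sales = 10 + 8 * ad - 0.5 * ad ** 2

# The quadratic term is just another column: the square of the regressor.
X = np.column_stack([np.ones_like(ad), ad, ad ** 2])
b, *_ = np.linalg.lstsq(X, sales, rcond=None)
b0, b_linear, b_quadratic = b

# With a squared term, the slope is no longer constant:
# the change in predicted sales per extra unit of ad is b_linear + 2*b_quadratic*ad.
marginal_effect = b_linear + 2 * b_quadratic * ad

print("coefficients:", np.round(b, 3))
print("marginal effect at each ad level:", np.round(marginal_effect, 2))
```

Because the quadratic coefficient is negative here, each additional unit of advertising adds less to predicted sales than the one before.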

Interaction Terms

Interaction terms allow the effect of one regressor to depend on the value of another regressor.

  • Definition: An interaction term is the product of two regressors (e.g., $x_1 x_2$).

  • When to Add: Add when the effect of one variable is believed to change depending on another variable.

  • Interpretation: The coefficient of the interaction term shows how the relationship between one predictor and the outcome changes as the other predictor changes.

  • Example: Modeling the effect of education and experience on salary, including an interaction term to see if the effect of education depends on experience.
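
The education and experience example can be sketched like this, assuming NumPy. The data are hypothetical and generated exactly from salary = 20 + 3(education) + 2(experience) + 0.5(education)(experience).

```python
import numpy as np

# Hypothetical data on a small grid of education and experience values.
education = np.array([12, 12, 12, 16, 16, 16, 18, 18, 18], dtype=float)
experience = np.array([1, 5, 10, 1, 5, 10, 1, 5, 10], dtype=float)
salary = 20 + 3 * education + 2 * experience + 0.5 * education * experience

# The interaction term is the product of the two regressors.
X = np.column_stack([
    np.ones_like(education),
    education,
    experience,
    education * experience,
])
b, *_ = np.linalg.lstsq(X, salary, rcond=None)
b0, b_edu, b_exp, b_inter = b

# Effect of one more year of education now depends on experience:
# b_edu + b_inter * experience.
for years in (1, 5, 10):
    print(f"experience={years}: effect of education = {b_edu + b_inter * years:.2f}")
```

A positive interaction coefficient means an extra year of education is worth more to workers with more experience; a negative one would mean it is worth less.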

Dummy Variables

Dummy variables are used to include qualitative (categorical) data in regression models.

  • Definition: A dummy variable takes the value 0 or 1 to indicate the absence or presence of a categorical effect.

  • When to Add: Add when including qualitative data such as gender, school of major, or region.

  • Interpretation: The coefficient of a dummy variable represents the expected difference in the dependent variable between the group coded 1 and the reference group (coded 0), holding the other regressors constant.

  • Population Models: For each category, a different regression equation can be written by substituting the appropriate dummy variable values.

  • Example: Gender coded as 1 for female, 0 for male; the coefficient shows the difference in outcome between females and males.
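
A minimal sketch of dummy coding and the resulting per-group equations, assuming NumPy. The data are hypothetical and generated exactly from y = 40 + 2(experience) + 5(female), using the 1 = female, 0 = male coding from the example; the numbers carry no empirical meaning.

```python
import numpy as np

# Hypothetical data: one quantitative regressor and one 0/1 dummy.
experience = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=float)
female = np.array([1, 0, 1, 0, 1, 0, 1, 0], dtype=float)  # 1 = female, 0 = male
y = 40 + 2 * experience + 5 * female

X = np.column_stack([np.ones_like(experience), experience, female])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
b0, b_exp, b_female = b

# Substituting the dummy values gives one equation per group.
intercept_male = b0                 # female = 0 (reference group)
intercept_female = b0 + b_female    # female = 1

print(f"male:   y-hat = {intercept_male:.2f} + {b_exp:.2f}*experience")
print(f"female: y-hat = {intercept_female:.2f} + {b_exp:.2f}*experience")
```

The two lines share a slope and differ only in intercept, and that gap is the dummy coefficient.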

Multicollinearity (Collinearity)

Multicollinearity occurs when two or more regressors in a model are highly correlated, making it difficult to isolate their individual effects.

  • Definition: High correlation among independent variables.

  • Causes: Including variables that measure similar concepts or are mathematically related.

  • Consequences: Inflated standard errors, unreliable coefficient estimates, and difficulty in determining the effect of each variable.

  • Detection: High correlation coefficients, variance inflation factors (VIF), or unstable regression coefficients.
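
Variance inflation factors can be computed by regressing each regressor on the others and taking 1 / (1 - R^2) from that auxiliary regression. The sketch below assumes NumPy; `vif` is an illustrative helper and the data are hypothetical, with `x2` built to be almost exactly twice `x1`. A VIF above roughly 5 or 10 is a commonly quoted warning threshold, though courses differ on the cutoff.

```python
import numpy as np

def vif(X):
    """Return the variance inflation factor for each column of X.

    X holds one column per regressor (no column of ones).
    """
    n, k = X.shape
    factors = []
    for j in range(k):
        target = X[:, j]
        # Regress regressor j on an intercept plus all the other regressors.
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        b, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ b
        r2 = 1 - (resid @ resid) / np.sum((target - target.mean()) ** 2)
        factors.append(1 / (1 - r2))
    return np.array(factors)

# Hypothetical regressors: x2 is nearly 2*x1, x3 is unrelated to both.
x1 = np.arange(1, 9, dtype=float)
x2 = 2 * x1 + np.array([0.1, -0.1, 0.2, -0.2, 0.1, -0.1, 0.2, -0.2])
x3 = np.array([5, 1, 4, 8, 2, 7, 3, 6], dtype=float)

vifs = vif(np.column_stack([x1, x2, x3]))
print("VIF for x1, x2, x3:", np.round(vifs, 1))
```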

Interpreting Regression Output

Using Excel Regression Results

Business statistics often involves interpreting regression output from software such as Excel.

  • Estimated Regression Equation: Write the equation using the estimated coefficients from the output.

  • Prediction: Substitute values of the regressors into the equation to predict the dependent variable.

  • Testing Individual Significance: Use t-tests, p-values, or confidence intervals for each coefficient.

  • Testing Overall Model: Use the F-test to assess whether the model explains a significant portion of the variance.

  • Calculating $R^2$ and Adjusted $R^2$: Use the output to report and interpret these measures of model fit.
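
Once the coefficients and p-values have been read off the output, writing the equation, predicting, and checking individual significance are simple arithmetic. The sketch below uses made-up numbers typed in by hand; the variable names, coefficients, and p-values are hypothetical and do not come from any real Excel output.

```python
# Hypothetical values copied by hand from a regression output table.
coefficients = {"Intercept": 25.0, "Experience": 2.0, "Education": 1.5}
p_values = {"Experience": 0.003, "Education": 0.210}
alpha = 0.05

def predict(values):
    """Plug regressor values into the estimated regression equation."""
    return coefficients["Intercept"] + sum(
        coefficients[name] * x for name, x in values.items()
    )

# Estimated equation: y-hat = 25.0 + 2.0*Experience + 1.5*Education
prediction = predict({"Experience": 10, "Education": 16})

# A regressor is individually significant when its p-value is below alpha.
significant = {name: p < alpha for name, p in p_values.items()}

print("prediction:", prediction)
print("significant at alpha = 0.05:", significant)
```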

Summary Table: Key Regression Concepts

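| Concept | Definition | Formula (if applicable) |
|---|---|---|
| SST | Total Sum of Squares | $SST = SSR + SSE$ |
| SSR | Sum of Squares due to Regression | -- |
| SSE | Sum of Squared Errors | -- |
| $R^2$ | Proportion of variance explained | $R^2 = SSR/SST$ |
| Adjusted $R^2$ | $R^2$ adjusted for number of predictors | $1 - (1 - R^2)\,\frac{n - 1}{n - k - 1}$ |
| t-test | Test for individual coefficient | $t = b_j / s_{b_j}$ |
| F-test | Test for overall model significance | Formula provided on exam |

In the formulas, $n$ is the number of observations, $k$ is the number of regressors, $b_j$ is an estimated coefficient, and $s_{b_j}$ is its standard error.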

Additional info:

  • Students are expected to be familiar with basic regression concepts such as the population model, least squares regression line, error term, population parameters, alpha, and p-value.

  • Practice with Excel output and worksheet problems is recommended for exam preparation.
