Multiple regression is a powerful statistical method used to model the relationship between one dependent variable and multiple independent variables. Unlike simple linear regression, which examines the relationship between two variables, multiple regression allows for the analysis of how several factors simultaneously influence an outcome. For example, monthly rent can be affected not only by the floor area of an apartment but also by the age of the building and the apartment number. In this context, the dependent variable (monthly rent) is denoted as y, while the independent variables (floor area, age, apartment number) are represented as x₁, x₂, and x₃ respectively.
The multiple regression model can be expressed mathematically as:
\[y = b_0 + b_1 x_1 + b_2 x_2 + b_3 x_3 + \varepsilon\]
where b₀ is the y-intercept, b₁, b₂, and b₃ are the coefficients corresponding to each independent variable, and ε represents the error term. These coefficients quantify the expected change in the dependent variable for a one-unit change in the respective independent variable, holding the other variables constant.
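As an illustration of how such coefficients are estimated, the sketch below fits a model of this form by ordinary least squares using NumPy. The data values, and therefore the resulting coefficients, are entirely hypothetical and are not taken from the example discussed in this section.

```python
import numpy as np

# Hypothetical data: each row is one apartment
# columns: floor area (m^2), building age (years), apartment number
X = np.array([
    [55, 10,  3],
    [72,  5, 12],
    [48, 20,  7],
    [90,  2, 15],
    [60, 15,  4],
], dtype=float)
rent = np.array([620, 760, 505, 910, 640], dtype=float)  # monthly rent (y)

# Prepend a column of ones so the intercept b0 is estimated along with b1, b2, b3
X_design = np.column_stack([np.ones(len(X)), X])

# Ordinary least squares: solves for [b0, b1, b2, b3]
coeffs, residuals, rank, _ = np.linalg.lstsq(X_design, rent, rcond=None)
b0, b1, b2, b3 = coeffs
print(f"y = {b0:.2f} + {b1:.3f}*x1 + {b2:.3f}*x2 + {b3:.3f}*x3")
```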
Using tools like Excel’s Analysis ToolPak simplifies the process of calculating these coefficients and generating the regression equation. By inputting the dependent variable data and all independent variables simultaneously, the software outputs the coefficients and key statistics, including the intercept. For instance, a model might yield coefficients of 1.675 for floor area (x₁), -7.854 for age of the building (x₂), and 0.122 for apartment number (x₃), with an intercept of 424.79. This equation can then be used to predict monthly rent by substituting known values of the independent variables, as in the sketch below.
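To make the prediction step concrete, the short sketch below substitutes a hypothetical apartment (the floor area, building age, and apartment number are chosen purely for illustration) into the fitted equation with the coefficients quoted above.

```python
# Coefficients from the regression output quoted above
b0, b1, b2, b3 = 424.79, 1.675, -7.854, 0.122

# Hypothetical apartment: 70 m^2, 15-year-old building, apartment number 12
x1, x2, x3 = 70, 15, 12

predicted_rent = b0 + b1 * x1 + b2 * x2 + b3 * x3
print(f"Predicted monthly rent: {predicted_rent:.2f}")  # ≈ 425.69
```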
Evaluating the quality of a multiple regression model involves examining the coefficient of determination, denoted as R². This statistic measures the proportion of variance in the dependent variable explained by the independent variables collectively. An R² value of 0.797 indicates that approximately 79.7% of the variation in monthly rent is accounted for by the combined effects of floor area, building age, and apartment number.
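Formally, R² compares the unexplained (residual) variation to the total variation in the dependent variable:
\[R^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}\]
where ŷᵢ are the model’s predicted values and ȳ is the mean of the observed values.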
However, a limitation of R² in multiple regression is that it never decreases when additional independent variables are added; extra variables can artificially inflate R² even if they contribute nothing meaningful to the model. To address this, the adjusted R² statistic is used, which penalizes the model for the number of predictors it contains and therefore gives a more honest measure of model quality. The adjusted R² is always less than or equal to the regular R², and it decreases when an added variable explains less than would be expected by chance. For example, an adjusted R² of 0.763 is a slightly more conservative estimate of explained variance than the R² of 0.797 above, because it accounts for model complexity.
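In formula form, the adjustment depends on the sample size n and the number of independent variables k (here k = 3):
\[R^2_{\text{adj}} = 1 - (1 - R^2)\,\frac{n - 1}{n - k - 1}\]
so the penalty grows as more predictors are added relative to the number of observations.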
Understanding and applying multiple regression equips students with the ability to analyze complex real-world data where multiple factors influence an outcome. By interpreting coefficients, calculating predictions, and evaluating model fit through R² and adjusted R², learners develop critical skills in statistical modeling and data-driven decision-making.