Multiple regression is a powerful statistical method used to model the relationship between one dependent variable and multiple independent variables. Unlike simple linear regression, which examines the relationship between two variables, multiple regression allows for the analysis of how several factors simultaneously influence an outcome. For example, the monthly rent of an apartment can be affected by its floor area, the age of the building, and the apartment number. In this context, the monthly rent is the dependent variable y, while floor area, age, and apartment number serve as independent variables, often denoted as x₁, x₂, and x₃ respectively.
To construct a multiple regression model, data for all variables must be organized systematically, typically with each independent variable in its own column. Tools like Excel’s Analysis ToolPak add-in simplify the process by automating the calculation of regression coefficients. These coefficients quantify the impact of each independent variable on the dependent variable. For instance, if the coefficient for floor area (x₁) is 1.675, then each additional unit of floor area raises the predicted monthly rent by approximately 1.675 units, holding the other variables constant. Similarly, a negative coefficient for the age of the building (x₂) indicates that older buildings tend to have lower rents, while a positive coefficient for apartment number (x₃) suggests a slight increase in rent with higher apartment numbers. The regression equation takes the form:
\[ y = b_0 + b_1 x_1 + b_2 x_2 + b_3 x_3 \]
where b₀ is the y-intercept (baseline rent when all independent variables are zero), and b₁, b₂, and b₃ are the coefficients for each independent variable.
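The fit described above can be sketched in a few lines of NumPy using ordinary least squares. The rent figures below are hypothetical, invented purely to illustrate the mechanics; the coefficients they produce are not the ones quoted in the text.

```python
import numpy as np

# Hypothetical data: one row per apartment, columns are
# floor area (m^2), building age (years), apartment number.
X_raw = np.array([
    [40, 10,  3],
    [55,  5, 12],
    [30, 20,  1],
    [70,  2,  8],
    [45, 15,  5],
    [60,  8, 10],
], dtype=float)
y = np.array([620, 910, 410, 1150, 640, 960], dtype=float)  # monthly rent

# Prepend a column of ones so the intercept b0 is estimated as well.
X = np.column_stack([np.ones(len(y)), X_raw])

# Solve for [b0, b1, b2, b3] by ordinary least squares.
b, *_ = np.linalg.lstsq(X, y, rcond=None)
print("coefficients:", b)

# Predicted rent for a 50 m^2, 7-year-old building, apartment no. 6:
y_hat = b @ np.array([1, 50, 7, 6])
print("predicted rent:", y_hat)
```

The same coefficients would come out of Excel's regression output; least squares is simply what that tool computes behind the scenes.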
Evaluating the quality of a multiple regression model involves examining the coefficient of determination, denoted as R². This statistic measures the proportion of variance in the dependent variable explained by the independent variables collectively. An R² value of 0.797, for example, indicates that approximately 79.7% of the variation in monthly rent can be explained by the combined effects of floor area, building age, and apartment number. However, a limitation of R² is that it does not penalize the addition of irrelevant independent variables: R² can never decrease when a variable is added, so it rises (or at best stays flat) even when the new variable contributes nothing meaningful to the model.
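R² is computed as one minus the ratio of unexplained to total variation. A minimal sketch, using hypothetical actual and predicted rents (not the data behind the 0.797 figure in the text):

```python
import numpy as np

# Hypothetical actual rents and the model's predictions for them.
y     = np.array([620, 910, 410, 1150, 640, 960], dtype=float)
y_hat = np.array([650, 880, 430, 1120, 660, 950], dtype=float)

ss_res = np.sum((y - y_hat) ** 2)        # residual (unexplained) variation
ss_tot = np.sum((y - y.mean()) ** 2)     # total variation around the mean
r2 = 1 - ss_res / ss_tot
print("R^2 =", round(r2, 3))
```

A value near 1 means the predictions track the actual values closely; a value near 0 means the model does little better than always predicting the mean rent.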
To address this, the adjusted R² statistic is used. Adjusted R² modifies the R² value by accounting for the number of independent variables in the model, providing a more accurate measure of model quality. It is always less than or equal to R², and for very poor models it can even be negative. For example, an adjusted R² of 0.763 suggests that after considering the number of predictors, about 76.3% of the variation in rent is explained by the model, reflecting a more realistic assessment of its explanatory power.
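The standard formula is adjusted R² = 1 − (1 − R²)(n − 1)/(n − p − 1), where n is the number of observations and p the number of independent variables. A small sketch: the text does not state the sample size, but with the quoted R² = 0.797, p = 3 predictors, and an assumed n = 22 observations, the formula reproduces the quoted 0.763.

```python
def adjusted_r2(r2: float, n: int, p: int) -> float:
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1),
    for n observations and p independent variables."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

# n = 22 is a hypothetical sample size, not given in the text.
print(round(adjusted_r2(0.797, n=22, p=3), 3))  # 0.763
```

Because the penalty factor (n − 1)/(n − p − 1) exceeds 1 whenever p ≥ 1, adding predictors that do not reduce residual error pulls adjusted R² down rather than up.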
In summary, multiple regression enables the prediction of a dependent variable based on several independent variables, with coefficients indicating the strength and direction of each relationship. The model’s effectiveness is evaluated using R² and adjusted R², ensuring that the model is both accurate and parsimonious. This approach is essential for analyzing complex real-world data where multiple factors influence outcomes simultaneously.