Simple Linear Regression: Concepts, Computation, and Interpretation
Chapter 11: Simple Regression
Chapter Goals
Explain the simple linear regression model
Obtain and interpret the simple linear regression equation for a set of data
Describe \( R^2 \) as a measure of the explanatory power of the regression model
Understand the assumptions behind regression analysis
Explain measures of variation and determine whether the independent variable is significant
Calculate and interpret confidence intervals for the regression coefficients
Use a regression equation for prediction
Form forecast intervals around an estimated Y value for a given X
Use graphical analysis to recognize potential problems in regression analysis
Explain the correlation coefficient and perform a hypothesis test for zero population correlation
Section 11.1 Overview of Linear Models
Simple Linear Regression Model
The simple linear regression model describes the relationship between two variables using a straight line. The general form is:
Equation: \( Y = \beta_0 + \beta_1 X \)
Y: Dependent variable (the outcome we wish to explain)
X: Independent variable (the predictor or explanatory variable)
\( \beta_0 \): Y-intercept (value of Y when X = 0)
\( \beta_1 \): Slope (change in Y for a one-unit change in X)
Least Squares Regression
Estimates for the coefficients are found using the least squares regression technique, which minimizes the sum of squared errors between observed and predicted values.
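Sample regression equation: \( \hat{y} = b_0 + b_1 x \)
Slope estimator: \( b_1 = \dfrac{\operatorname{Cov}(x, y)}{s_x^2} = r \, \dfrac{s_y}{s_x} \)
Intercept estimator: \( b_0 = \bar{y} - b_1 \bar{x} \)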
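The two common forms of the slope estimator, covariance over the variance of x and correlation times the ratio of standard deviations, give the same number. The following is a minimal sketch of that equivalence in pure Python; the five data points are hypothetical and chosen only for illustration.

```python
from math import sqrt

# Hypothetical toy data, used only to illustrate the formulas.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Sample covariance and sample variances (divisor n - 1).
cov_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)
var_x = sum((x - x_bar) ** 2 for x in xs) / (n - 1)
var_y = sum((y - y_bar) ** 2 for y in ys) / (n - 1)

# Slope, form 1: covariance divided by the variance of x.
b1_cov = cov_xy / var_x

# Slope, form 2: correlation times the ratio of standard deviations.
r = cov_xy / (sqrt(var_x) * sqrt(var_y))
b1_corr = r * sqrt(var_y) / sqrt(var_x)

# Intercept: chosen so the fitted line passes through (x_bar, y_bar).
b0 = y_bar - b1_cov * x_bar

print(f"b1 from Cov/s_x^2   : {b1_cov:.4f}")
print(f"b1 from r * s_y/s_x : {b1_corr:.4f}")
print(f"b0                  : {b0:.4f}")
```

Both forms agree up to floating-point rounding, because the standard deviation of y cancels in the second form.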
Introduction to Regression Analysis
Purpose:
Predict the value of a dependent variable based on the value of at least one independent variable
Explain the impact of changes in an independent variable on the dependent variable
Dependent variable: Also called the endogenous variable
Independent variable: Also called the exogenous variable
Section 11.2 Linear Regression Model
Population Regression Equation
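The relationship between X and Y is described by a linear function: \( y_i = \beta_0 + \beta_1 x_i + \varepsilon_i \)
\( \beta_0 \) and \( \beta_1 \) are population coefficients
\( \varepsilon_i \) is a random error term
Regression Model Components
Linear component: \( \beta_0 + \beta_1 x_i \)
Random error component: \( \varepsilon_i \)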
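To make the two components concrete, here is a small simulation sketch. The parameter values (intercept 2.0, slope 0.5, error standard deviation 1.0) are hypothetical, and drawing the errors from a normal distribution is an added assumption; the model itself only requires errors with mean 0 and constant variance.

```python
import random

# Hypothetical population parameters, chosen only for illustration.
BETA_0 = 2.0   # population intercept
BETA_1 = 0.5   # population slope
SIGMA = 1.0    # standard deviation of the error term (assumed normal)

rng = random.Random(0)  # fixed seed so the run is repeatable

xs = [float(x) for x in range(1, 11)]
linear = [BETA_0 + BETA_1 * x for x in xs]     # linear component
errors = [rng.gauss(0.0, SIGMA) for _ in xs]   # random error component
ys = [lin + eps for lin, eps in zip(linear, errors)]

for x, lin, eps, y in zip(xs, linear, errors, ys):
    print(f"x={x:4.1f}  linear={lin:5.2f}  error={eps:+6.3f}  y={y:6.3f}")
```

Each observed y is the point on the population line plus its own random error, which is why sample points scatter around the line rather than falling exactly on it.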
Assumptions of Linear Regression
The true relationship is linear: Y is a linear function of X plus random error
Error terms are independent of X values
Error terms are random variables with mean 0 and constant variance (homoscedasticity):
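\( E[\varepsilon_i] = 0 \) and \( E[\varepsilon_i^2] = \sigma^2 \) for all \( i \)
\( E[\varepsilon_i \varepsilon_j] = 0 \) for all \( i \neq j \) (no autocorrelation)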
Section 11.3 Least Squares Coefficient Estimators
Finding the Best-Fit Line
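Coefficients \( b_0 \) and \( b_1 \) are chosen to minimize the sum of squared errors (SSE): \( SSE = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} \bigl( y_i - (b_0 + b_1 x_i) \bigr)^2 \)
Slope estimator: \( b_1 = \dfrac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} \)
Intercept estimator: \( b_0 = \bar{y} - b_1 \bar{x} \)
The regression line always passes through the means \( (\bar{x}, \bar{y}) \)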
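A brief sketch of where these estimators come from, writing the sum of squared errors as the sum over i of (y_i - b_0 - b_1 x_i)^2 and setting its partial derivatives with respect to b_0 and b_1 to zero:

```latex
\begin{aligned}
\frac{\partial\, SSE}{\partial b_0}
  &= -2 \sum_{i=1}^{n} (y_i - b_0 - b_1 x_i) = 0
  \;\Longrightarrow\; b_0 = \bar{y} - b_1 \bar{x}, \\[4pt]
\frac{\partial\, SSE}{\partial b_1}
  &= -2 \sum_{i=1}^{n} x_i \,(y_i - b_0 - b_1 x_i) = 0
  \;\Longrightarrow\;
  b_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
             {\sum_{i=1}^{n} (x_i - \bar{x})^2}.
\end{aligned}
```

Substituting \( x = \bar{x} \) into \( \hat{y} = b_0 + b_1 x \) with \( b_0 = \bar{y} - b_1 \bar{x} \) gives \( \hat{y} = \bar{y} \), which is why the fitted line passes through the point of means.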
Computation Using Software
Hand calculations are tedious; statistical software (e.g., Excel) is commonly used
Interpretation of Coefficients
Intercept (\( b_0 \)): Estimated average value of Y when X = 0 (meaningful only if X = 0 is within the observed range of X)
Slope (\( b_1 \)): Estimated change in the average value of Y for a one-unit increase in X
Simple Linear Regression Example
Application: House Price and Size
Dependent variable (Y): House price in $1000s
Independent variable (X): Square feet
Sample data (10 houses):
| House Price in $1000s (Y) | Square Feet (X) |
|---|---|
| 245 | 1400 |
| 312 | 1600 |
| 279 | 1700 |
| 308 | 1875 |
| 199 | 1100 |
| 219 | 1550 |
| 405 | 2350 |
| 324 | 2450 |
| 319 | 1425 |
| 255 | 1700 |
Scatter plot and regression line can be generated using Excel
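As an alternative to a spreadsheet, the least squares fit for these ten houses can be sketched in a few lines of pure Python. The variable names are illustrative. The sketch assumes the price of the 1,550 sq ft house is 219 (in $1000s), the value that reproduces the coefficients reported for this example.

```python
# (price in $1000s, square feet) for the ten sample houses.
# Assumption: the 1,550 sq ft house is priced at 219 (see the note above).
houses = [
    (245, 1400), (312, 1600), (279, 1700), (308, 1875), (199, 1100),
    (219, 1550), (405, 2350), (324, 2450), (319, 1425), (255, 1700),
]
ys = [float(price) for price, _ in houses]
xs = [float(sqft) for _, sqft in houses]

n = len(houses)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Sums of cross-products and squares of deviations from the means.
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
s_xx = sum((x - x_bar) ** 2 for x in xs)

b1 = s_xy / s_xx           # slope, in $1000s per square foot
b0 = y_bar - b1 * x_bar    # intercept, in $1000s

print(f"b0 = {b0:.5f}")    # b0 = 98.24833
print(f"b1 = {b1:.5f}")    # b1 = 0.10977
```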
Regression Equation from Excel Output
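Estimated regression equation: \( \widehat{\text{house price}} = 98.24833 + 0.10977 \times (\text{square feet}) \)
Interpretation of \( b_0 \): No house in the sample has 0 square feet, so \( b_0 = 98.24833 \) has no literal meaning here; it indicates that, for houses within the observed range of sizes, $98,248.33 is the portion of house price not explained by square feet
Interpretation of \( b_1 \): For each additional square foot, house price increases by 0.10977 × $1,000 = $109.77 on average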
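The estimated equation can be used directly for prediction. The sketch below plugs in the reported coefficients (98.24833 and 0.10977); the function name and the 2,000 sq ft house are hypothetical, and the prediction is only sensible for sizes inside the observed range of 1,100 to 2,450 square feet.

```python
B0 = 98.24833   # intercept, in $1000s
B1 = 0.10977    # slope, in $1000s per square foot


def predict_price(square_feet: float) -> float:
    """Predicted house price in $1000s from the fitted line."""
    return B0 + B1 * square_feet


# Hypothetical house of 2,000 sq ft (inside the observed size range).
price_2000 = predict_price(2000)
print(f"Predicted price: ${price_2000 * 1000:,.2f}")   # $317,788.33

# One extra square foot changes the prediction by the slope.
marginal = predict_price(2001) - predict_price(2000)
print(f"Change per extra sq ft: ${marginal * 1000:.2f}")  # $109.77
```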
Section 11.4 Explanatory Power of a Linear Regression Equation
Measures of Variation
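Total Sum of Squares (SST): \( SST = \sum_{i=1}^{n} (y_i - \bar{y})^2 \)
Regression Sum of Squares (SSR): \( SSR = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2 \)
Error Sum of Squares (SSE): \( SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 \)
Relationship: \( SST = SSR + SSE \)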
Analysis of Variance (ANOVA)
SST: Variation of Y values around their mean
SSR: Explained variation due to X
SSE: Unexplained variation (random error)
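The decomposition of total variation into explained and unexplained parts can be checked numerically on the house data. This is a minimal sketch; it assumes the price of the 1,550 sq ft house is 219 (in $1000s), the value consistent with the reported coefficients.

```python
# (price in $1000s, square feet); the 219 entry is an assumption (see above).
houses = [
    (245, 1400), (312, 1600), (279, 1700), (308, 1875), (199, 1100),
    (219, 1550), (405, 2350), (324, 2450), (319, 1425), (255, 1700),
]
ys = [float(price) for price, _ in houses]
xs = [float(sqft) for _, sqft in houses]
n = len(houses)
x_bar, y_bar = sum(xs) / n, sum(ys) / n

# Least squares coefficients.
b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
      / sum((x - x_bar) ** 2 for x in xs))
b0 = y_bar - b1 * x_bar
y_hat = [b0 + b1 * x for x in xs]   # fitted values

sst = sum((y - y_bar) ** 2 for y in ys)                # total variation
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)           # explained by X
sse = sum((y - yh) ** 2 for y, yh in zip(ys, y_hat))   # unexplained

print(f"SST = {sst:.2f}")              # SST = 32600.50
print(f"SSR = {ssr:.2f}")              # SSR = 18934.93
print(f"SSE = {sse:.2f}")              # SSE = 13665.57
print(f"SSR + SSE = {ssr + sse:.2f}")  # SSR + SSE = 32600.50
```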
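Coefficient of Determination (\( R^2 \))
Proportion of total variation in Y explained by X: \( R^2 = \dfrac{SSR}{SST} = 1 - \dfrac{SSE}{SST} \)
Ranges from 0 to 1
Interpretation: A higher \( R^2 \) means the regression line explains more of the variation in Y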
Examples of \( R^2 \) Values
\( R^2 = 1 \): Perfect linear relationship
\( 0 < R^2 < 1 \): Weaker linear relationship
\( R^2 = 0 \): No linear relationship
Excel Output Example
In the house price example, \( R^2 = 0.5808 \)
Interpretation: 58.08% of the variation in house prices is explained by variation in square feet
Relationship Between Correlation and \( R^2 \)
For simple regression: \( R^2 = r^2 \), where \( r \) is the sample correlation coefficient between X and Y
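As a numerical check on the house data, the coefficient of determination computed as SSR/SST matches the square of the sample correlation coefficient. This sketch assumes the price of the 1,550 sq ft house is 219 (in $1000s), the value consistent with the reported 58.08%.

```python
from math import sqrt

# (price in $1000s, square feet); the 219 entry is an assumption (see above).
houses = [
    (245, 1400), (312, 1600), (279, 1700), (308, 1875), (199, 1100),
    (219, 1550), (405, 2350), (324, 2450), (319, 1425), (255, 1700),
]
ys = [float(price) for price, _ in houses]
xs = [float(sqft) for _, sqft in houses]
n = len(houses)
x_bar, y_bar = sum(xs) / n, sum(ys) / n

s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
s_xx = sum((x - x_bar) ** 2 for x in xs)
s_yy = sum((y - y_bar) ** 2 for y in ys)   # equals SST

# R^2 from the sums of squares of the fitted line.
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar
ssr = sum((b0 + b1 * x - y_bar) ** 2 for x in xs)
r_squared = ssr / s_yy

# Sample correlation coefficient between X and Y.
r = s_xy / sqrt(s_xx * s_yy)

print(f"R^2 = {r_squared:.4f}")   # R^2 = 0.5808
print(f"r   = {r:.4f}")           # r   = 0.7621
print(f"r^2 = {r ** 2:.4f}")      # r^2 = 0.5808
```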
Estimation of Model Error Variance
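Estimator for variance of model error: \( s_e^2 = \dfrac{SSE}{n - 2} = \dfrac{\sum_{i=1}^{n} e_i^2}{n - 2} \)
\( s_e = \sqrt{s_e^2} \) is the standard error of the estimate
Division by \( n - 2 \) (not \( n - 1 \)) because two parameters are estimated (\( \beta_0 \) and \( \beta_1 \))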
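For the house data, the error variance estimate and the standard error of the estimate can be sketched as follows. The sketch assumes the price of the 1,550 sq ft house is 219 (in $1000s), consistent with the reported regression results; units are $1000s.

```python
from math import sqrt

# (price in $1000s, square feet); the 219 entry is an assumption (see above).
houses = [
    (245, 1400), (312, 1600), (279, 1700), (308, 1875), (199, 1100),
    (219, 1550), (405, 2350), (324, 2450), (319, 1425), (255, 1700),
]
ys = [float(price) for price, _ in houses]
xs = [float(sqft) for _, sqft in houses]
n = len(houses)
x_bar, y_bar = sum(xs) / n, sum(ys) / n

b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
      / sum((x - x_bar) ** 2 for x in xs))
b0 = y_bar - b1 * x_bar

# Residuals: observed price minus fitted price.
residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
sse = sum(e ** 2 for e in residuals)

# Divide by n - 2 because two coefficients were estimated.
s_e_squared = sse / (n - 2)
s_e = sqrt(s_e_squared)

print(f"s_e^2 = {s_e_squared:.2f}")   # s_e^2 = 1708.20
print(f"s_e   = {s_e:.2f}")           # s_e   = 41.33
```

A standard error of about 41.33 means the observed prices typically differ from the fitted line by roughly $41,330.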
Summary
Simple linear regression models the relationship between two variables using a straight line
Coefficients are estimated using least squares, minimizing the sum of squared errors
Interpretation of coefficients provides insight into the relationship between variables
Measures of variation and \( R^2 \) assess the explanatory power of the model
Statistical software is commonly used for computation