Least-Squares Regression and Coefficient of Determination: Study Notes
Least-Squares Regression and Coefficient of Determination
Least-Squares Regression Line
The least-squares regression line models the linear relationship between two quantitative variables. Of all possible lines, it is the one that minimizes the sum of the squared residuals, that is, the squared differences between the observed values and the values predicted by the line.
Equation: The regression line is represented as $\hat{y} = b_1 x + b_0$, where $\hat{y}$ is the predicted value of the response variable and $x$ is the value of the explanatory variable.
Slope ($b_1$): Indicates the average change in the response variable for a one-unit increase in the explanatory variable.
Y-intercept ($b_0$): Represents the predicted value of the response variable when the explanatory variable is zero.
Calculation:
$b_1 = r \cdot \frac{s_y}{s_x}$, where $r$ is the correlation coefficient, $s_y$ is the standard deviation of the response variable, and $s_x$ is the standard deviation of the explanatory variable.
$b_0 = \bar{y} - b_1 \bar{x}$, where $\bar{y}$ and $\bar{x}$ are the means of the response and explanatory variables, respectively.
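
As a minimal sketch of the two formulas above, the following Python computes $b_1$ and $b_0$ from $r$, $s_x$, and $s_y$. The function names `correlation` and `least_squares_line` and the five-point data set are hypothetical, chosen only for illustration; they do not come from the original notes.

```python
from statistics import mean, stdev

def correlation(x, y):
    """Sample correlation coefficient r, computed from z-scores."""
    x_bar, y_bar = mean(x), mean(y)
    s_x, s_y = stdev(x), stdev(y)  # sample standard deviations (n - 1 divisor)
    n = len(x)
    return sum(((xi - x_bar) / s_x) * ((yi - y_bar) / s_y)
               for xi, yi in zip(x, y)) / (n - 1)

def least_squares_line(x, y):
    """Return (b1, b0) using b1 = r * s_y / s_x and b0 = y_bar - b1 * x_bar."""
    r = correlation(x, y)
    b1 = r * stdev(y) / stdev(x)
    b0 = mean(y) - b1 * mean(x)
    return b1, b0

# Hypothetical data, for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b1, b0 = least_squares_line(x, y)
print(f"y-hat = {b1:.2f}x + {b0:.2f}")
```

For this toy data the slope works out to 0.6 and the intercept to 2.2, which can be checked by hand from the same formulas.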

Example: Club-head Speed and Golf Ball Distance
Consider the relationship between club-head speed and the distance a golf ball travels. The data below represent eight swings, showing a linear relationship between speed and distance.
Data Table:
| Club-head Speed (mph), x | Distance (yd), y | (x, y) |
|---|---|---|
| 100 | 257 | (100, 257) |
| 102 | 264 | (102, 264) |
| 103 | 274 | (103, 274) |
| 101 | 266 | (101, 266) |
| 105 | 277 | (105, 277) |
| 100 | 263 | (100, 263) |
| 99 | 258 | (99, 258) |
| 105 | 275 | (105, 275) |

Scatter Plot and Regression Line: Visualizing the data helps confirm the linear relationship.

Prediction Example: Using the regression equation, one can predict the mean distance for a club-head speed of 103 mph.
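
The prediction can be reproduced with a short sketch that fits the line to the eight swings in the data table and evaluates it at 103 mph. The variable names are illustrative. The slope is computed here as $S_{xy} / S_{xx}$, which is algebraically equivalent to $r \cdot s_y / s_x$.

```python
from statistics import mean

# Club-head speed (mph) and distance (yd) from the data table
speed = [100, 102, 103, 101, 105, 100, 99, 105]
distance = [257, 264, 274, 266, 277, 263, 258, 275]

x_bar, y_bar = mean(speed), mean(distance)

# Sums of cross-products and squared deviations about the means
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(speed, distance))
s_xx = sum((x - x_bar) ** 2 for x in speed)

b1 = s_xy / s_xx          # slope
b0 = y_bar - b1 * x_bar   # y-intercept

predicted = b1 * 103 + b0
print(f"slope = {b1:.4f}, intercept = {b0:.4f}")
print(f"predicted mean distance at 103 mph = {predicted:.1f} yd")
```

Run as written, this should give a slope of about 3.1661, an intercept of about -55.80, and a predicted mean distance of roughly 270.3 yards.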
Interpretation of Slope and Y-Intercept
Understanding the meaning of the slope and y-intercept is crucial for interpreting regression results in context.
Slope: If club-head speed increases by 1 mph, the distance the golf ball travels increases by approximately 3.1661 yards, on average.
Y-intercept: The y-intercept is the predicted distance when club-head speed is 0 mph. It should be interpreted only when 0 is a reasonable value for the explanatory variable. A club-head speed of 0 mph is not, so the y-intercept has no meaningful interpretation here.
Extrapolation: Predictions should not be made for values outside the observed range of 99 to 105 mph (e.g., predicting distance for club-head speeds above 105 mph), as the linear relationship may not hold there.
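
One way to make the extrapolation warning concrete is a prediction helper that refuses inputs outside the observed speeds. This is a sketch, not a standard API: `predict_distance` is a hypothetical name, and the intercept of -55.7966 is an assumption computed from the eight data points rather than a value stated in these notes.

```python
# Observed club-head speeds (mph) from the golf example
observed_speeds = [100, 102, 103, 101, 105, 100, 99, 105]

# Slope as quoted in the notes; the intercept is an assumption computed
# from the eight data points, not a value stated in the notes
B1, B0 = 3.1661, -55.7966

def predict_distance(speed_mph):
    """Predict mean distance (yd), refusing to extrapolate."""
    lo, hi = min(observed_speeds), max(observed_speeds)
    if not lo <= speed_mph <= hi:
        raise ValueError(
            f"{speed_mph} mph is outside the observed range {lo}-{hi} mph"
        )
    return B1 * speed_mph + B0

print(round(predict_distance(103), 1))  # within range, so a value is returned
try:
    predict_distance(110)               # above 105 mph, so this is refused
except ValueError as err:
    print("refused:", err)
```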

Case Study: Cola Consumption vs. Bone Mineral Density
Researchers investigated whether cola consumption is associated with lower bone mineral density in women. The data below show the number of colas consumed per week and bone mineral density for a sample of 15 women.
Data Table:
| Number of Colas per Week | Bone Mineral Density (g/cm²) |
|---|---|
| 0 | 0.893 |
| 1 | 0.892 |
| 1 | 0.891 |
| 2 | 0.881 |
| 2 | 0.888 |
| 3 | 0.871 |
| 3 | 0.876 |
| 4 | 0.873 |
| 5 | 0.875 |
| 6 | 0.871 |
| 7 | 0.867 |
| 8 | 0.862 |
| 8 | 0.855 |

Correlation: The correlation coefficient $r$ is strongly negative, indicating a strong negative linear relationship.
Regression Line: The least-squares regression line can be calculated using the formulas above, treating cola consumption as the explanatory variable.
Interpretation:
Slope: Represents the change in bone mineral density for each additional cola consumed per week.
Y-intercept: Represents the predicted bone mineral density when cola consumption is zero, provided this is a reasonable value.
Prediction Example: Predict the bone mineral density for a woman who consumes four colas per week and compare to observed values.
Extrapolation Warning: Avoid using the model to predict bone mineral density for values outside the observed range of 0 to 8 colas per week (e.g., two colas per day, which is 14 per week).
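
The case-study steps can be sketched in Python as follows. The code uses only the 13 data pairs listed in the table, so its results may differ from values computed on the full sample; the variable names are illustrative.

```python
from math import sqrt
from statistics import mean

# The 13 (colas per week, bone mineral density) pairs listed in the table
colas = [0, 1, 1, 2, 2, 3, 3, 4, 5, 6, 7, 8, 8]
bmd = [0.893, 0.892, 0.891, 0.881, 0.888, 0.871, 0.876,
       0.873, 0.875, 0.871, 0.867, 0.862, 0.855]

x_bar, y_bar = mean(colas), mean(bmd)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(colas, bmd))
s_xx = sum((x - x_bar) ** 2 for x in colas)
s_yy = sum((y - y_bar) ** 2 for y in bmd)

r = s_xy / sqrt(s_xx * s_yy)   # correlation coefficient
b1 = s_xy / s_xx               # slope: change in density per extra cola
b0 = y_bar - b1 * x_bar        # intercept: predicted density at 0 colas

# Predict at 4 colas per week and compare with the observed value in the table
predicted = b1 * 4 + b0
observed = 0.873
print(f"r = {r:.3f}")
print(f"y-hat = {b1:.4f}x + {b0:.4f}")
print(f"predicted = {predicted:.4f}, residual = {observed - predicted:.4f}")
```

A negative residual means the observed density falls below the fitted line at that cola count. Zero colas per week appears in the data, so the intercept has a reasonable interpretation in this example.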
Coefficient of Determination ($R^2$)
The coefficient of determination ($R^2$) quantifies the proportion of variation in the response variable that is explained by the regression model.
Definition: $R^2$ ranges from 0 to 1. Higher values indicate a better fit.
Interpretation:
$R^2 = 0$: The regression line explains none of the variation.
$R^2 = 1$: The regression line explains all the variation.
Intermediate values indicate partial explanatory power.
Calculation: $R^2 = r^2$, where $r$ is the correlation coefficient. This shortcut applies to the least-squares line with a single explanatory variable.
Examples:
Data Set A: $R^2$ close to 1 (almost all variability explained)
Data Set B: $R^2$ fairly high (most variability explained)
Data Set C: $R^2$ close to 0 (very little variability explained)

Application: For both the club-head speed vs. distance data and the cola consumption vs. bone mineral density case study, $R^2$ can be computed and interpreted to assess how much of the variation in the response variable the linear model explains.
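
As a sketch of that computation for the golf data, the code below obtains $R^2$ two ways: by squaring $r$, and as one minus the ratio of unexplained to total variation. The second route is a standard identity for least-squares lines rather than something stated in these notes, and the variable names are illustrative.

```python
from math import sqrt
from statistics import mean

# Club-head speed (mph) and distance (yd) from the golf example
speed = [100, 102, 103, 101, 105, 100, 99, 105]
distance = [257, 264, 274, 266, 277, 263, 258, 275]

x_bar, y_bar = mean(speed), mean(distance)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(speed, distance))
s_xx = sum((x - x_bar) ** 2 for x in speed)
s_yy = sum((y - y_bar) ** 2 for y in distance)   # total variation in y

b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar
r = s_xy / sqrt(s_xx * s_yy)

# Route 1: square the correlation coefficient
r_squared = r ** 2

# Route 2: 1 - (unexplained variation / total variation)
sse = sum((y - (b1 * x + b0)) ** 2 for x, y in zip(speed, distance))
r_squared_alt = 1 - sse / s_yy

print(f"r = {r:.3f}, R^2 = {r_squared:.3f}")
```

Both routes should agree up to floating-point rounding. For these eight swings $R^2$ comes out near 0.881, meaning about 88% of the variation in distance is explained by the linear relationship with club-head speed.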
Summary Table: Regression Concepts
| Concept | Definition | Formula |
|---|---|---|
| Regression Line | Best-fit line for a linear relationship | $\hat{y} = b_1 x + b_0$ |
| Slope ($b_1$) | Change in y per unit change in x | $b_1 = r \cdot \frac{s_y}{s_x}$ |
| Y-intercept ($b_0$) | Predicted y when x = 0 | $b_0 = \bar{y} - b_1 \bar{x}$ |
| Coefficient of Determination ($R^2$) | Proportion of variance explained | $R^2 = r^2$ |