Variation and the Coefficient of Determination in Regression Analysis
Study Guide - Smart Notes
Variation and the Coefficient of Determination
Understanding the Coefficient of Determination ($r^2$)
The coefficient of determination, denoted $r^2$, is a key statistical measure in regression analysis. It quantifies how much of the variation in the dependent variable (y) is explained by the variation in the independent variable (x).
Definition: $r^2$ measures the proportion of the total variation in y that is explained by the regression model using x.
Interpretation: An $r^2$ value close to 1 indicates that almost all of the variation in y is explained by x. An $r^2$ value close to 0 means that almost none of the variation is explained by x; the data show little or no linear correlation.
Formula:
$$r^2 = \frac{\text{explained variation}}{\text{total variation}}$$
Alternatively, for a simple linear regression, $r^2$ can be calculated as the square of the correlation coefficient ($r$):
$$r^2 = (r)^2$$
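The two routes to the coefficient of determination can be checked against each other numerically. The sketch below is only an illustration: the five data points are hypothetical (they do not come from these notes), and the variable names are chosen for readability. It fits a least-squares line, computes explained variation divided by total variation, and compares that with the square of the correlation coefficient.

```python
import math

# Hypothetical data for illustration only (not taken from the notes).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Sums of squared deviations and cross-products about the means.
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_yy = sum((yi - y_bar) ** 2 for yi in y)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

# Least-squares line y_hat = a*x + b.
a = s_xy / s_xx
b = y_bar - a * x_bar
y_hat = [a * xi + b for xi in x]

# Route 1: explained variation / total variation.
explained = sum((yh - y_bar) ** 2 for yh in y_hat)
total = s_yy
r2_ratio = explained / total

# Route 2: square of the correlation coefficient r.
r = s_xy / math.sqrt(s_xx * s_yy)
r2_square = r ** 2

print(f"explained/total = {r2_ratio:.3f}")
print(f"r squared       = {r2_square:.3f}")
```

For this made-up dataset both routes give 0.600, which is the agreement the two formulas predict for a line fitted by least squares.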
Explained vs. Unexplained Variation
In regression analysis, the total variation in the dependent variable (y) can be split into two components:
Explained Variation: The part of the variation in y that is accounted for by the regression model (i.e., by changes in x).
Unexplained Variation: The part of the variation in y that is not accounted for by the regression model; often due to random error or other variables not included in the model.
Example: Suppose you have data on test scores (y) versus hours studied (x). If $r^2 = 0.555$, then 55.5% of the variation in test scores is explained by hours studied, and 44.5% is unexplained.
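The split into explained and unexplained variation can also be written symbolically. The notation here is an assumption, since the notes do not define it: $\bar{y}$ stands for the mean of the observed y-values and $\hat{y}_i$ for the value the regression line predicts at $x_i$. For a least-squares line:

```latex
\underbrace{\sum (y_i - \bar{y})^2}_{\text{total variation}}
  = \underbrace{\sum (\hat{y}_i - \bar{y})^2}_{\text{explained variation}}
  + \underbrace{\sum (y_i - \hat{y}_i)^2}_{\text{unexplained variation}}

\frac{\text{unexplained variation}}{\text{total variation}} = 1 - r^2
```

With the 55.5% figure from the example, the unexplained share is $1 - 0.555 = 0.445$, or 44.5%.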
Application: Calculating $r^2$ from Data
Given a dataset, you can determine the value of the correlation coefficient ($r$) and then compute $r^2$ to assess the strength of the linear relationship between the variables.
Step 1: Enter the data into lists (e.g., L1 and L2) on a calculator.
Step 2: Use the regression function to calculate the correlation coefficient ($r$).
Step 3: Square the correlation coefficient to obtain $r^2$.
Calculator Instructions (TI-84):
Enter data in L1 and L2.
Press STAT → CALC → LinReg(ax+b).
Set Xlist: L1, Ylist: L2.
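View output: $r$ = correlation coefficient, $r^2$ = coefficient of determination. If $r$ and $r^2$ do not appear in the output, turn diagnostics on first (select DiagnosticOn from the CATALOG menu) and run the regression again.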
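For readers without a calculator at hand, the same three steps can be sketched in Python. This is a minimal illustration, not a reproduction of the calculator's internals: the function name `lin_reg` is hypothetical, and the lists reuse the hours and score pairs from the worked example table in this guide.

```python
import math

def lin_reg(xs, ys):
    """Return (a, b, r, r2) for the least-squares line y = a*x + b.

    Hypothetical helper that reports the same four quantities
    as the calculator's LinReg(ax+b) screen.
    """
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    a = s_xy / s_xx                     # slope
    b = y_bar - a * x_bar               # intercept
    r = s_xy / math.sqrt(s_xx * s_yy)   # correlation coefficient
    return a, b, r, r * r               # Step 3: square r

# Step 1: enter the data into two lists.
L1 = [2, 4, 6, 8, 10]       # hours studied
L2 = [65, 70, 75, 80, 85]   # test scores

# Step 2: run the regression.
a, b, r, r2 = lin_reg(L1, L2)
print(f"a = {a:.3f}, b = {b:.3f}, r = {r:.3f}, r^2 = {r2:.3f}")
# prints: a = 2.500, b = 60.000, r = 1.000, r^2 = 1.000
```

These particular points lie exactly on a straight line, which is why the sketch reports $r^2 = 1$; noisier data would give a smaller value.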
Worked Example Table
The following table illustrates how to compute the coefficient of determination from a set of data:
| Hours Studied (x) | Test Score (y) |
|---|---|
| 2 | 65 |
| 4 | 70 |
| 6 | 75 |
| 8 | 80 |
| 10 | 85 |
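Note that the five points in the table lie exactly on the line y = 2.5x + 60, so for those values $r = 1$ and $r^2 = 1$. Real data are rarely that tidy. Suppose instead that a noisier set of scores gives a correlation coefficient of $r \approx 0.745$. Then:
$$r^2 \approx (0.745)^2 \approx 0.555$$
This would mean that 55.5% of the variation in test scores is explained by hours studied.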
Additional Example: Retail Analysis
A retail analyst studies the relationship between the number of in-store promotional displays (x) and weekly sales revenue (y) at 12 store locations. The data are entered into a calculator to find the coefficient of determination. The eight (x, y) pairs recorded in these notes are listed below.
| Displays (x) | Weekly Revenue (y) |
|---|---|
| 5 | 1440 |
| 6 | 1560 |
| 7 | 1680 |
| 8 | 1800 |
| 9 | 1920 |
| 10 | 2040 |
| 11 | 2160 |
| 12 | 2280 |
By following the calculator steps above, the analyst can determine $r^2$ and interpret how much of the variation in weekly revenue is explained by the number of displays.
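As a cross-check on the calculator result, the sketch below computes $r$ and $r^2$ for the rows listed in the table, using the raw-sums form of the correlation formula. It is an illustration under stated assumptions: only the eight listed pairs are used (the notes mention 12 locations but list eight rows), and the variable names are hypothetical.

```python
import math

# The eight (displays, revenue) pairs listed in the table above.
displays = [5, 6, 7, 8, 9, 10, 11, 12]
revenue = [1440, 1560, 1680, 1800, 1920, 2040, 2160, 2280]

n = len(displays)
sum_x = sum(displays)
sum_y = sum(revenue)
sum_xy = sum(x * y for x, y in zip(displays, revenue))
sum_x2 = sum(x * x for x in displays)
sum_y2 = sum(y * y for y in revenue)

# Correlation coefficient from raw sums.
numerator = n * sum_xy - sum_x * sum_y
denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
r = numerator / denominator

r_squared = r ** 2
unexplained_share = 1 - r_squared

print(f"r = {r:.3f}")
print(f"r^2 = {r_squared:.3f}")
print(f"unexplained share = {unexplained_share:.3f}")
```

In the listed rows, revenue rises by exactly 120 for each additional display, so the sketch reports $r^2 = 1$ with no unexplained variation. Data from the full set of stores could give a different value.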
Summary Table: $r^2$ Interpretation
| $r^2$ Value | Interpretation |
|---|---|
| Close to 1 | Nearly all variation in y is explained by x |
| Close to 0 | Almost none of the variation in y is explained by x |
Additional info: The coefficient of determination is a central concept in regression analysis, helping to assess the goodness-of-fit of a model and the strength of the relationship between variables.