In statistical analysis, understanding the relationship between two variables is crucial, and two important measures for this are the linear correlation coefficient, denoted as \( r \), and the coefficient of determination, represented as \( R^2 \). The linear correlation coefficient \( r \) quantifies the strength and direction of a linear relationship between two variables \( x \) and \( y \), ranging from -1 to 1. A value close to 1 indicates a strong positive correlation, while a value close to -1 indicates a strong negative correlation. A value of 0 suggests no linear correlation, although a nonlinear relationship may still exist.
The coefficient of determination \( R^2 \) provides insight into how well the variation in the y variable can be explained by the variation in the x variable. It is calculated as the square of the linear correlation coefficient: \( R^2 = r^2 \). This means that if you know the value of \( r \), you can easily find \( R^2 \) by squaring it. For instance, if \( r = 0.745 \), then \( R^2 = (0.745)^2 \approx 0.555 \). Unlike \( r \), which can be negative, \( R^2 \) is always a non-negative value between 0 and 1. A higher \( R^2 \) value indicates that a greater proportion of the variance in the dependent variable is predictable from the independent variable.
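The relationship \( R^2 = r^2 \) is easy to verify numerically. Below is a minimal sketch using NumPy; the data values are hypothetical, chosen only to illustrate the computation:

```python
import numpy as np

# Hypothetical sample data (illustrative only)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1, 6.8])

# Pearson linear correlation coefficient r, taken from the
# off-diagonal entry of the 2x2 correlation matrix
r = np.corrcoef(x, y)[0, 1]

# Coefficient of determination R^2 = r^2
r_squared = r ** 2

print(f"r   = {r:.3f}")
print(f"R^2 = {r_squared:.3f}")
```

Because \( r \) is squared, \( R^2 \) discards the sign: data with \( r = -0.745 \) and data with \( r = 0.745 \) yield the same \( R^2 \).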
Graphically, \( R^2 \) can be interpreted as the ratio of explained variation to total variation. The explained variation is the sum of squared deviations of the regression line's predicted values from the mean of \( y \), while the total variation is the sum of squared deviations of the observed \( y \) values from that same mean. If the data points are closely clustered around the regression line, \( R^2 \) approaches 1, indicating a strong linear relationship. Conversely, if the points are widely scattered, \( R^2 \) approaches 0, suggesting that the linear model does not explain the data well.
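The explained-over-total interpretation can be checked directly against \( r^2 \). The sketch below fits a least-squares line with NumPy and computes both sums of squares; the data are the same hypothetical values as before:

```python
import numpy as np

# Hypothetical sample data (illustrative only)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1, 6.8])

# Fit the least-squares regression line y_hat = a + b*x
# (np.polyfit returns coefficients from highest degree down)
b, a = np.polyfit(x, y, 1)
y_hat = a + b * x

# Total variation: squared deviations of observed y from its mean
ss_total = np.sum((y - y.mean()) ** 2)
# Explained variation: squared deviations of predictions from the mean
ss_explained = np.sum((y_hat - y.mean()) ** 2)

r_squared = ss_explained / ss_total
print(f"R^2 (explained / total) = {r_squared:.3f}")

# Agrees with squaring the correlation coefficient
r = np.corrcoef(x, y)[0, 1]
print(f"r^2                     = {r ** 2:.3f}")
```

For simple linear regression the two computations always agree, which is exactly why the two-variable case lets us write \( R^2 = r^2 \).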
In practical applications, \( R^2 \) is often expressed as a percentage. For example, if \( R^2 = 0.555 \), one would say that 55.5% of the variation in the dependent variable is explained by the independent variable, while the remaining 44.5% is attributed to other factors or randomness. This understanding is essential for interpreting the effectiveness of a linear regression model and recognizing the limitations of correlation in explaining variability in data.
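The percentage reading from the running example can be reproduced in a few lines; only the value \( r = 0.745 \) comes from the text above:

```python
# r from the example in the text
r = 0.745
r_squared = r ** 2  # 0.555025

# Proportion of variation explained vs. unexplained, as percentages
explained_pct = r_squared * 100
unexplained_pct = (1 - r_squared) * 100

print(f"Explained:   {explained_pct:.1f}%")    # prints 55.5%
print(f"Unexplained: {unexplained_pct:.1f}%")  # prints 44.5%
```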