Skip to main content
Back

Correlation and Simple Linear Regression: Concepts, Calculations, and Applications

Study Guide - Smart Notes

Tailored notes based on your materials, expanded with key definitions, examples, and context.

Correlation and Simple Linear Regression

14.1 Dependent and Independent Variables

In statistical modeling, variables are classified as either independent (explanatory) or dependent (response). The independent variable, denoted as x, is used to explain or predict changes in the dependent variable, y. The direction of explanation is one-way: changes in x are used to explain changes in y, not vice versa.

  • Independent Variable (x): The variable that is manipulated or categorized to observe its effect on the dependent variable.

  • Dependent Variable (y): The outcome or variable being predicted or explained.

Examples:

Independent Variable

Dependent Variable

The size of a television screen

The selling price of the television

The number of visitors per day on a website

The amount of sales per day from the website

The curb weight of a car

The car's gas mileage

Examples of independent and dependent variables

14.2 Correlation Analysis

Correlation analysis measures the strength and direction of a linear relationship between two variables. A relationship is considered linear if a scatter plot of the variables forms a straight-line pattern.

  • Strength: How closely the data points fit a straight line.

  • Direction: Positive (both variables increase together) or negative (one increases as the other decreases).

Example: Investigating the relationship between hours studied and exam grades for six students.

Student

Hours of Study

Exam Grade

1

3

86

2

6

95

3

4

92

4

4

83

5

3

78

6

2

79

Table of hours studied and exam grades

The scatter plot visually displays the relationship:

Scatter plot of hours studied vs. exam grades

Summary Table for Calculations

To compute correlation and regression, a summary table of calculations is constructed:

Summary table for correlation and regression calculations

Sample Correlation Coefficient (r)

The correlation coefficient (r) quantifies the strength and direction of the linear relationship. Its value ranges from -1 (perfect negative) to +1 (perfect positive). r = 0 indicates no linear relationship.

Examples of correlation coefficients

The formula for the sample correlation coefficient is:

Formula for the sample correlation coefficient

Substituting the values from the example:

Calculation of the sample correlation coefficient

Using Excel: The correlation coefficient can be calculated using =CORREL(array1, array2).

Excel calculation of correlation coefficient

Hypothesis Test for Correlation Coefficient

To determine if the population correlation coefficient (ρ) is significantly different from zero, a hypothesis test is conducted:

  • Null hypothesis:

  • Alternative hypothesis:

The test statistic is:

Formula for t-test statistic for correlation coefficient

Example calculation:

Calculation of t-test statistic for correlation coefficient

Critical values are compared to determine significance. If the calculated t is greater than the critical value, the null hypothesis is rejected.

Critical region for t-test of correlation coefficient

14.3 Simple Linear Regression Analysis

Simple linear regression describes the best-fitting straight line through a set of (x, y) pairs, using only one independent variable. The regression equation is:

  • \( \hat{y} \): Predicted value of y

  • \( b_0 \): y-intercept

  • \( b_1 \): Slope

Graphical illustration of regression line

The difference between the actual and predicted values is called the residual:

Graphical illustration of residuals

The Least Squares Method

The least squares method finds the regression line that minimizes the sum of squared errors (SSE):

Formula for sum of squared errors (SSE)

Calculating Slope and Intercept

The formulas for the regression slope and intercept are:

Formula for regression slope

Formula for regression intercept

Example calculation for the exam data:

Calculation of regression slopeCalculation of regression intercept

Interpretation: Each additional hour of study increases the predicted exam score by 4.18 points.

Using Excel for Regression

Excel's Data Analysis tool can be used to perform regression analysis, providing coefficients, R-squared, and other statistics.

Excel Data Analysis for regressionExcel Regression dialog boxExcel regression output

Partitioning the Sum of Squares

The total sum of squares (SST) measures the total variation in the dependent variable. It is partitioned into:

  • SSR (Regression Sum of Squares): Variation explained by the regression

  • SSE (Error Sum of Squares): Unexplained variation

Formula for total sum of squares (SST)

Coefficient of Determination (R2)

The coefficient of determination (R2) measures the proportion of total variation in the dependent variable explained by the independent variable:

Formula for coefficient of determination

Example calculation:

Calculation of R squared

Interpretation: 68.6% of the variation in exam grades is explained by hours studied.

Hypothesis Test for R2

To test if the population coefficient of determination is significantly greater than zero, use the F-test:

Formula for F-test statistic

Example calculation:

Calculation of F-test statistic

Compare the calculated F to the critical value. If F is greater, reject the null hypothesis.

Critical region for F-test

The p-value can also be used for decision-making.

p-value for F-test

14.4 Using Regression to Make Predictions

To predict the value of y for a given x, substitute x into the regression equation. A confidence interval can be constructed around the point estimate using the standard error of the estimate (se):

Formula for standard error of the estimateComparison of small and large standard errors

Example calculation:

Calculation of standard error of the estimate

Confidence Interval for Average Value of y

The confidence interval for the average value of y at a given x is:

Formula for confidence interval for average y

Example calculation:

Calculation of confidence interval for average y

Upper and lower confidence limits:

Upper and lower confidence limitsGraphical confidence interval for regression prediction

Prediction Interval for a Specific Value of y

The prediction interval for an individual value of y at a given x is wider than the confidence interval for the mean:

Formula for prediction interval for specific y

Example calculation:

Calculation of prediction interval for specific yUpper and lower prediction limits

14.5 Testing the Significance of the Slope

To determine if the independent variable significantly predicts the dependent variable, test the null hypothesis that the population slope (β1) is zero:

  • Null hypothesis:

  • Alternative hypothesis:

The test statistic is:

Formula for t-test statistic for regression slope

The standard error of the slope is:

Formula for standard error of the slope

Confidence Interval for the Slope

The confidence interval for the population slope is:

Formula for confidence interval for regression slope

Interpretation: If the interval does not include zero, there is evidence of a linear relationship.

14.6 Assumptions for Regression Analysis

For regression results to be valid, several assumptions must be met:

  • Linearity: The relationship between x and y is linear.

  • Independence: Residuals are independent.

  • Homoscedasticity: Constant variance of residuals across values of x.

  • Normality: Residuals are normally distributed.

Scatter plots and residual plots are used to check these assumptions.

14.7 Simple Regression with Negative Correlation

When the slope of the regression line is negative, the variables have a negative linear relationship. For example, as the price of a product increases, demand typically decreases.

14.8 Final Thoughts

  • Do not extrapolate: Avoid predicting values outside the observed range of x.

  • Correlation is not causation: A significant relationship does not imply that x causes changes in y.

Pearson Logo

Study Prep