Chapter 9: Correlation and Regression – Study Notes

Correlation and Regression

Introduction to Correlation and Regression

Correlation and regression are fundamental statistical techniques used to examine and model the relationship between two or more variables. Correlation quantifies the strength and direction of a linear relationship, while regression allows for prediction and explanation of one variable based on another.

Correlation

Definition and Purpose

  • Correlation measures the degree to which two variables move together.

  • The correlation coefficient (denoted as r for samples, \( \rho \) for populations) quantifies the strength and direction of a linear relationship.

  • Values of r range from -1 to +1.

Types of Correlation

  • Positive Linear Correlation: As one variable increases, the other tends to increase.

  • Negative Linear Correlation: As one variable increases, the other tends to decrease.

  • No Correlation: No discernible linear relationship between variables.

  • Nonlinear Correlation: Variables are related, but not in a linear fashion.

Scatterplots showing types of correlation

Calculating the Correlation Coefficient

  • The Pearson Product-Moment Correlation Coefficient is the most common measure for interval or ratio data.

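  • Sample formula for r (definitional): \( r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \sum (y - \bar{y})^2}} \)

  • Computational formula (for paired data): \( r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - (\sum x)^2\right]\left[n\sum y^2 - (\sum y)^2\right]}} \)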

  • n = number of paired observations

  • r estimates the population parameter \( \rho \) (rho).
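
As a rough illustration, the sketch below computes r two ways: from deviations about the means (the usual definitional form) and from raw sums (the usual computational form). The function names and the small data set are hypothetical and are not taken from the notes. The two results should agree up to rounding.

```python
import math

def pearson_r_definitional(xs, ys):
    """r from deviations about the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def pearson_r_computational(xs, ys):
    """r from raw sums; algebraically the same quantity."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )
    return numerator / denominator

# Hypothetical paired observations (illustration only).
hours = [1, 2, 3, 4, 5]
grades = [2, 4, 5, 4, 5]

r_def = pearson_r_definitional(hours, grades)
r_comp = pearson_r_computational(hours, grades)
print(round(r_def, 4), round(r_comp, 4))
```

The raw-sums version is convenient for hand calculation, while the deviation version makes it easier to see that r is a standardized measure of how x and y vary together.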

Assumptions and Interpretation

  • Data should be measured at the interval or ratio level.

  • Variables should be approximately normally distributed.

  • The relationship should be linear.

  • If assumptions are violated, use Spearman’s \( \rho \) for ordinal data or non-normal distributions.
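
One common way to obtain Spearman's coefficient is to rank each variable and then apply the Pearson formula to the ranks. The sketch below follows that approach, giving tied values the average of their ranks. The helper names and data are hypothetical.

```python
import math

def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson_r(xs, ys):
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def spearman_rho(xs, ys):
    """Pearson correlation computed on the ranks of the data."""
    return pearson_r(average_ranks(xs), average_ranks(ys))

# Hypothetical data: y always rises with x, but not along a straight line.
x = [1, 2, 3, 4, 5]
y = [1, 8, 27, 64, 125]

r_linear = pearson_r(x, y)
rho_rank = spearman_rho(x, y)
print(round(r_linear, 3), round(rho_rank, 3))
```

Because the ranks of x and y match exactly here, the rank-based coefficient is 1 even though Pearson's r falls short of 1 for the same data.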

Interpreting the Value of r

  • r = 1: Perfect positive linear correlation

  • r = -1: Perfect negative linear correlation

  • r = 0: No linear correlation

  • The closer |r| is to 1, the stronger the linear relationship.

Coefficient of Determination (\( r^2 \))

  • \( r^2 \) is the proportion of variance in one variable explained by the other.

  • For example, if r = 0.79, then \( r^2 = 0.6241 \), meaning 62.4% of the variance in y is explained by x.

Testing the Significance of Correlation

  • Null hypothesis: \( H_0: \rho = 0 \) (no correlation)

  • Alternative hypothesis: \( H_a: \rho \neq 0 \) (significant correlation)

  • Test statistic: \( t = \frac{r\sqrt{n - 2}}{\sqrt{1 - r^2}} \)

  • Degrees of freedom: n – 2

  • Compare calculated t to critical t-value or use p-value from statistical software.
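
The steps above can be sketched as follows, assuming the standard test statistic t = r√(n − 2) / √(1 − r²). The value r = 0.79 comes from the example in these notes. The sample size n = 20 is hypothetical, since the notes do not state one, and the critical value 2.101 is the two-tailed t-table entry for α = 0.05 with 18 degrees of freedom.

```python
import math

def t_statistic_for_r(r, n):
    """Return (t, df) for testing H0: rho = 0 from a sample correlation."""
    if n < 3:
        raise ValueError("need at least 3 paired observations")
    if abs(r) >= 1:
        raise ValueError("t is undefined when |r| = 1")
    df = n - 2
    t = r * math.sqrt(df) / math.sqrt(1 - r ** 2)
    return t, df

r = 0.79            # correlation from the notes' example
n = 20              # hypothetical sample size (not given in the notes)
critical_t = 2.101  # two-tailed, alpha = 0.05, df = 18 (from a t table)

t, df = t_statistic_for_r(r, n)
reject_null = abs(t) > critical_t
print(f"t = {t:.2f}, df = {df}, reject H0: {reject_null}")
```

With these assumed inputs the calculated t is well above the critical value, so the null hypothesis of no correlation would be rejected. A different n would change both t and the critical value.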

Correlation vs. Causation

  • Correlation does not imply causation.

  • To infer causation, additional criteria must be met (e.g., temporal precedence, ruling out confounders).

  • Spurious correlations can occur when two variables with no direct causal link appear correlated because both depend on a common cause (a confounding variable).
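
A small simulation can make the common-cause idea concrete. In the sketch below, x and y never influence each other; each is a hypothetical common cause plus its own independent noise. All variable names, the seed, and the noise levels are assumptions chosen for illustration.

```python
import math
import random

def pearson_r(xs, ys):
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(42)  # fixed seed so the sketch is repeatable
n = 2000

# The common cause drives both variables; x and y do not affect each other.
common = [rng.gauss(0, 1) for _ in range(n)]
x = [z + rng.gauss(0, 0.5) for z in common]
y = [z + rng.gauss(0, 0.5) for z in common]

r_xy = pearson_r(x, y)

# Subtracting the common cause leaves only the independent noise terms.
x_noise = [xi - z for xi, z in zip(x, common)]
y_noise = [yi - z for yi, z in zip(y, common)]
r_after = pearson_r(x_noise, y_noise)

print(f"r(x, y) = {r_xy:.2f}, after removing common cause = {r_after:.2f}")
```

The raw correlation between x and y is strong, yet it largely disappears once the common cause is accounted for. That is the pattern to look for when ruling out confounders.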

Regression

Introduction to Regression

Regression analysis is used to predict the value of a dependent variable (y) based on the value of at least one independent variable (x). Simple linear regression involves one predictor; multiple regression involves two or more.

Simple Linear Regression Model

  • The equation for a regression line is: \( y = a + bx \)

  • y: Dependent variable (predicted value)

  • x: Independent variable (predictor)

  • a: Intercept (value of y when x = 0)

  • b: Slope (expected change in y for a one-unit increase in x)

  • The predicted value is denoted as \( \hat{y} \): \( \hat{y} = a + bx \)

  • The error term (residual) is the difference between observed and predicted values: \( e = y - \hat{y} \)

Calculating Regression Coefficients

  • The slope (b) can be calculated as: \( b = r \frac{s_y}{s_x} \)

  • The intercept (a) is: \( a = \bar{y} - b\bar{x} \)

  • Where \( s_x \) and \( s_y \) are the standard deviations of x and y, respectively.
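
As a minimal sketch, the coefficients can be computed from r, the two standard deviations, and the two means, using b = r · s_y / s_x and a = ȳ − b · x̄. The function name and the data below are hypothetical. The residuals are included to show that they sum to approximately zero for a least-squares line.

```python
import statistics

def fit_simple_regression(xs, ys):
    """Return (a, b, r) using b = r * s_y / s_x and a = mean_y - b * mean_x."""
    n = len(xs)
    mean_x = statistics.mean(xs)
    mean_y = statistics.mean(ys)
    s_x = statistics.stdev(xs)  # sample standard deviation (n - 1)
    s_y = statistics.stdev(ys)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    r = cov_xy / (s_x * s_y)
    b = r * s_y / s_x
    a = mean_y - b * mean_x
    return a, b, r

# Hypothetical paired observations (illustration only).
hours = [1, 2, 3, 4, 5]
grades = [2, 4, 5, 4, 5]

a, b, r = fit_simple_regression(hours, grades)
predicted = [a + b * x for x in hours]
residuals = [y - y_hat for y, y_hat in zip(grades, predicted)]

print(f"a = {a:.2f}, b = {b:.2f}, r = {r:.4f}")
print("sum of residuals:", round(sum(residuals), 10))
```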

Standardized Regression Coefficient (Beta, \( \beta \))

  • \( \beta \) represents the expected change in y (in standard deviation units) for a one standard deviation change in x.

  • In simple regression, \( \beta = r \).
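
The claim that the standardized slope equals r in simple regression can be checked numerically. The sketch below, using hypothetical data and helper names, obtains the standardized coefficient in two ways: by rescaling the raw slope with the standard deviations, and by refitting the line on z-scores.

```python
import statistics

def standardize(values):
    """Convert values to z-scores using the sample standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]

def slope(xs, ys):
    """Least-squares slope of y on x."""
    mean_x = statistics.mean(xs)
    mean_y = statistics.mean(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

# Hypothetical paired observations (illustration only).
hours = [1, 2, 3, 4, 5]
grades = [2, 4, 5, 4, 5]

b = slope(hours, grades)

# Route 1: rescale the raw slope by the ratio of standard deviations.
beta_from_b = b * statistics.stdev(hours) / statistics.stdev(grades)

# Route 2: fit the regression on standardized variables.
beta_from_z = slope(standardize(hours), standardize(grades))

print(round(beta_from_b, 4), round(beta_from_z, 4))
```

Both routes give the same number, and it matches the Pearson r for these data (about 0.7746). This equality holds only for one predictor.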

Interpreting Regression Output

  • Key statistics include R, \( R^2 \), adjusted \( R^2 \), standard error, coefficients, and significance values.

SPSS Model Summary for regression of Exam Grade on Hours Studied

SPSS ANOVA table for regression of Exam Grade on Hours Studied

SPSS Coefficient table for regression of Exam Grade on Hours Studied

Example: Predicting Exam Grades from Hours Studied

  • Regression equation: \( \hat{y} = 68.08 + 6.237x \)

  • Interpretation: For each additional hour studied, the predicted exam grade increases by approximately 6.24 points.

  • R = 0.790, \( R^2 = 0.625 \) (62.5% of variance in exam grades explained by hours studied).
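
To show how the fitted equation is used, the sketch below plugs values into ŷ = 68.08 + 6.237x. The coefficients come from the example above. The function name and the chosen input values are hypothetical.

```python
INTERCEPT = 68.08  # a, from the regression equation in the example
SLOPE = 6.237      # b, predicted points gained per additional hour

def predict_exam_grade(hours_studied):
    """Predicted exam grade from hours studied, using y-hat = a + b * x."""
    return INTERCEPT + SLOPE * hours_studied

grade_at_3 = predict_exam_grade(3)
gain_per_hour = predict_exam_grade(4) - predict_exam_grade(3)

print(f"predicted grade after 3 hours: {grade_at_3:.2f}")
print(f"difference between 4 and 3 hours: {gain_per_hour:.3f}")
```

For 3 hours the prediction is 68.08 + 6.237 × 3 = 86.791, and moving from 3 to 4 hours changes the prediction by exactly the slope. Predictions are most trustworthy within the range of hours actually observed in the data.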

Significance Testing in Regression

  • Test whether the slope (b) is significantly different from zero (\( H_0 \): b = 0).

  • Test whether the intercept (a) is significantly different from zero (less common in social sciences).

  • Test whether \( R^2 \) is significantly different from zero.
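
The test of R² is usually reported as the overall F test in the ANOVA table. A minimal sketch, assuming the standard formula F = (R² / k) / ((1 − R²) / (n − k − 1)) with k predictors, is shown below. R² = 0.625 is taken from the exam-grade example. The sample size n = 20 is hypothetical because the notes do not report it.

```python
def f_statistic_for_r_squared(r_squared, n, k):
    """Return (F, df_model, df_residual) for testing H0: R^2 = 0."""
    if not 0 <= r_squared < 1:
        raise ValueError("r_squared must be in [0, 1)")
    df_model = k
    df_residual = n - k - 1
    if df_residual <= 0:
        raise ValueError("not enough observations for this many predictors")
    f = (r_squared / df_model) / ((1 - r_squared) / df_residual)
    return f, df_model, df_residual

r_squared = 0.625  # from the exam-grade example
n = 20             # hypothetical sample size (not given in the notes)
k = 1              # one predictor: hours studied

f, df1, df2 = f_statistic_for_r_squared(r_squared, n, k)
print(f"F({df1}, {df2}) = {f:.2f}")
```

With one predictor, this F equals the square of the t statistic for the slope, so the two tests lead to the same conclusion in simple regression.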

Multiple Regression

  • Involves two or more predictors.

  • Allows for assessment of the unique contribution of each predictor to the outcome variable.

  • Example predictors for psychological outcomes: trauma exposure, organizational support, commitment, tenure, etc.
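
For intuition about how two predictors are fitted together, the sketch below solves the least-squares equations for a model ŷ = a + b1·x1 + b2·x2 using only sums of squares and cross-products. The function name, variable names, and data are hypothetical. In practice statistical software handles this step.

```python
def fit_two_predictor_regression(x1, x2, y):
    """Least-squares fit of y = a + b1*x1 + b2*x2 via the normal equations."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    # Centered sums of squares and cross-products.
    s11 = sum((u - m1) ** 2 for u in x1)
    s22 = sum((v - m2) ** 2 for v in x2)
    s12 = sum((u - m1) * (v - m2) for u, v in zip(x1, x2))
    s1y = sum((u - m1) * (w - my) for u, w in zip(x1, y))
    s2y = sum((v - m2) * (w - my) for v, w in zip(x2, y))
    det = s11 * s22 - s12 ** 2
    if det == 0:
        raise ValueError("predictors are perfectly collinear")
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    a = my - b1 * m1 - b2 * m2
    return a, b1, b2

# Hypothetical data generated exactly from y = 1 + 2*x1 + 3*x2.
x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 6]
y = [1 + 2 * u + 3 * v for u, v in zip(x1, x2)]

a, b1, b2 = fit_two_predictor_regression(x1, x2, y)
print(round(a, 6), round(b1, 6), round(b2, 6))
```

Each slope is adjusted for the other predictor through the cross-product term s12. This is why multiple regression can isolate the unique contribution of each predictor even when the predictors are correlated with each other.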

Example: Predicting Satisfaction with Supervisor

  • Dependent variable: Satisfaction with supervisor

  • Predictor: Opportunity to learn new things

  • R = 0.624, \( R^2 = 0.389 \) (38.9% of variance explained)

SPSS Model Summary for regression of Satisfaction with Supervisor on Opportunity to Learn

SPSS ANOVA table for regression of Satisfaction with Supervisor on Opportunity to Learn

SPSS Coefficient table for regression of Satisfaction with Supervisor on Opportunity to Learn

Scatterplot of regression standardized residuals vs. predicted values

Summary Table: Key Concepts in Correlation and Regression

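| Concept | Definition | Formula |
| --- | --- | --- |
| Correlation Coefficient (r) | Strength and direction of linear relationship | \( r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \sum (y - \bar{y})^2}} \) |
| Coefficient of Determination (\( r^2 \)) | Proportion of variance in y explained by x | \( r^2 \) (the square of r) |
| Regression Equation | Predicts y from x | \( \hat{y} = a + bx \) |
| Slope (b) | Change in y per unit change in x | \( b = r \frac{s_y}{s_x} \) |
| Intercept (a) | Predicted y when x = 0 | \( a = \bar{y} - b\bar{x} \) |
| t-test for r | Tests significance of correlation | \( t = \frac{r\sqrt{n - 2}}{\sqrt{1 - r^2}} \), df = n – 2 |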

Additional info: In practice, statistical software (e.g., SPSS, Excel) is commonly used for calculations and significance testing. Always check assumptions before interpreting results, and remember that significant correlation does not establish causality.
