Chapter 9: Correlation and Regression – Study Notes
Correlation and Regression
Introduction to Correlation and Regression
Correlation and regression are fundamental statistical techniques used to examine and model the relationship between two or more variables. Correlation quantifies the strength and direction of a linear relationship, while regression allows for prediction and explanation of one variable based on another.
Correlation
Definition and Purpose
Correlation measures the degree to which two variables move together.
The correlation coefficient (denoted as r for samples, \( \rho \) for populations) quantifies the strength and direction of a linear relationship.
Values of r range from -1 to +1.
Types of Correlation
Positive Linear Correlation: As one variable increases, the other tends to increase.
Negative Linear Correlation: As one variable increases, the other tends to decrease.
No Correlation: No discernible linear relationship between variables.
Nonlinear Correlation: Variables are related, but not in a linear fashion.

Calculating the Correlation Coefficient
The Pearson Product-Moment Correlation Coefficient is the most common measure for interval or ratio data.
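Sample formula for r (definitional), with \( \bar{x}, \bar{y} \) the sample means and \( s_x, s_y \) the sample standard deviations:
\( r = \frac{\sum (x - \bar{x})(y - \bar{y})}{(n - 1)\, s_x s_y} \)
Computational formula (for paired data):
\( r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - (\sum x)^2\right]\left[n\sum y^2 - (\sum y)^2\right]}} \)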
n = number of paired observations
r estimates the population parameter \( \rho \) (rho).
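The two formulas for r are algebraically equivalent, which a short calculation can confirm. The sketch below is illustrative only: the function names and the five data pairs are made up for this example, and it assumes the standard definitional form (deviations from the means divided by (n - 1) times both sample standard deviations) and the standard computational form (raw sums).

```python
from math import sqrt

def pearson_definitional(xs, ys):
    """r = sum((x - mean_x)(y - mean_y)) / ((n - 1) * s_x * s_y)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sample standard deviations use n - 1 in the denominator.
    s_x = sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    s_y = sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return cross / ((n - 1) * s_x * s_y)

def pearson_computational(xs, ys):
    """r from raw sums: no means or deviations needed."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

# Hypothetical paired observations (not from the notes).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r_def = pearson_definitional(xs, ys)
r_comp = pearson_computational(xs, ys)
print(round(r_def, 4), round(r_comp, 4))
```

Both functions should agree up to floating-point rounding. The computational form is convenient for hand calculation from a table of sums, while the definitional form makes the role of the means and standard deviations explicit.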
Assumptions and Interpretation
Data should be measured at the interval or ratio level.
Variables should be approximately normally distributed.
The relationship should be linear.
If assumptions are violated, use Spearman’s \( \rho \) for ordinal data or non-normal distributions.
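One common way to obtain Spearman's coefficient is to replace each value with its rank and then apply the Pearson formula to the ranks. The sketch below follows that approach, giving tied values the average of their ranks. The helper names and the data are hypothetical.

```python
from math import sqrt

def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of tied values starting at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(xs, ys):
    """Pearson r from deviations about the means."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def spearman(xs, ys):
    """Spearman's rho: Pearson r computed on the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))

# Hypothetical data: y rises with x, but along a curve (y = x cubed).
xs = [1, 2, 3, 4, 5]
ys = [1, 8, 27, 64, 125]

r_pearson = pearson(xs, ys)
rho_spearman = spearman(xs, ys)
print(round(r_pearson, 3), round(rho_spearman, 3))
```

Because the relationship is perfectly monotonic but not linear, Spearman's coefficient reaches 1 while Pearson's r stays below 1.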
Interpreting the Value of r
r = 1: Perfect positive linear correlation
r = -1: Perfect negative linear correlation
r = 0: No linear correlation
The closer |r| is to 1, the stronger the linear relationship.
Coefficient of Determination (\( r^2 \))
\( r^2 \) is the proportion of variance in one variable explained by the other.
For example, if r = 0.79, then \( r^2 \) = 0.6241, meaning about 62.4% of the variance in y is explained by x.
Testing the Significance of Correlation
Null hypothesis: \( H_0: \rho = 0 \) (no linear correlation in the population)
Alternative hypothesis: \( H_a: \rho \neq 0 \) (a nonzero linear correlation in the population)
Test statistic:
\( t = \frac{r\sqrt{n - 2}}{\sqrt{1 - r^2}} \)
Degrees of freedom: n – 2
Compare calculated t to critical t-value or use p-value from statistical software.
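As a rough illustration of the test, the sketch below computes t from r and n, assuming the usual statistic t = r·sqrt(n − 2) / sqrt(1 − r²), and compares it with a critical value. The value r = 0.79 is the one used in these notes; the sample size n = 10 is hypothetical, and the critical value 2.306 is the standard two-tailed t-table entry for α = 0.05 with 8 degrees of freedom.

```python
from math import sqrt

def correlation_t_statistic(r, n):
    """Return (t, df) for testing H0: rho = 0 with sample correlation r."""
    df = n - 2
    t = r * sqrt(df) / sqrt(1 - r ** 2)
    return t, df

r = 0.79            # sample correlation used in the notes
n = 10              # hypothetical sample size (not stated in the notes)
t_value, df = correlation_t_statistic(r, n)

critical_t = 2.306  # two-tailed, alpha = 0.05, df = 8 (from a t-table)
reject_null = abs(t_value) > critical_t
print(f"t = {t_value:.3f}, df = {df}, reject H0: {reject_null}")
```

With a different sample size the degrees of freedom change, so the critical value has to be looked up again rather than reused.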
Correlation vs. Causation
Correlation does not imply causation.
To infer causation, additional criteria must be met (e.g., temporal precedence, ruling out confounders).
Spurious correlations can occur when two variables with no direct causal link appear correlated because both depend on a common cause (a confounder).
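A small simulation can make the common-cause idea concrete. In the sketch below, all names and parameters are invented for illustration: x and y never influence each other, but both are built from the same hidden variable plus independent noise. The seed is fixed only to make the run repeatable.

```python
import random
from math import sqrt

def pearson(xs, ys):
    """Pearson r from deviations about the means."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

random.seed(0)
n = 2000

# Hidden common cause; x and y each add their own independent noise to it.
common_cause = [random.gauss(0, 1) for _ in range(n)]
x = [z + random.gauss(0, 1) for z in common_cause]
y = [z + random.gauss(0, 1) for z in common_cause]

# x and y are clearly correlated even though neither affects the other.
r_xy = pearson(x, y)

# Removing the common cause (possible here only because we simulated it)
# leaves just the independent noise, and the correlation largely vanishes.
x_without_cause = [xi - zi for xi, zi in zip(x, common_cause)]
y_without_cause = [yi - zi for yi, zi in zip(y, common_cause)]
r_after = pearson(x_without_cause, y_without_cause)

print(round(r_xy, 2), round(r_after, 2))
```

In real data the confounder is usually not known exactly, which is why ruling it out requires design choices or statistical control rather than simple subtraction.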
Regression
Introduction to Regression
Regression analysis is used to predict the value of a dependent variable (y) based on the value of at least one independent variable (x). Simple linear regression involves one predictor; multiple regression involves two or more.
Simple Linear Regression Model
The equation for a regression line is:
\( y = a + bx \)
y: Dependent variable (outcome being predicted)
x: Independent variable (predictor)
a: Intercept (value of y when x = 0)
b: Slope (expected change in y for a one-unit increase in x)
The predicted value is denoted as \( \hat{y} \):
\( \hat{y} = a + bx \)
The error term (residual) is the difference between observed and predicted values: \( e = y - \hat{y} \).
Calculating Regression Coefficients
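The slope (b) can be calculated as:
\( b = r \frac{s_y}{s_x} \)
The intercept (a) is:
\( a = \bar{y} - b\bar{x} \)
Where \( s_x \) and \( s_y \) are the standard deviations of x and y, respectively, and \( \bar{x} \) and \( \bar{y} \) are their means.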
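The sketch below computes the coefficients from r and the two standard deviations, assuming the usual forms b = r·s_y / s_x and a = ȳ − b·x̄, and then derives predicted values and residuals. The function name and the data are hypothetical.

```python
from statistics import mean, stdev

def fit_simple_regression(xs, ys):
    """Return (a, b) using b = r * s_y / s_x and a = mean_y - b * mean_x."""
    n = len(xs)
    mean_x, mean_y = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    r = cross / ((n - 1) * s_x * s_y)
    b = r * s_y / s_x
    a = mean_y - b * mean_x
    return a, b

# Hypothetical paired observations (not from the notes).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = fit_simple_regression(xs, ys)

# Predicted values and residuals (observed minus predicted).
predicted = [a + b * x for x in xs]
residuals = [y - y_hat for y, y_hat in zip(ys, predicted)]

print(f"y_hat = {a:.2f} + {b:.2f}x")
print([round(e, 2) for e in residuals])
```

For a least-squares line with an intercept, the residuals sum to zero (up to rounding), which is a quick sanity check on any hand calculation.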
Standardized Regression Coefficient (Beta, \( \beta \))
\( \beta \) represents the expected change in y (in standard deviation units) for a one standard deviation change in x.
In simple regression, \( \beta = r \).
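The equality between the standardized slope and r can be checked numerically: convert both variables to z-scores, fit the least-squares slope on the z-scores, and compare it with r. This is a minimal sketch with hypothetical helper names and data.

```python
from statistics import mean, stdev

def z_scores(values):
    """Standardize values to mean 0 and sample standard deviation 1."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

def least_squares_slope(xs, ys):
    """Slope of the least-squares line of ys on xs."""
    mean_x, mean_y = mean(xs), mean(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

def pearson(xs, ys):
    """Pearson r from the definitional formula."""
    mean_x, mean_y = mean(xs), mean(ys)
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return cross / ((len(xs) - 1) * stdev(xs) * stdev(ys))

# Hypothetical paired observations (not from the notes).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

beta = least_squares_slope(z_scores(xs), z_scores(ys))  # standardized slope
r = pearson(xs, ys)
print(round(beta, 4), round(r, 4))
```

This equality holds only with a single predictor; with several predictors, each standardized coefficient generally differs from that predictor's simple correlation with the outcome.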
Interpreting Regression Output
Key statistics include R, \( R^2 \), adjusted \( R^2 \), standard error, coefficients, and significance values.
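Adjusted R² is not defined in these notes, so the sketch below assumes the commonly used formula: 1 − (1 − R²)(n − 1) / (n − k − 1), where k is the number of predictors. The R² value is taken from the exam-grade example in these notes, and the sample size is hypothetical.

```python
def adjusted_r_squared(r_squared, n, k):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - k - 1)."""
    if n - k - 1 <= 0:
        raise ValueError("need more than k + 1 observations")
    return 1 - (1 - r_squared) * (n - 1) / (n - k - 1)

r_squared = 0.625  # R^2 from the exam-grade example in these notes
n = 20             # hypothetical sample size (not stated in the notes)
k = 1              # one predictor: hours studied

adj = adjusted_r_squared(r_squared, n, k)
print(round(adj, 3))
```

The adjustment shrinks R² more when the sample is small or the number of predictors is large, which is why it is usually reported alongside R² in multiple regression.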



Example: Predicting Exam Grades from Hours Studied
Regression equation: \( \hat{y} = 68.08 + 6.237x \)
Interpretation: For each additional hour studied, the predicted exam grade increases by approximately 6.24 points.
R = 0.790, \( R^2 \) = 0.625 (62.5% of the variance in exam grades is explained by hours studied).
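The regression equation from this example can be applied directly to obtain predicted grades. In the sketch below, the coefficients come from the example; the function name and the chosen hour values are illustrative.

```python
INTERCEPT = 68.08  # a: predicted grade when hours studied = 0
SLOPE = 6.237      # b: predicted change in grade per additional hour

def predict_grade(hours_studied):
    """Predicted exam grade from y_hat = 68.08 + 6.237 * x."""
    return INTERCEPT + SLOPE * hours_studied

for hours in (0, 1, 2, 3):
    print(f"{hours} h -> predicted grade {predict_grade(hours):.2f}")
```

Predictions are most trustworthy for hours within the range that was actually observed. Extending the line far beyond that range is extrapolation and should be treated with caution.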
Significance Testing in Regression
Test whether the slope (b) is significantly different from zero (\( H_0: b = 0 \)).
Test whether the intercept (a) is significantly different from zero (less common in social sciences).
Test whether \( R^2 \) is significantly different from zero.
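The notes do not give the test statistics for these regression tests, so the sketch below assumes the standard ones: t = b / SE(b) for the slope, with n − 2 degrees of freedom, and F = (R² / k) / ((1 − R²) / (n − k − 1)) for R². The data and the function name are hypothetical.

```python
from math import sqrt

def simple_regression_tests(xs, ys):
    """Slope t statistic and overall F statistic for one-predictor regression."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x

    # Residual and total sums of squares.
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    sst = sum((y - mean_y) ** 2 for y in ys)
    r_squared = 1 - sse / sst

    df_resid = n - 2
    se_b = sqrt(sse / df_resid) / sqrt(sxx)  # standard error of the slope
    t_slope = b / se_b
    f_stat = (r_squared / 1) / ((1 - r_squared) / df_resid)  # k = 1 predictor
    return {"b": b, "t": t_slope, "F": f_stat, "R2": r_squared, "df": df_resid}

# Hypothetical paired observations (not from the notes).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

result = simple_regression_tests(xs, ys)
print({name: round(value, 3) for name, value in result.items()})
```

With one predictor, F equals t squared, so the slope test and the R² test lead to the same conclusion. For these five points, t is about 2.12, which falls short of the two-tailed critical value of 3.182 for df = 3 at α = 0.05, so the slope would not be declared significant.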
Multiple Regression
Involves two or more predictors.
Allows for assessment of the unique contribution of each predictor to the outcome variable.
Example predictors for psychological outcomes: trauma exposure, organizational support, commitment, tenure, etc.
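To show how two predictors are fitted together, the sketch below solves the least-squares normal equations in plain Python. Everything here is an assumption made for illustration: the function names, the elimination routine, and the data, which are generated exactly from y = 1 + 2·x1 + 3·x2 so that the recovered coefficients can be checked.

```python
def solve_linear_system(matrix, rhs):
    """Solve matrix * coeffs = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        # Swap in the row with the largest entry in this column.
        pivot = max(range(col, n), key=lambda row: abs(aug[row][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    # Back substitution.
    coeffs = [0.0] * n
    for row in range(n - 1, -1, -1):
        known = sum(aug[row][k] * coeffs[k] for k in range(row + 1, n))
        coeffs[row] = (aug[row][n] - known) / aug[row][row]
    return coeffs

def fit_multiple_regression(predictor_rows, ys):
    """Return [a, b1, b2, ...] from the normal equations (X'X) coeffs = X'y."""
    design = [[1.0] + list(row) for row in predictor_rows]  # leading 1 = intercept
    p = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(design, ys)) for i in range(p)]
    return solve_linear_system(xtx, xty)

# Hypothetical, noise-free data built from y = 1 + 2*x1 + 3*x2.
rows = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 6), (6, 5)]
ys = [1 + 2 * x1 + 3 * x2 for x1, x2 in rows]

a, b1, b2 = fit_multiple_regression(rows, ys)
print(round(a, 3), round(b1, 3), round(b2, 3))
```

Each slope is the expected change in y for a one-unit change in that predictor with the other predictor held constant, which is what "unique contribution" refers to. Real data contain noise, and statistical software would normally be used instead of a hand-written solver.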
Example: Predicting Satisfaction with Supervisor
Dependent variable: Satisfaction with supervisor
Predictor: Opportunity to learn new things
R = 0.624, \( R^2 \) = 0.389 (38.9% of variance explained)




Summary Table: Key Concepts in Correlation and Regression
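| Concept | Definition | Formula |
|---|---|---|
| Correlation Coefficient (r) | Strength and direction of linear relationship | \( r = \frac{\sum (x - \bar{x})(y - \bar{y})}{(n - 1)\, s_x s_y} \) |
| Coefficient of Determination (\( r^2 \)) | Proportion of variance in y explained by x | \( r^2 \) (the square of r) |
| Regression Equation | Predicts y from x | \( \hat{y} = a + bx \) |
| Slope (b) | Change in y per unit change in x | \( b = r \frac{s_y}{s_x} \) |
| Intercept (a) | Predicted y when x = 0 | \( a = \bar{y} - b\bar{x} \) |
| t-test for r | Tests significance of correlation | \( t = \frac{r\sqrt{n - 2}}{\sqrt{1 - r^2}} \), df = n - 2 |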
Additional info: In practice, statistical software (e.g., SPSS, Excel) is commonly used for calculations and significance testing. Always check assumptions before interpreting results, and remember that significant correlation does not establish causality.