
Chapter 15: Correlation – Statistical Relationships Between Variables

Study Guide - Smart Notes

Tailored notes based on your materials, expanded with key definitions, examples, and context.


Meaning of Correlation

Correlation refers to the co-variation or co-relation between two variables, indicating how they change together. These variables are typically measured on a scale (interval or ratio) level. It is important to note that correlation does not imply causation; two variables may be correlated without one causing the other.

  • Definition: Correlation measures the degree to which two variables move in relation to each other.

  • Variables: Usually scale (interval or ratio) variables.

  • Key Point: Correlation is NOT causation!

Correlation Coefficient Characteristics

The correlation coefficient quantifies the strength and direction of the relationship between two variables.

  • Can be positive or negative.

  • Always falls between -1.00 and 1.00.

  • The magnitude (absolute value) indicates the strength of the relationship, not the sign.

  • The sign indicates the direction (positive or negative) of the relationship.

Types of Correlation

  • Positive Correlation: High scores on one variable tend to be matched with high scores on the other variable. This is a direct relationship.

  • Negative Correlation: High scores on one variable tend to be matched with low scores on the other variable. This is an inverse relationship.

Examples:

  • Perfect Positive Correlation: Every increase in one variable is matched by a proportional increase in the other.

  • Perfect Negative Correlation: Every increase in one variable is matched by a proportional decrease in the other.

Strength of Correlation

The strength of a correlation is determined by the absolute value of the correlation coefficient.

  Size of the Correlation     Correlation Coefficient
  Small                       0.10
  Medium                      0.30
  Large                       0.50

Example: A correlation of -0.74 is stronger than a correlation of 0.25 because its magnitude is greater.
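The size conventions above can be captured in a small helper function; a minimal sketch (the function name and the "negligible" label for values below 0.10 are illustrative, not from the chapter):

```python
def label_strength(r):
    """Classify a correlation by the magnitude (absolute value) of r,
    using the small/medium/large cutoffs of 0.10 / 0.30 / 0.50."""
    size = abs(r)  # the sign (direction) is deliberately ignored
    if size >= 0.50:
        return "large"
    if size >= 0.30:
        return "medium"
    if size >= 0.10:
        return "small"
    return "negligible"

# -0.74 is stronger than 0.25 because its magnitude is greater
print(label_strength(-0.74))  # large
print(label_strength(0.25))   # small
```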

Misleading Correlations

Correlations can be misleading if interpreted as causation. For example, a high correlation between ice cream consumption and drowning deaths does not mean one causes the other; there may be a third variable (e.g., hot weather) influencing both.

Limitations of Correlation

  • Correlation is not causation.

  • There may be unmeasured ("invisible") third variables affecting both variables under study.

The Pearson Correlation Coefficient

Definition and Symbols

  • Quantifies a linear relationship between two scale variables.

  • Symbolized by r for sample data and by the Greek letter ρ (rho) for population parameters.

Formula for Pearson's r

The formula for the Pearson correlation coefficient is:

r = Σ(X - Mx)(Y - My) / √[ Σ(X - Mx)² × Σ(Y - My)² ]

  • X and Y are the variables, Mx and My are their means, and Σ(X - Mx)² and Σ(Y - My)² are the sums of squares for X and Y.

  • This formula standardizes the data by subtracting each mean in the numerator and dividing by a measure of overall variability in the denominator.

  • Sample size is taken into account through the sums of squares.
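The formula translates directly into code; a minimal Python sketch of the definitional (deviation-score) form:

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson r: sum of cross-products of deviations, divided by the
    square root of the product of the two sums of squares."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sum_products = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    ss_x = sum((xi - mx) ** 2 for xi in x)
    ss_y = sum((yi - my) ** 2 for yi in y)
    return sum_products / sqrt(ss_x * ss_y)

print(round(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]), 2))  # 1.0  (perfect positive)
print(round(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]), 2))  # -1.0 (perfect negative)
```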

Steps in Correlation Hypothesis Testing

  1. Identify the populations, distribution, and assumptions.

  2. State the null and research hypotheses.

  3. Determine the characteristics of the comparison distribution.

  4. Determine the critical values, or cutoffs.

  5. Calculate the test statistic.

  6. Make a decision.
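For step 5, one common form of the test statistic converts r to a t statistic with N - 2 degrees of freedom; a sketch (critical-r tables, which some texts use instead, are derived from this same t distribution):

```python
from math import sqrt

def correlation_t(r, n):
    """t statistic for testing H0: rho = 0, with df = n - 2."""
    df = n - 2
    return r * sqrt(df / (1 - r ** 2))

# e.g., r = -0.85 from a sample of N = 10 students
t = correlation_t(-0.85, 10)
print(round(t, 2))  # -4.56
```

With df = 8 and a two-tailed α of .05, the critical t value is 2.306; since |-4.56| exceeds 2.306, the decision in step 6 would be to reject the null hypothesis.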

Worked Example: Absences and Exam Grades

Consider the following data set:

  Student   Absences   Exam Grade
  1         4          82
  2         2          98
  3         2          76
  4         3          68
  5         1          84
  6         0          99
  7         4          67
  8         8          58
  9         7          50
  10        3          78

To analyze the relationship, always start with a scatterplot to visualize the data.

Calculating the Correlation Coefficient (Step-by-Step)

First, calculate the mean for each variable, then the deviations, products, and sums of squares:

  Absences (X)   (X - Mx)   Exam Grade (Y)   (Y - My)   (X - Mx)(Y - My)
  4               0.6        82                6            3.6
  2              -1.4        98               22          -30.8
  2              -1.4        76                0            0
  3              -0.4        68               -8            3.2
  1              -2.4        84                8          -19.2
  0              -3.4        99               23          -78.2
  4               0.6        67               -9           -5.4
  8               4.6        58              -18          -82.8
  7               3.6        50              -26          -93.6
  3              -0.4        78                2           -0.8

Means: Mx = 3.4, My = 76.0

Σ(X - Mx)(Y - My) = -304.0

Next, calculate the sums of squares for each variable:

  Absences (X)   (X - Mx)   (X - Mx)²   Exam Grade (Y)   (Y - My)   (Y - My)²
  4               0.6        0.36        82                6          36
  2              -1.4        1.96        98               22         484
  2              -1.4        1.96        76                0           0
  3              -0.4        0.16        68               -8          64
  1              -2.4        5.76        84                8          64
  0              -3.4       11.56        99               23         529
  4               0.6        0.36        67               -9          81
  8               4.6       21.16        58              -18         324
  7               3.6       12.96        50              -26         676
  3              -0.4        0.16        78                2           4

Σ(X - Mx)² = 56.4

Σ(Y - My)² = 2262

Finally, substitute into the formula:

r = -304.0 / √(56.4 × 2262) = -304.0 / 357.18 = -0.85

This is a large negative correlation: more absences are associated with lower exam grades.
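The worked example can be verified in a few lines of Python; a sketch that reproduces the totals above (Mx = 3.4, My = 76.0, Σ of cross-products = -304.0, SSx = 56.4, SSy = 2262) from the raw scores:

```python
from math import sqrt

# Absences (X) and exam grades (Y) for the 10 students
x = [4, 2, 2, 3, 1, 0, 4, 8, 7, 3]
y = [82, 98, 76, 68, 84, 99, 67, 58, 50, 78]

n = len(x)
mx = sum(x) / n   # 3.4
my = sum(y) / n   # 76.0

sum_products = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))  # -304.0
ss_x = sum((xi - mx) ** 2 for xi in x)                             # 56.4
ss_y = sum((yi - my) ** 2 for yi in y)                             # 2262.0

r = sum_products / sqrt(ss_x * ss_y)
print(round(r, 2))  # -0.85
```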

Partial Correlation

Partial correlation quantifies the degree of association between two variables after removing the effect of a third variable. All three variables must be measured as scale variables. The partial correlation coefficient expresses the association between two variables, controlling for the influence of the third variable.
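For three scale variables, the first-order partial correlation can be computed from the three pairwise Pearson coefficients; a sketch of the standard formula (the example numbers are made up for illustration, not taken from the chapter):

```python
from math import sqrt

def partial_r(r_xy, r_xz, r_yz):
    """Correlation between X and Y controlling for Z, computed
    from the three pairwise Pearson correlations."""
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Hypothetical values: X and Y correlate 0.50, but both also relate to Z
print(round(partial_r(0.50, 0.60, 0.40), 2))  # 0.35
```

Note that the partial correlation (0.35) is weaker than the raw correlation (0.50) once the third variable is controlled for.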

Applications in Psychometrics

  • Psychometrics: The field concerned with the development of tests and measures.

  • Correlation is used to assess two key aspects:

    • Reliability: Consistency of a measure.

    • Validity: Whether a measure assesses what it is intended to measure.

Reliability

  • Test-retest reliability: Consistency of results across multiple test administrations.

  • Split-half reliability: Correlation between odd- and even-numbered items.

  • Coefficient alpha (Cronbach's alpha, α): Average of all possible split-half correlations; a common estimate of reliability.
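In practice, coefficient alpha is usually computed from item and total-score variances rather than by literally averaging all split-half correlations; a sketch of that standard variance formula (function and data names are illustrative):

```python
from statistics import pvariance

def cronbach_alpha(item_scores):
    """Coefficient alpha from a list of items, each a list of scores
    (one score per respondent), using population variances."""
    k = len(item_scores)
    totals = [sum(scores) for scores in zip(*item_scores)]  # each respondent's total
    item_var = sum(pvariance(scores) for scores in item_scores)
    return (k / (k - 1)) * (1 - item_var / pvariance(totals))

# Three perfectly consistent items yield alpha = 1.0
items = [[1, 2, 3, 4], [1, 2, 3, 4], [1, 2, 3, 4]]
print(round(cronbach_alpha(items), 2))  # 1.0
```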

Validity

  • A valid measure accurately assesses the intended variable.

  • Correlation is used to compare a new measure with established measures to assess validity.

Example: Correlation can be used to establish the validity of a personality test, but establishing validity is generally more challenging than establishing reliability.

Additional info: These concepts are foundational in statistics and behavioral sciences, and understanding them is essential for interpreting research findings and designing valid experiments.
