Chapter 4: Describing the Relation between Two Variables – Scatter Diagrams and Correlation
Chapter 4: Describing the Relation between Two Variables
4.1 Scatter Diagrams and Correlation
This section explores how to visually and numerically describe the relationship between two quantitative variables. Key concepts include scatter diagrams, correlation coefficients, and the distinction between correlation and causation.
4.1.1 Draw and Interpret Scatter Diagrams
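Explanatory (Predictor) Variable: The variable used to explain or predict the value of the response variable.
Response Variable: The variable whose value can be explained by the value of the explanatory or predictor variable.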
Scatter Diagram: A graph that displays the relationship between two quantitative variables measured on the same individual. Each point represents an individual, with the explanatory variable on the horizontal axis and the response variable on the vertical axis.
Example: The relationship between club-head speed (mph) and the distance (yards) a golf ball travels is shown below:
| Club-head Speed (mph) | Distance (yards) |
|---|---|
| 100 | 257 |
| 102 | 264 |
| 103 | 274 |
| 104 | 272 |
| 105 | 275 |
| 99 | 258 |
As club-head speed increases, the distance the ball travels also increases, indicating a positive association.
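Scatter diagrams are normally drawn with software. As a dependency-free sketch, the hypothetical `ascii_scatter` helper below places each (speed, distance) pair from the table on a character grid, with the explanatory variable on the horizontal axis and the response variable on the vertical axis; the grid size and the `*` marker are arbitrary choices.

```python
def ascii_scatter(xs, ys, width=20, height=8):
    """Return a text scatter diagram: x on the horizontal axis, y on the vertical
    axis, with the top row holding the largest y values."""
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    if x_max == x_min or y_max == y_min:
        raise ValueError("each variable needs at least two distinct values")
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        # Scale each value to a column index in [0, width-1] and a row index in [0, height-1].
        col = round((x - x_min) / (x_max - x_min) * (width - 1))
        row = round((y - y_min) / (y_max - y_min) * (height - 1))
        grid[height - 1 - row][col] = "*"  # flip rows so larger y appears higher up
    lines = ["|" + "".join(r) for r in grid]  # "|" is the vertical axis
    lines.append("+" + "-" * width)           # horizontal axis
    return "\n".join(lines)

speed = [100, 102, 103, 104, 105, 99]      # explanatory variable (mph)
distance = [257, 264, 274, 272, 275, 258]  # response variable (yards)
print(ascii_scatter(speed, distance))
```

The stars climb from the lower left to the upper right, which is the upward slant of a positive association.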
Scatter diagrams help distinguish between linear, nonlinear, and no relation between variables.
Linear relations can be positive (upward slant) or negative (downward slant).
Types of Association
Positively Associated: Above-average values of one variable are associated with above-average values of the other. As one increases, so does the other.
Negatively Associated: Above-average values of one variable are associated with below-average values of the other. As one increases, the other decreases.
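The "above-average with above-average" wording can be checked directly. The hypothetical `association_direction` function below is a rough heuristic, not the correlation coefficient: it counts pairs whose deviations from the means have the same sign (both above or both below average) versus opposite signs.

```python
from statistics import mean

def association_direction(xs, ys):
    """Count pairs on the same side of both means versus opposite sides."""
    x_bar, y_bar = mean(xs), mean(ys)
    same = opposite = 0
    for x, y in zip(xs, ys):
        product = (x - x_bar) * (y - y_bar)
        if product > 0:
            same += 1       # above average with above average, or below with below
        elif product < 0:
            opposite += 1   # above average paired with below average
    if same > opposite:
        return "positive", same, opposite
    if opposite > same:
        return "negative", same, opposite
    return "unclear", same, opposite

speed = [100, 102, 103, 104, 105, 99]
distance = [257, 264, 274, 272, 275, 258]
print(association_direction(speed, distance))
```

For the golf data every pair falls on the same side of both means, so the function reports a positive association with no opposite-side pairs.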
4.1.2 Properties of the Linear Correlation Coefficient
The linear correlation coefficient (Pearson product-moment correlation coefficient) measures the strength and direction of the linear relationship between two quantitative variables.
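Population correlation coefficient: $\rho$ (rho)
Sample correlation coefficient: $r$
Formula for the sample correlation coefficient:
$$r = \frac{\sum \left(\frac{x_i - \bar{x}}{s_x}\right)\left(\frac{y_i - \bar{y}}{s_y}\right)}{n - 1}$$
where $\bar{x}$ and $\bar{y}$ are the sample means, $s_x$ and $s_y$ are the sample standard deviations, and $n$ is the sample size.
Alternative computational formula:
$$r = \frac{\sum x_i y_i - \frac{\sum x_i \sum y_i}{n}}{\sqrt{\sum x_i^2 - \frac{\left(\sum x_i\right)^2}{n}}\,\sqrt{\sum y_i^2 - \frac{\left(\sum y_i\right)^2}{n}}}$$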
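To see that the two formulas agree, the sketch below implements each one (the function names `r_definition` and `r_computational` are illustrative, not from the source) and applies both to the golf data.

```python
from math import sqrt
from statistics import mean, stdev

def r_definition(xs, ys):
    """Definition: sum of products of z-scores, divided by n - 1."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations (divide by n - 1)
    return sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys)) / (n - 1)

def r_computational(xs, ys):
    """Computational formula: needs only the sums of x, y, x^2, y^2 and xy."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = sum_xy - sum_x * sum_y / n
    denominator = sqrt(sum_x2 - sum_x ** 2 / n) * sqrt(sum_y2 - sum_y ** 2 / n)
    return numerator / denominator

speed = [100, 102, 103, 104, 105, 99]
distance = [257, 264, 274, 272, 275, 258]
print(round(r_definition(speed, distance), 4), round(r_computational(speed, distance), 4))
```

Both print roughly 0.94 for the golf data. The computational form is convenient by hand, but with very large values the subtractions can lose floating-point precision, so software usually works with deviations from the mean instead.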
Properties of the Linear Correlation Coefficient
The value of $r$ is always between -1 and 1, inclusive: $-1 \le r \le 1$.
If $r = +1$, there is a perfect positive linear relation.
If $r = -1$, there is a perfect negative linear relation.
The closer $r$ is to +1, the stronger the positive association.
The closer $r$ is to -1, the stronger the negative association.
If $r$ is close to 0, there is little or no evidence of a linear relation (but possibly a nonlinear relation).
$r$ is unitless; the units of $x$ and $y$ do not affect its value.
$r$ is not resistant; outliers can significantly affect its value.
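Several of these properties can be checked numerically. The sketch below uses a small `pearson_r` helper (an illustrative name) and the golf data; the metric conversion factors are exact, while the outlier point (99 mph, 300 yards) is a made-up value added only to show non-resistance.

```python
from math import isclose, sqrt

def pearson_r(xs, ys):
    """Sample correlation coefficient; the n - 1 factors cancel, so plain
    sums of squared deviations are enough."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

speed = [100, 102, 103, 104, 105, 99]      # mph
distance = [257, 264, 274, 272, 275, 258]  # yards

# Unitless: converting mph -> km/h and yards -> meters leaves r unchanged.
r_original = pearson_r(speed, distance)
r_metric = pearson_r([s * 1.609344 for s in speed], [d * 0.9144 for d in distance])
print(isclose(r_original, r_metric))

# Perfect linear relations give r = +1 and r = -1.
xs = [1, 2, 3, 4, 5]
print(pearson_r(xs, [2 * x + 1 for x in xs]), pearson_r(xs, [-3 * x + 4 for x in xs]))

# r = 0 does not rule out a relation: y = x**2 is perfectly (but nonlinearly) related to x.
sym = [-3, -2, -1, 0, 1, 2, 3]
print(pearson_r(sym, [x * x for x in sym]))

# Not resistant: one hypothetical outlier pulls r from about 0.94 to about 0.
print(pearson_r(speed + [99], distance + [300]))
```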
Visual Examples: Scatter diagrams can show perfect, strong, moderate, or weak positive/negative linear relations, or no linear relation.
4.1.3 Compute and Interpret the Linear Correlation Coefficient
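To compute $r$ by hand:
Calculate the sample means $\bar{x}$, $\bar{y}$ and the sample standard deviations $s_x$, $s_y$.
Standardize each value: $z_{x_i} = \frac{x_i - \bar{x}}{s_x}$ and $z_{y_i} = \frac{y_i - \bar{y}}{s_y}$.
Multiply the standardized values for each pair and sum the products.
Divide the sum by $n - 1$ to obtain $r$.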
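The four hand-computation steps map onto a few lines of code. This sketch (with illustrative function names) prints the z-score table you would fill in by hand, then divides the sum of the products by n - 1; it is applied to the data set in the example that follows.

```python
from statistics import mean, stdev

def standardized_products(xs, ys):
    """Steps 1-3: return (x, y, z_x, z_y, z_x * z_y) for each pair."""
    x_bar, y_bar = mean(xs), mean(ys)  # step 1: sample means
    s_x, s_y = stdev(xs), stdev(ys)    # step 1: sample standard deviations
    rows = []
    for x, y in zip(xs, ys):
        z_x = (x - x_bar) / s_x        # step 2: standardize x
        z_y = (y - y_bar) / s_y        # step 2: standardize y
        rows.append((x, y, z_x, z_y, z_x * z_y))  # step 3: product for this pair
    return rows

def r_by_hand(xs, ys):
    rows = standardized_products(xs, ys)
    return sum(row[4] for row in rows) / (len(xs) - 1)  # step 4: divide by n - 1

x = [1, 3, 6, 7]
y = [18, 13, 9, 4]
for row in standardized_products(x, y):
    print("x=%d y=%d z_x=%.3f z_y=%.3f product=%.3f" % row)
print(round(r_by_hand(x, y), 3))  # prints -0.977
```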
Example: For the data set:
| x | y |
|---|---|
| 1 | 18 |
| 3 | 13 |
| 6 | 9 |
| 7 | 4 |
For these four points, $r \approx -0.977$, indicating a strong negative association.
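The same value follows from the alternative computational formula. Here $S_{xy}$, $S_{xx}$ and $S_{yy}$ are shorthand introduced only for this worked example: the numerator and the two radicands of that formula.

```latex
\begin{aligned}
n &= 4, \quad \sum x_i = 17, \quad \sum y_i = 44 \\
\sum x_i^2 &= 95, \quad \sum y_i^2 = 590, \quad \sum x_i y_i = 139 \\
S_{xy} &= 139 - \frac{(17)(44)}{4} = 139 - 187 = -48 \\
S_{xx} &= 95 - \frac{17^2}{4} = 95 - 72.25 = 22.75 \\
S_{yy} &= 590 - \frac{44^2}{4} = 590 - 484 = 106 \\
r &= \frac{S_{xy}}{\sqrt{S_{xx}}\,\sqrt{S_{yy}}} = \frac{-48}{\sqrt{2411.5}} \approx \frac{-48}{49.107} \approx -0.977
\end{aligned}
```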
Technology (e.g., Excel, calculators) can also compute $r$ efficiently.
4.1.4 Determine Whether a Linear Relation Exists Between Two Variables
To test for a linear relation:
Determine the absolute value of $r$, written $|r|$.
Find the critical value for the given sample size (from a table).
If $|r|$ is greater than the critical value, a linear relation exists; otherwise, the data do not provide evidence of a linear relation.
Interpretation: If $r$ is positive and greater than the critical value, the association is positive. If $r$ is negative and less than the negative of the critical value, the association is negative.
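The decision rule can be sketched as a small function. The critical values below are an assumption on our part: they are the commonly tabulated values for a two-tailed test at the 0.05 level (sample sizes 3 to 10), so check them against the table used in your course. `linear_relation` and `CRITICAL_R_05` are illustrative names.

```python
# Assumed critical values of r (two-tailed, 0.05 level), keyed by sample size n.
# Verify against your textbook's table before relying on them.
CRITICAL_R_05 = {3: 0.997, 4: 0.950, 5: 0.878, 6: 0.811, 7: 0.754, 8: 0.707, 9: 0.666, 10: 0.632}

def linear_relation(r, n, critical_values=CRITICAL_R_05):
    """Return 'positive', 'negative', or 'none' using the |r| > critical value rule."""
    critical = critical_values[n]  # raises KeyError if n is not in the table
    if abs(r) > critical:
        return "positive" if r > 0 else "negative"
    return "none"

print(linear_relation(-0.977, 4))  # example data set with n = 4
print(linear_relation(0.5, 8))     # |r| = 0.5 does not exceed 0.707
```

The table is passed in as a parameter so that a different significance level or a longer table can be substituted without changing the rule.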
4.1.5 Explain the Difference Between Correlation and Causation
Correlation does not imply causation. Observational data cannot establish a causal relationship.
A lurking variable is related to both the explanatory and response variables and may confound the observed association.
Example: As air-conditioning bills increase, so does the crime rate. The lurking variable is air temperature, which affects both variables.
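A small simulation shows how a lurking variable can create a strong correlation by itself. All numbers below are hypothetical and made up for illustration: both series are built only from temperature plus fixed noise, so neither one influences the other, yet they are strongly correlated.

```python
from math import sqrt

def pearson_r(xs, ys):
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical data: temperature (the lurking variable) drives both series.
temperature = [60, 65, 70, 75, 80, 85, 90, 95]
ac_noise = [5, -3, 2, -4, 3, -2, 4, -5]
crime_noise = [1, -1, -2, 2, 0, 1, -1, 0]
ac_bill = [3 * t + e for t, e in zip(temperature, ac_noise)]          # depends only on temperature
crime_rate = [0.5 * t + e for t, e in zip(temperature, crime_noise)]  # depends only on temperature

print(round(pearson_r(ac_bill, crime_rate), 3))  # strong positive r, with no causal link built in
```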
Bone Mineral Density Study: Researchers found a negative correlation between cola consumption and bone mineral density in women, but could not conclude causation due to possible lurking variables (e.g., age, BMI, smoking, calcium intake).
| Number of Colas per Week | Bone Mineral Density (g/cm²) |
|---|---|
| 0 | 0.892 |
| 1 | 0.882 |
| 2 | 0.881 |
| 3 | 0.884 |
| 4 | 0.876 |
| 5 | 0.875 |
| 6 | 0.867 |
| 7 | 0.862 |
Researchers must be careful to state association, not causation, when lurking variables may be present.
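The strength of the association in the cola table can be checked with the same tools. This sketch computes $r$ for the eight rows and compares $|r|$ with 0.707, which we assume is the tabulated critical value for $n = 8$ at the 0.05 level. A significant $r$ supports a linear association only; it says nothing about whether cola causes lower bone density.

```python
from math import sqrt

def pearson_r(xs, ys):
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

colas_per_week = [0, 1, 2, 3, 4, 5, 6, 7]
bone_density = [0.892, 0.882, 0.881, 0.884, 0.876, 0.875, 0.867, 0.862]  # g/cm^2

r = pearson_r(colas_per_week, bone_density)
print(round(r, 3))  # prints -0.946
critical = 0.707    # assumed critical value for n = 8 (two-tailed, 0.05 level)
print(abs(r) > critical)  # True: a negative linear association, not proof of causation
```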