Correlation and Scatterplots in Statistics: Study Notes
Explanatory and Response Variables
Definitions and Roles
In statistical studies, understanding the roles of variables is essential for analyzing relationships. Two key types of variables are commonly discussed:
Explanatory Variable: In an experimental study, the explanatory variable (also called the predictor variable) is manipulated by the researcher to observe its effect on the response variable.
Response Variable: The response variable is the outcome whose value is explained by the explanatory or predictor variable.
Scatter Plot (Scatter Diagram)
Purpose and Construction
A scatter plot is a graphical tool used to display the relationship between two quantitative variables measured on the same individuals.
Each individual in the data set is represented by a point in the scatter diagram.
The explanatory variable is plotted on the horizontal axis (x-axis).
The response variable is plotted on the vertical axis (y-axis).
Each point on the scatter plot represents two pieces of data (e.g., diameter and height).
Example: A scatter plot showing the relationship between plant diameter and plant height.
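The construction rules above can be sketched in Python. This is only an illustration: the plant measurements are made-up numbers, the function names `scatter_points` and `save_scatter_plot` are hypothetical, and the plotting helper assumes the third-party matplotlib library is installed.

```python
def scatter_points(explanatory, response):
    """Pair each individual's two measurements into one (x, y) point."""
    if len(explanatory) != len(response):
        raise ValueError("both variables must be measured on the same individuals")
    return list(zip(explanatory, response))


def save_scatter_plot(points, x_label, y_label, path):
    """Draw the points with the explanatory variable on the x-axis.

    Assumes matplotlib is available; it is imported here so the pairing
    function above works without it.
    """
    import matplotlib
    matplotlib.use("Agg")  # render to a file, no display needed
    import matplotlib.pyplot as plt

    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    fig, ax = plt.subplots()
    ax.scatter(xs, ys)
    ax.set_xlabel(x_label)  # explanatory variable, horizontal axis
    ax.set_ylabel(y_label)  # response variable, vertical axis
    fig.savefig(path)
    plt.close(fig)


# Hypothetical plant data (illustrative values only).
diameters_cm = [1.2, 1.8, 2.5, 3.1, 3.9]
heights_cm = [10.0, 14.5, 19.0, 24.0, 29.5]
points = scatter_points(diameters_cm, heights_cm)
```

Calling `save_scatter_plot(points, "Diameter (cm)", "Height (cm)", "plants.png")` would then write the diagram to a file.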
Relationships Shown in Scatter Plots
Types of Relationships
Scatter plots can reveal different types of relationships between variables:
Linear Relationship: Points on the scatterplot follow a somewhat straight line pattern.
Positive Association: Points trend upward to the right; as one variable increases, so does the other.
Negative Association: Points trend downward to the right; as one variable increases, the other decreases.
No Association: Points do not show any discernible pattern.
Linear Correlation Coefficient
Definition and Calculation
The linear correlation coefficient quantifies the strength and direction of the linear relationship between two quantitative variables.
The Greek letter ρ (rho) represents the population correlation coefficient.
The letter r represents the sample correlation coefficient.
Also called the Pearson product moment correlation coefficient.
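Formula for the Sample Linear Correlation Coefficient:
$$r = \frac{\sum \left(\frac{x_i - \bar{x}}{s_x}\right)\left(\frac{y_i - \bar{y}}{s_y}\right)}{n - 1}$$
Where:
$x_i$ = ith observation of the explanatory variable
$y_i$ = ith observation of the response variable
$\bar{x}$ = sample mean of explanatory variable
$\bar{y}$ = sample mean of response variable
$s_x$ = sample standard deviation of explanatory variable
$s_y$ = sample standard deviation of response variable
$n$ = number of individuals in the sample
Alternate formula:
$$r = \frac{\sum x_i y_i - \frac{\sum x_i \sum y_i}{n}}{\sqrt{\sum x_i^2 - \frac{\left(\sum x_i\right)^2}{n}}\,\sqrt{\sum y_i^2 - \frac{\left(\sum y_i\right)^2}{n}}}$$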
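As a check on the arithmetic, the sample correlation coefficient can be computed two ways: from standardized products (each deviation from the mean divided by the sample standard deviation, summed and divided by n - 1) and from the computational shortcut based on raw sums. The sketch below is a minimal illustration; the function names and the five data pairs are assumptions made up for this example.

```python
import math


def pearson_r_standardized(xs, ys):
    """r as the sum of products of z-scores, divided by n - 1."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sample standard deviations use n - 1 in the denominator.
    s_x = math.sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))
    products = (((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys))
    return sum(products) / (n - 1)


def pearson_r_computational(xs, ys):
    """r from raw sums only (no means or standard deviations needed)."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = sum_xy - sum_x * sum_y / n
    denominator = math.sqrt(sum_x2 - sum_x ** 2 / n) * math.sqrt(sum_y2 - sum_y ** 2 / n)
    return numerator / denominator


# Illustrative data only (not from the notes).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r_a = pearson_r_standardized(xs, ys)
r_b = pearson_r_computational(xs, ys)
print(round(r_a, 4), round(r_b, 4))
```

Both functions should agree up to floating-point rounding; for this small data set r works out to 6 / sqrt(60), about 0.7746.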
Properties of the Linear Correlation Coefficient
Always between -1 and 1, inclusive ($-1 \le r \le 1$).
Unitless measure; independent of the units of x and y.
Positive values of r indicate positive relationships.
Negative values of r indicate negative relationships.
r close to 0 indicates little or no linear relationship (a nonlinear relationship may still be present).
Not resistant: Outliers can greatly affect the value of r.
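Two of these properties are easy to see numerically. The sketch below uses made-up data and a hypothetical helper `pearson_r`: rescaling x (as in a change of units) leaves r unchanged, while adding one extreme point changes r sharply, here even flipping its sign.

```python
import math


def pearson_r(xs, ys):
    """Sample correlation via sums of squared deviations (equivalent to the z-score form)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)


# Illustrative data only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r_original = pearson_r(xs, ys)

# Unitless: multiplying every x by a positive constant (a unit change) does not alter r.
r_rescaled = pearson_r([2.54 * x for x in xs], ys)

# Not resistant: a single outlier at (20, 0) drags r from positive to negative.
r_with_outlier = pearson_r(xs + [20], ys + [0])

print(r_original, r_rescaled, r_with_outlier)
```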
Testing for a Linear Relation
Steps for Testing
Determine the absolute value of the correlation coefficient ($|r|$).
Find the critical value in Table II for the given sample size (n = number of observations).
If $|r|$ is greater than the critical value, a linear relation exists; otherwise, no linear relation exists.
Critical Values for Correlation Coefficient (Table II)
| n | Critical Value |
|---|---|
| 3 | 0.997 |
| 4 | 0.950 |
| 5 | 0.878 |
| 6 | 0.811 |
| 7 | 0.754 |
| 8 | 0.707 |
| 9 | 0.666 |
| 10 | 0.632 |
| 11 | 0.602 |
| 12 | 0.576 |
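The testing steps amount to a table lookup followed by one comparison. A minimal sketch, assuming only the sample sizes listed in the table above are supported; the names `CRITICAL_VALUES` and `has_linear_relation` are hypothetical:

```python
# Critical values copied from the table above (n = 3 through 12 only).
CRITICAL_VALUES = {
    3: 0.997, 4: 0.950, 5: 0.878, 6: 0.811, 7: 0.754,
    8: 0.707, 9: 0.666, 10: 0.632, 11: 0.602, 12: 0.576,
}


def has_linear_relation(r, n):
    """Return True when |r| exceeds the critical value for sample size n."""
    if n not in CRITICAL_VALUES:
        raise ValueError(f"no critical value listed for n = {n}")
    return abs(r) > CRITICAL_VALUES[n]


print(has_linear_relation(0.90, 5))   # 0.90 > 0.878
print(has_linear_relation(-0.90, 5))  # the sign is ignored; only |r| matters
print(has_linear_relation(0.50, 10))  # 0.50 is not greater than 0.632
```

Raising an error for unlisted sample sizes, rather than interpolating, keeps the sketch faithful to the table as given.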
Lurking Variable
Definition and Impact
A lurking variable is a variable not included in a statistical analysis that may affect the relationship between the variables under study. For example, warmer weather may increase both ice cream sales and shark attacks, so the two rise together even though neither causes the other.
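The ice cream and shark attack example can be mimicked with a few numbers. Every value below is invented purely for illustration (no real data), and `pearson_r` is a hypothetical helper. Both series are driven by temperature, so they correlate strongly with each other even though neither influences the other.

```python
import math


def pearson_r(xs, ys):
    """Sample correlation coefficient via sums of squared deviations."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)


# Invented monthly figures: temperature is the lurking variable.
temperature_c = [18, 21, 24, 27, 30, 33]
ice_cream_sales = [120, 150, 175, 210, 230, 260]
shark_attacks = [1, 1, 2, 3, 3, 4]

r_sales_attacks = pearson_r(ice_cream_sales, shark_attacks)
r_temp_sales = pearson_r(temperature_c, ice_cream_sales)
r_temp_attacks = pearson_r(temperature_c, shark_attacks)

print(r_sales_attacks, r_temp_sales, r_temp_attacks)
```

A large r between sales and attacks here says nothing about causation; it only reflects that both track temperature.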
Recap: Steps in Investigating Relationships
Begin with a scatterplot to visually assess the relationship.
Quantitatively describe the strength and direction using the correlation coefficient "r".
If a linear relationship exists, proceed to model building.
Examples
Bone Length Research Example
NASA research measured the lengths of the right humerus and right tibia in 11 astronauts. The 8 pairs of measurements recorded in these notes are as follows:
| Right Humerus (mm) | Right Tibia (mm) |
|---|---|
| 24.80 | 36.05 |
| 25.09 | 35.57 |
| 24.29 | 34.55 |
| 24.97 | 34.78 |
| 25.80 | 37.36 |
| 25.31 | 37.45 |
| 26.63 | 37.75 |
| 26.84 | 38.50 |
Draw a scatter diagram treating the humerus length as the explanatory variable and tibia length as the response variable.
Compute the linear correlation coefficient between the two lengths.
Compare the computed correlation coefficient with the critical value to determine if a linear relationship exists.
If the data are converted to inches, the correlation coefficient remains unchanged (unitless property).
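The computational parts of this exercise can be sketched as follows. One assumption to flag: only the eight pairs listed in the table are used, so the comparison uses the critical value for n = 8 (0.707) rather than the one for n = 11. The helper name `pearson_r` is hypothetical.

```python
import math

# The eight (humerus, tibia) pairs listed in the table, in millimetres.
humerus_mm = [24.80, 25.09, 24.29, 24.97, 25.80, 25.31, 26.63, 26.84]
tibia_mm = [36.05, 35.57, 34.55, 34.78, 37.36, 37.45, 37.75, 38.50]


def pearson_r(xs, ys):
    """Sample correlation coefficient via sums of squared deviations."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)


# Correlation with humerus as explanatory and tibia as response.
r_mm = pearson_r(humerus_mm, tibia_mm)

# Compare |r| with the critical value for n = 8 (assumes only the listed rows).
CRITICAL_VALUE_N8 = 0.707
linear_relation = abs(r_mm) > CRITICAL_VALUE_N8

# Convert to inches and recompute to illustrate the unitless property.
MM_PER_INCH = 25.4
r_in = pearson_r([x / MM_PER_INCH for x in humerus_mm],
                 [y / MM_PER_INCH for y in tibia_mm])

print(round(r_mm, 3), linear_relation, abs(r_mm - r_in) < 1e-9)
```

For these eight pairs r comes out at roughly 0.89, which exceeds 0.707, and the value in inches matches the value in millimetres up to rounding.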
Additional info: Correlation does not establish causation; a lurking variable can produce a strong association between two variables that have no direct effect on each other.