Scatter Diagrams and Correlation: Understanding Relationships Between Two Variables

Chapter 4.1: Scatter Diagrams and Correlation

Scatter Diagrams: Visualizing Relationships Between Two Variables

Scatter diagrams are essential tools in statistics for visualizing the relationship between two quantitative variables measured on the same individual. Each point in the diagram represents an individual, with the explanatory variable (predictor) plotted on the horizontal axis and the response variable plotted on the vertical axis.

  • Explanatory Variable: The variable that explains or predicts changes in another variable (often denoted as x).

  • Response Variable: The variable whose value is explained or predicted (often denoted as y).

  • Scatter Diagram: A graph showing the relationship between two quantitative variables.

Example: In a study of club-head speed (mph) and golf ball distance (yards), club-head speed is the explanatory variable, and distance is the response variable.

Scatter diagrams help distinguish among a linear relationship, a nonlinear relationship, and no relationship between two variables.

[Figure: Types of scatter diagrams: linear, nonlinear, and no relation]

Types of Relationships in Scatter Diagrams

Scatter diagrams can reveal different types of associations:

  • Linear Positive Association: As one variable increases, the other also increases. Example: Club-head speed and distance.

  • Linear Negative Association: As one variable increases, the other decreases. Example: Smoking rate and lung capacity.

  • Nonlinear Association: The relationship is not a straight line; it may curve or follow another pattern.

  • No Association: No discernible pattern between the variables.

Properties of the Linear Correlation Coefficient

The linear correlation coefficient (Pearson product moment correlation coefficient) measures the strength and direction of the linear relationship between two quantitative variables. The population correlation coefficient is denoted by $\rho$, and the sample correlation coefficient by $r$.

  • Range: $-1 \le r \le 1$.

  • Perfect Positive Linear Relation: $r = +1$.

  • Perfect Negative Linear Relation: $r = -1$.

  • Strength: The closer $r$ is to $+1$ or $-1$, the stronger the linear association.

  • No Linear Relation: $r$ close to $0$ indicates little or no linear relation (but possibly a nonlinear relation).

  • Unitless: The correlation coefficient does not depend on the units of measurement.

  • Not Resistant: Outliers can significantly affect the value of $r$.
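Three of these properties can be checked numerically. The sketch below is only an illustration: the helper `pearson_r` and all data values are hypothetical and do not come from the notes. It rescales one variable (unitless), appends a single outlying point (not resistant), and evaluates $r$ on an exact quadratic relation (near-zero $r$ despite a clear nonlinear pattern).

```python
import math


def pearson_r(xs, ys):
    """Return the sample linear correlation coefficient r for paired data."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data, chosen only for illustration.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r_original = pearson_r(x, y)

# Unitless: rescaling x (for example, miles to kilometers) leaves r unchanged.
x_km = [1.609344 * v for v in x]
r_rescaled = pearson_r(x_km, y)

# Not resistant: one outlying point (20, 0) changes r drastically.
r_with_outlier = pearson_r(x + [20], y + [0])

# r near 0 only rules out a linear relation: here y = x**2 exactly.
x_sym = [-2, -1, 0, 1, 2]
y_sq = [v ** 2 for v in x_sym]
r_curve = pearson_r(x_sym, y_sq)

print(round(r_original, 3), round(r_rescaled, 3))
print(round(r_with_outlier, 3))
print(r_curve)
```

With this data the single added point is enough to turn a positive $r$ into a negative one, which is why a scatter diagram should be inspected before $r$ is interpreted.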

Computing the Linear Correlation Coefficient

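The formula for the sample linear correlation coefficient is:

$$r = \frac{\sum \left(\dfrac{x_i - \bar{x}}{s_x}\right)\left(\dfrac{y_i - \bar{y}}{s_y}\right)}{n - 1}$$

where $\bar{x}$ and $s_x$ are the sample mean and sample standard deviation of the explanatory variable, $\bar{y}$ and $s_y$ are the sample mean and sample standard deviation of the response variable, and $n$ is the number of individuals in the sample.

  • Interpretation: The sign of $r$ indicates the direction of the relationship; the magnitude indicates the strength.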

  • Calculation: Can be performed by hand or using statistical software such as JMP.

Example: Given the following data set:

| x | 2 | 6 | 6 | 7 | 9 |
|---|---|---|---|---|---|
| y | 8 | 7 | 6 | 9 | 5 |

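Calculate $r$ using the formula or JMP. For this data, $\sum (x_i - \bar{x})(y_i - \bar{y}) = -8$, $\sum (x_i - \bar{x})^2 = 26$, and $\sum (y_i - \bar{y})^2 = 10$, so $r = -8/\sqrt{26 \cdot 10} \approx -0.496$, indicating a negative but only weak-to-moderate linear association.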
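The hand calculation can be checked with a short script. This is a minimal sketch that applies the standardized-score form of the formula (sum the products of the z-scores, then divide by $n - 1$) to the five data pairs of the example. The function names `sample_std` and `correlation_from_z_scores` are illustrative choices, not part of the notes or of JMP.

```python
import math


def sample_std(values):
    """Sample standard deviation (divides by n - 1)."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))


def correlation_from_z_scores(xs, ys):
    """Sample correlation r: sum of z-score products divided by n - 1."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_x = sample_std(xs)
    s_y = sample_std(ys)
    total = sum(
        ((x - x_bar) / s_x) * ((y - y_bar) / s_y)
        for x, y in zip(xs, ys)
    )
    return total / (n - 1)


# Data pairs from the example above.
x = [2, 6, 6, 7, 9]
y = [8, 7, 6, 9, 5]

r = correlation_from_z_scores(x, y)
print(round(r, 3))  # -0.496
```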

Determining Whether a Linear Relation Exists Between Two Variables

To test for a linear relation, follow these steps:

  1. Determine the absolute value of the correlation coefficient ($|r|$).

  2. Find the critical value for the given sample size from a reference table.

  3. If $|r|$ is greater than the critical value, a linear relation exists; otherwise, no linear relation exists.

Critical Values Table: Used to compare the calculated $|r|$ with the threshold for statistical significance.

[Table: Critical values for the correlation coefficient]

Example: For a sample size of 6, the critical value is 0.811. If $|r| > 0.811$, a linear relation exists. If $|r| \le 0.811$, no linear relation exists.

Application: In the club-head speed and distance example, use JMP to find $r$, then compare $|r|$ to the critical value for the sample size to determine whether a linear relationship exists.
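The three-step test can be sketched as a table lookup followed by a comparison. Only the entry for a sample size of 6 (0.811) is quoted in these notes. The other entries below are an assumption: they are the commonly tabulated values for small samples and should be checked against the course's own critical values table. The function name `linear_relation_exists` and the sample values of $r$ are hypothetical.

```python
# Critical values for |r| by sample size n.
# Only n = 6 -> 0.811 is quoted in the notes; the remaining entries are
# assumed from a commonly used table and should be verified before use.
CRITICAL_VALUES = {
    3: 0.997,
    4: 0.950,
    5: 0.878,
    6: 0.811,
    7: 0.754,
    8: 0.707,
    9: 0.666,
    10: 0.632,
}


def linear_relation_exists(r, n, table=CRITICAL_VALUES):
    """Return True if |r| exceeds the critical value for sample size n."""
    if n not in table:
        raise ValueError(f"no critical value tabulated for n = {n}")
    return abs(r) > table[n]


# Hypothetical correlation coefficients for samples of size 6.
print(linear_relation_exists(-0.90, 6))  # True
print(linear_relation_exists(0.50, 6))   # False
```

The comparison uses $|r|$, so a strong negative correlation passes the test just as a strong positive one does.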

Summary Table: Types of Relationships and Their Interpretation

| Type of Relationship | Correlation Coefficient ($r$) | Interpretation |
|---|---|---|
| Perfect Positive Linear | $+1$ | All points lie on a line sloping upward |
| Perfect Negative Linear | $-1$ | All points lie on a line sloping downward |
| Strong Positive Linear | Close to $+1$ | Points cluster around a line sloping upward |
| Strong Negative Linear | Close to $-1$ | Points cluster around a line sloping downward |
| No Linear Relation | Close to $0$ | No clear pattern; points scattered |

Additional info: The notes also reference the use of JMP software for graphical displays and calculation of correlation coefficients, which is a common practice in statistics courses for data analysis.
