Scatterplots, Association, and Correlation: Study Notes

Chapter 6: Scatterplots, Association, and Correlation

Section 6.1: Scatterplots

Scatterplots are essential graphical tools in statistics for visualizing the relationship between two quantitative variables. They help in detecting patterns, trends, relationships, and extraordinary values (outliers) in data.

  • Definition: A scatterplot is a graph in which each pair of values (x, y) is plotted as a point in a two-dimensional space, with one variable on the x-axis and the other on the y-axis.

  • Purpose: Used to identify the type and strength of association between variables, spot trends, and detect outliers.

  • Example: A scatterplot of hurricane location prediction error vs. year can show how prediction errors have changed over time.

Direction of Association

The direction of the association describes how the variables change together.

  • Negative Direction: As one variable increases, the other decreases.

  • Positive Direction: As one variable increases, the other also increases.

  • Example: In a scatterplot, a downward trend from left to right indicates a negative association, while an upward trend indicates a positive association.
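Direction can also be checked numerically. The sketch below is one possible approach rather than the textbook's method: it takes the sign of the sample covariance as the direction. The function names and the data values are made up for illustration.

```python
def covariance(xs, ys):
    """Sample covariance of paired values (divides by n - 1)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


def direction(xs, ys):
    """Label the direction of association by the sign of the covariance."""
    cov = covariance(xs, ys)
    if cov > 0:
        return "positive"
    if cov < 0:
        return "negative"
    return "none"


# Hypothetical data: prediction error shrinking as the years pass.
years = [1, 2, 3, 4, 5]
errors = [300, 260, 250, 200, 180]

print(direction(years, errors))  # negative
```

The sign only says which way the points tilt overall. It says nothing about form or strength, so it does not replace looking at the plot.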

Form of Association

The form describes the general shape of the relationship between variables.

  • Linear: Points cluster near a straight line.

  • Curved: The relationship bends in one direction; it can sometimes be straightened with a transformation.

  • Nonlinear/Complex: The relationship curves up and down, making it difficult to straighten.

  • Example: A linear form is suitable for correlation analysis, while a curved form may require data transformation.
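To make the idea of straightening concrete, here is a minimal sketch that assumes the curve is exponential and that a logarithm is the right transformation. Other curves may need a different transformation or may not straighten at all. The data and the helper `pearson_r` are illustrative assumptions.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation computed from sums of squared deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical curved data: y doubles each time x increases by 1.
xs = [1, 2, 3, 4, 5, 6]
ys = [2 ** x for x in xs]

# Re-express y on a log scale; log(2**x) = x * log(2) is linear in x.
log_ys = [math.log(y) for y in ys]

r_raw = pearson_r(xs, ys)      # curved form: r falls short of 1
r_log = pearson_r(xs, log_ys)  # straightened form: r is essentially 1

print(f"raw r = {r_raw:.3f}, log-transformed r = {r_log:.3f}")
```

After the transformation the points fall on a straight line, so correlation becomes an appropriate summary.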

Strength of the Relationship

The strength of the relationship refers to how closely the points follow a specific form (usually a line).

  • Strong Linear Relationship: Points are tightly clustered around a line.

  • Moderate Linear Relationship: Points are somewhat scattered but still show a general linear trend.

  • Weak Linear Relationship: Points are widely scattered with little apparent trend.

  • Example: In hurricane prediction error data, moderate scatter around a straight line indicates a moderate linear relationship.
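Strength can be compared numerically with the correlation coefficient r. The sketch below computes r as the sum of products of z-scores divided by n - 1, then compares a tightly clustered data set with a more scattered one. Both data sets are invented for illustration, and no cutoff between "strong" and "moderate" is implied.

```python
from statistics import mean, stdev


def correlation(xs, ys):
    """Correlation as the sum of z-score products divided by n - 1."""
    n = len(xs)
    mean_x, mean_y = mean(xs), mean(ys)
    sd_x, sd_y = stdev(xs), stdev(ys)
    return sum(
        ((x - mean_x) / sd_x) * ((y - mean_y) / sd_y)
        for x, y in zip(xs, ys)
    ) / (n - 1)


xs = [1, 2, 3, 4, 5, 6]
tight_ys = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9]  # hugs an upward-sloping line
loose_ys = [4, 1, 9, 5, 12, 8]               # same general trend, more scatter

r_tight = correlation(xs, tight_ys)
r_loose = correlation(xs, loose_ys)

print(f"tight: r = {r_tight:.3f}")
print(f"loose: r = {r_loose:.3f}")
```

The closer |r| is to 1, the more tightly the points cluster around a line. This comparison is only meaningful when the form is linear.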

Outliers

An outlier is a data point that stands away from the overall pattern of the scatterplot.

  • Definition: A point that does not fit the direction, form, or strength shown by the rest of the data.

  • Importance: Outliers are almost always interesting and deserve special attention, as they may indicate errors, unusual cases, or important phenomena.

  • Example: In hurricane prediction data, a year with an unusually high prediction error would be an outlier.
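One way to screen for such points, sketched under stated assumptions: fit a least-squares line and flag any point whose residual exceeds 2 residual standard deviations. The cutoff of 2, the function name, and the data are all hypothetical choices for illustration. The notes themselves treat outlier detection as a visual judgment.

```python
import math


def flag_outliers(xs, ys, cutoff=2.0):
    """Return the (x, y) points whose residual from the least-squares line
    exceeds `cutoff` residual standard deviations (an assumed rule)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    # Residual standard deviation uses n - 2 degrees of freedom.
    resid_sd = math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))

    return [
        (x, y)
        for x, y, e in zip(xs, ys, residuals)
        if abs(e) > cutoff * resid_sd
    ]


# Hypothetical prediction errors falling steadily, except in year 5.
years = [1, 2, 3, 4, 5, 6, 7, 8]
errors = [400, 380, 360, 340, 520, 300, 280, 260]

flagged = flag_outliers(years, errors)
print(flagged)  # [(5, 520)]
```

A flagged point is a prompt to investigate, not an automatic reason to delete it. Note that the outlier itself pulls the fitted line toward it, so a rule like this can miss points that the eye would catch.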

Summary Table: Types of Association in Scatterplots

| Type            | Description                                                              | Example                                 |
|-----------------|--------------------------------------------------------------------------|-----------------------------------------|
| Positive Linear | As x increases, y increases; points cluster near an upward-sloping line  | Height vs. Weight                       |
| Negative Linear | As x increases, y decreases; points cluster near a downward-sloping line | Hours worked vs. Vacation days          |
| Nonlinear       | Points follow a curved pattern                                           | Age vs. Income (may peak at middle age) |
| No Association  | No discernible pattern                                                   | Shoe size vs. IQ                        |

Key Takeaways

  • Always create a scatterplot to visually assess the relationship between two quantitative variables.

  • Examine the direction, form, strength, and presence of outliers.

  • Use correlation only to summarize the strength of a linear relationship.

  • Outliers and nonlinearity can distort the interpretation of correlation.
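The last two takeaways can be illustrated with a short sketch. The data sets are invented: one follows a perfect curve, and the other is a patternless cluster with and without a single extreme point. The helper `pearson_r` is an assumed name.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation computed from sums of squared deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Nonlinearity: y is determined exactly by x, yet r is 0
# because the pattern is a symmetric curve, not a line.
curve_xs = [1, 2, 3, 4, 5]
curve_ys = [(x - 3) ** 2 for x in curve_xs]
r_curve = pearson_r(curve_xs, curve_ys)

# Outlier: a cluster with almost no association...
cluster_xs = [1, 2, 3, 4, 5]
cluster_ys = [3, 1, 4, 2, 3]
r_without = pearson_r(cluster_xs, cluster_ys)

# ...looks strongly linear once a single far-away point is added.
r_with = pearson_r(cluster_xs + [20], cluster_ys + [20])

print(f"perfect curve:   r = {r_curve:.3f}")
print(f"without outlier: r = {r_without:.3f}")
print(f"with outlier:    r = {r_with:.3f}")
```

In both cases the number alone is misleading, which is why the scatterplot should be checked before correlation is reported.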

Additional info: These notes are based on textbook slides for an introductory college statistics course, focusing on the graphical and conceptual analysis of bivariate data using scatterplots.
