Scatterplots, Association, and Correlation: Study Notes

Introduction

Scatterplots are essential tools in statistics for visually examining the relationship between two quantitative variables. Understanding the direction, form, and strength of these relationships is crucial for interpreting data and making informed decisions. Correlation provides a numerical measure of the strength and direction of a linear relationship.

Scatterplots

Scatterplots are graphical displays that show the relationship between two quantitative variables. Each point represents a pair of values.

  • Purpose: To identify patterns, trends, associations, and outliers.

  • Best Use: To visualize the association between two quantitative variables.

Key Features to Look For

  1. Direction: Whether the association is positive, negative, or absent.

    • Positive Association: As one variable increases, so does the other (points run from lower left to upper right).

    • Negative Association: As one variable increases, the other decreases (points run from upper left to lower right).

  2. Form: The general shape of the relationship.

    • Linear: Points cluster around a straight line.

    • Nonlinear: Points follow a curved or other non-straight pattern.

  3. Strength: How closely the points follow a clear form.

    • Strong: Points are tightly clustered.

    • Weak: Points are widely scattered.

  4. Unusual Features: Outliers or clusters that deviate from the overall pattern.

Roles of Variables

  • Explanatory (Predictor) Variable: Plotted on the x-axis; used to explain or predict changes in the response variable.

  • Response Variable: Plotted on the y-axis; the variable being studied or predicted.

  • Which variable is treated as explanatory and which as response depends on the context of the analysis.

Correlation

Correlation quantifies the strength and direction of a linear relationship between two quantitative variables. It is not affected by changes in the units of measurement and is symmetric with respect to the variables.

Standardized Variables

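To compute correlation, each variable is first standardized into z-scores using its sample mean and sample standard deviation:

$$z_x = \frac{x - \bar{x}}{s_x}, \qquad z_y = \frac{y - \bar{y}}{s_y}$$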

Correlation Coefficient (r)

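The correlation coefficient, denoted as r, is calculated from the standardized values $z_x$ and $z_y$ as:

$$r = \frac{\sum z_x z_y}{n - 1}$$

Alternatively, using raw data:

$$r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \, \sum (y - \bar{y})^2}}$$

Or, more simply:

$$r = \frac{s_{xy}}{s_x s_y}, \quad \text{where} \quad s_{xy} = \frac{\sum (x - \bar{x})(y - \bar{y})}{n - 1}$$

Here $s_x$ and $s_y$ are the sample standard deviations of x and y, and $s_{xy}$ is their sample covariance.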
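The forms of r are algebraically equivalent, which can be checked numerically. The sketch below is illustrative only: the function names and the small data set are made up for this example, and it assumes the z-score form divides the sum of z-score products by n - 1, with sample standard deviations throughout.

```python
import math

def sample_sd(values):
    """Sample standard deviation (divides by n - 1)."""
    n = len(values)
    m = sum(values) / n
    return math.sqrt(sum((v - m) ** 2 for v in values) / (n - 1))

def r_from_z_scores(xs, ys):
    """r as the sum of z-score products divided by n - 1."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sx, sy = sample_sd(xs), sample_sd(ys)
    zx = [(x - mx) / sx for x in xs]
    zy = [(y - my) / sy for y in ys]
    return sum(a * b for a, b in zip(zx, zy)) / (n - 1)

def r_from_raw_sums(xs, ys):
    """r from sums of deviations, with no explicit standardizing step."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Made-up data, used only to compare the two computations.
x_demo = [2, 4, 6, 8]
y_demo = [1, 3, 2, 5]
r_z = r_from_z_scores(x_demo, y_demo)
r_raw = r_from_raw_sums(x_demo, y_demo)
print(r_z, r_raw)
```

The two results should agree up to floating-point rounding, because the (n - 1) factors cancel between the covariance and the two standard deviations.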

Steps to Calculate r

  1. Find the mean of x and y.

  2. Subtract the mean from each value of x and y.

  3. Multiply the deviations for each pair and sum the products.

  4. Divide by (n-1) to get the covariance, $s_{xy}$.

  5. Divide by the product of the standard deviations of x and y.
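The five steps above can be sketched in plain Python. This is a minimal illustration, not a reference implementation: the function name `pearson_r` and the sample values are hypothetical, and the standard deviations are assumed to be sample standard deviations (dividing by n - 1) so that they match step 4.

```python
import math

def pearson_r(xs, ys):
    """Correlation coefficient, following the five steps one at a time."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")

    # Step 1: find the mean of x and y.
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    # Step 2: subtract the mean from each value.
    dev_x = [x - mean_x for x in xs]
    dev_y = [y - mean_y for y in ys]

    # Step 3: multiply the paired deviations and sum the products.
    sum_products = sum(a * b for a, b in zip(dev_x, dev_y))

    # Step 4: divide by (n - 1) to get the covariance.
    s_xy = sum_products / (n - 1)

    # Step 5: divide by the product of the standard deviations.
    s_x = math.sqrt(sum(a * a for a in dev_x) / (n - 1))
    s_y = math.sqrt(sum(b * b for b in dev_y) / (n - 1))
    return s_xy / (s_x * s_y)

# Made-up values, chosen so the arithmetic is easy to follow by hand.
x_values = [1, 2, 3, 4, 5]
y_values = [2, 4, 5, 4, 5]
r = pearson_r(x_values, y_values)
print(round(r, 4))
```

For these values the deviations multiply and sum to 6, the covariance is 6/4 = 1.5, and r works out to 6/sqrt(60), roughly 0.77.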

Interpretation of r

  • r ranges from -1 to +1.

  • r > 0: Positive association; r < 0: Negative association.

  • r = 0: No linear association.

  • The closer |r| is to 1, the stronger the linear relationship.
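A value of r near 0 rules out a linear association only; the variables can still be strongly related in a nonlinear way. The sketch below uses made-up data in which y is exactly x squared, with a hypothetical helper `pearson_r` defined just for this illustration.

```python
import math

def pearson_r(xs, ys):
    """Correlation coefficient from sums of deviations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# y is completely determined by x, but the pattern is a symmetric curve.
x_sym = [-2, -1, 0, 1, 2]
y_curve = [x ** 2 for x in x_sym]

r_curve = pearson_r(x_sym, y_curve)
print(r_curve)
```

The positive and negative deviation products cancel, so r comes out as 0 even though the scatterplot would show a perfect U-shaped pattern. This is why the scatterplot should be inspected before r is interpreted.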

Assumptions and Conditions for Correlation

  • Both variables must be quantitative.

  • The relationship should be linear.

  • Check for outliers, as they can distort the correlation.

How to Check Conditions

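  1. Quantitative Variables Condition: Confirm that both variables are quantitative; correlation is not meaningful for categorical variables.

  2. Linearity Condition: Inspect the scatterplot. The pattern should be roughly straight, because r measures only linear association.

  3. Outlier Condition: Inspect the scatterplot for outliers. Outliers can dramatically affect r, so when one is present it is worth computing r both with and without it.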
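To illustrate the outlier condition, the sketch below compares r with and without a single unusual point. The data and the helper `pearson_r` are hypothetical and chosen only to make the effect obvious.

```python
import math

def pearson_r(xs, ys):
    """Correlation coefficient from sums of deviations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Five points lying exactly on a straight line.
x_clean = [1, 2, 3, 4, 5]
y_clean = [1, 2, 3, 4, 5]

# The same points plus one made-up outlier at (10, 0).
x_out = x_clean + [10]
y_out = y_clean + [0]

r_clean = pearson_r(x_clean, y_clean)
r_with_outlier = pearson_r(x_out, y_out)
print(round(r_clean, 3), round(r_with_outlier, 3))
```

In this constructed case a single point changes r from a perfect positive correlation to a weak negative one, which is why reporting r both with and without a suspected outlier is a reasonable habit.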

Properties of Correlation

  • The sign of r indicates the direction of association.

  • Correlation is always between -1 and +1.

  • Correlation is unitless and symmetric (does not depend on which variable is x or y).

  • Correlation is not resistant to outliers.

  • Correlation only measures linear association.
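Two of these properties, unit invariance and symmetry, are easy to check numerically. The sketch below uses made-up study-hours and exam-score values and a hypothetical helper `pearson_r`; it assumes the unit change is a rescaling by a positive factor, optionally with a shift.

```python
import math

def pearson_r(xs, ys):
    """Correlation coefficient from sums of deviations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Made-up values for illustration only.
hours = [1, 2, 3, 4, 5]
scores = [52, 60, 57, 70, 75]

r_hours = pearson_r(hours, scores)

# Unitless: converting hours to minutes leaves r unchanged.
minutes = [h * 60 for h in hours]
r_minutes = pearson_r(minutes, scores)

# Shifting every score by a constant also leaves r unchanged.
r_shifted = pearson_r(hours, [s + 10 for s in scores])

# Symmetric: swapping the roles of x and y gives the same r.
r_swapped = pearson_r(scores, hours)

print(r_hours, r_minutes, r_shifted, r_swapped)
```

All four values should agree up to floating-point rounding. Multiplying one variable by a negative constant would flip the sign of r but not its magnitude.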

Correlation Does Not Imply Causation

  • A strong correlation does not mean that changes in one variable cause changes in the other.

  • There may be lurking variables or confounding factors.

  • Always consider the context and possible alternative explanations.

Common Mistakes

  • Do not use correlation for categorical variables.

  • Do not assume causation from correlation.

  • Be cautious of outliers and non-linear relationships.

Example

Suppose we have data on students' study hours (x) and exam scores (y). A scatterplot shows a positive linear trend, and the calculated r is 0.85, indicating a strong positive linear association between study hours and exam scores.

Summary Table: Key Features of Scatterplots and Correlation

Feature                        Description
-----------------------------  -----------------------------------------------------
Direction                      Positive, Negative, or None
Form                           Linear or Nonlinear
Strength                       Strong, Moderate, Weak
Outliers                       Points that deviate from the overall pattern
Correlation Coefficient (r)    Measures strength and direction of linear association
