Scatterplots, Association, and Correlation: Study Notes
Scatterplots, Association, and Correlation
Introduction
Scatterplots are essential tools in statistics for visually examining the relationship between two quantitative variables. Understanding the direction, form, and strength of these relationships is crucial for interpreting data and making informed decisions. Correlation provides a numerical measure of the strength and direction of a linear relationship.
Scatterplots
Scatterplots are graphical displays that show the relationship between two quantitative variables. Each point represents a pair of values.
Purpose: To identify patterns, trends, associations, and outliers.
Best Use: To picture the association between two quantitative variables.
Key Features to Look For
Direction: Whether the association is positive, negative, or absent.
Positive Association: As one variable increases, so does the other (points run from lower left to upper right).
Negative Association: As one variable increases, the other decreases (points run from upper left to lower right).
Form: The general shape of the relationship.
Linear: Points cluster around a straight line.
Nonlinear: Points follow a curved or other non-straight pattern.
Strength: How closely the points follow a clear form.
Strong: Points are tightly clustered.
Weak: Points are widely scattered.
Unusual Features: Outliers or clusters that deviate from the overall pattern.
Roles of Variables
Explanatory (Predictor) Variable: Plotted on the x-axis; used to explain or predict changes in the response variable.
Response Variable: Plotted on the y-axis; the variable being studied or predicted.
Assignment of variables is based on the context of the analysis.
Correlation
Correlation quantifies the strength and direction of a linear relationship between two quantitative variables. It is not affected by changes in the units of measurement and is symmetric with respect to the variables.
Standardized Covariates
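To compute correlation, each variable is standardized by subtracting its mean and dividing by its standard deviation:
$z_x = \frac{x - \bar{x}}{s_x}$, $z_y = \frac{y - \bar{y}}{s_y}$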
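Standardizing means subtracting the mean from each value and dividing by the sample standard deviation. A minimal Python sketch of that step follows; the function name `standardize` and the data are hypothetical, chosen only for illustration.

```python
from math import sqrt

def standardize(values):
    """Return z-scores using the sample standard deviation (n - 1 divisor)."""
    n = len(values)
    mean = sum(values) / n
    sd = sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]

# Hypothetical study hours, for illustration only
hours = [1, 2, 3, 4, 5]
z_hours = standardize(hours)

# Standardized values have mean 0 and sample standard deviation 1
z_mean = sum(z_hours) / len(z_hours)
z_var = sum(z ** 2 for z in z_hours) / (len(z_hours) - 1)
```

Because standardized values carry no units, any quantity built from them, including r, is unitless as well.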
Correlation Coefficient (r)
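The correlation coefficient, denoted as r, is calculated from the standardized values as:
$r = \frac{\sum z_x z_y}{n - 1}$
Alternatively, using raw data:
$r = \frac{\sum (x - \bar{x})(y - \bar{y})}{(n - 1)\, s_x s_y}$
Or, more simply:
$r = \frac{s_{xy}}{s_x s_y}$, where $s_{xy} = \frac{\sum (x - \bar{x})(y - \bar{y})}{n - 1}$ is the sample covariance of x and y.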
Steps to Calculate r
Find the mean of x and y.
Subtract the mean from each value of x and y.
Multiply the deviations for each pair and sum the products.
Divide by (n-1) to get the covariance $s_{xy}$.
Divide by the product of the standard deviations of x and y.
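The five steps above can be sketched in Python as follows. This is an illustrative implementation, not a reference one; the function name `pearson_r` and the study-hours data are hypothetical.

```python
from math import sqrt

def pearson_r(x, y):
    """Correlation coefficient computed step by step from paired lists."""
    n = len(x)
    # Step 1: find the mean of x and y
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Step 2: subtract the mean from each value
    dx = [xi - mean_x for xi in x]
    dy = [yi - mean_y for yi in y]
    # Step 3: multiply the deviations for each pair and sum the products
    sum_products = sum(a * b for a, b in zip(dx, dy))
    # Step 4: divide by (n - 1) to get the covariance
    cov_xy = sum_products / (n - 1)
    # Step 5: divide by the product of the sample standard deviations
    s_x = sqrt(sum(a * a for a in dx) / (n - 1))
    s_y = sqrt(sum(b * b for b in dy) / (n - 1))
    return cov_xy / (s_x * s_y)

# Hypothetical data, for illustration only
hours = [1, 2, 3, 4, 5]
scores = [52, 60, 61, 70, 77]
r = pearson_r(hours, scores)  # close to +1: strong positive linear association
```

The function assumes both lists have the same length, at least two pairs, and some variation in each variable; otherwise a standard deviation is zero and the division fails.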
Interpretation of r
r ranges from -1 to +1.
r > 0: Positive association; r < 0: Negative association.
r = 0: No linear association.
The closer |r| is to 1, the stronger the linear relationship.
Assumptions and Conditions for Correlation
Both variables must be quantitative.
The relationship should be linear.
Check for outliers, as they can distort the correlation.
How to Check Conditions
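Quantitative Variables Condition: Confirm that both variables are quantitative and know their units.
Linearity Condition: Inspect the scatterplot; the form should be roughly straight, not curved.
Outlier Condition: Look for points far from the overall pattern. Outliers can dramatically affect r, so consider reporting r with and without them.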
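To see why the outlier condition matters, the sketch below compares r with and without a single unusual point. The data are hypothetical and deliberately extreme, and `pearson_r` is an illustrative helper, not a library function.

```python
from math import sqrt

def pearson_r(x, y):
    """Sample correlation coefficient; the (n - 1) factors cancel out."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical data lying exactly on a line
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]
r_without = pearson_r(x, y)               # 1 up to rounding: perfect linear pattern

# Add one point far below the pattern
r_with = pearson_r(x + [6], y + [0])      # about 0.14
```

One point out of six is enough to pull r from 1 down to roughly 0.14, which is why the scatterplot should be checked before r is reported.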
Properties of Correlation
The sign of r indicates the direction of association.
Correlation is always between -1 and +1.
Correlation is unitless and symmetric (does not depend on which variable is x or y).
Correlation is not resistant to outliers.
Correlation only measures linear association.
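Several of these properties can be checked numerically. The sketch below uses hypothetical data and an illustrative `pearson_r` helper to show that r is unchanged by a change of units, unchanged when x and y are swapped, and can be zero for a strong but curved relationship.

```python
from math import sqrt

def pearson_r(x, y):
    """Sample correlation coefficient; the (n - 1) factors cancel out."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical data, for illustration only
hours = [1, 2, 3, 4, 5]
scores = [52, 60, 61, 70, 77]

r_hours = pearson_r(hours, scores)

# Unitless: measuring study time in minutes leaves r unchanged
minutes = [60 * h for h in hours]
r_minutes = pearson_r(minutes, scores)

# Symmetric: swapping the roles of x and y leaves r unchanged
r_swapped = pearson_r(scores, hours)

# Linear only: y = x^2 is a perfect curved relationship, yet r is 0
xs = [-2, -1, 0, 1, 2]
ys = [v ** 2 for v in xs]
r_curve = pearson_r(xs, ys)
```

The last case is a reminder that r = 0 rules out a linear association only; a scatterplot would reveal the curve immediately.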
Correlation Does Not Imply Causation
A strong correlation does not mean that changes in one variable cause changes in the other.
There may be lurking variables or confounding factors.
Always consider the context and possible alternative explanations.
Common Mistakes
Do not use correlation for categorical variables.
Do not assume causation from correlation.
Be cautious of outliers and non-linear relationships.
Example
Suppose we have data on students' study hours (x) and exam scores (y). A scatterplot shows a positive linear trend, and the calculated r is 0.85, indicating a strong positive linear association between study hours and exam scores.
Summary Table: Key Features of Scatterplots and Correlation
| Feature | Description |
|---|---|
| Direction | Positive, Negative, or None |
| Form | Linear or Nonlinear |
| Strength | Strong, Moderate, Weak |
| Outliers | Points that deviate from the overall pattern |
| Correlation Coefficient (r) | Measures strength and direction of linear association |