Chapter 3: Association – Exploring Relationships Between Two Variables

Association Between Two Variables

Introduction to Association

Understanding the association between variables is fundamental in statistics. An association exists when certain values of one variable are more likely to occur with specific values of another variable. This chapter explores how to identify, describe, and analyze associations between both categorical and quantitative variables.

  • Response Variable (Dependent Variable): The outcome variable on which comparisons are made.

  • Explanatory Variable (Independent Variable): The variable that explains or influences changes in the response variable.

  • Association: Exists if particular values for one variable are more likely to occur with certain values of the other variable.

Association Between Two Categorical Variables

Contingency Tables

A contingency table is used to display the relationship between two categorical variables. The rows represent categories of one variable, and the columns represent categories of the other. The entries are frequencies (counts).

  • Row and column totals provide marginal frequencies for each variable.

  • Cell counts show the joint frequency for each combination of categories.

Example: Meal Plans in College

| Have a Meal Plan | Recommend a Meal Plan: Yes | Recommend a Meal Plan: No | Total |
|---|---|---|---|
| Yes | 58 | 99 | 157 |
| No | 51 | 2 | 53 |
| Total | 109 | 101 | 210 |

The percentage of students who would recommend a meal plan is the marginal proportion 109/210 ≈ 0.519, or about 51.9%.

Conditional Proportions

Conditional proportions help determine if an association exists by comparing the proportion of one variable within levels of another.

| Have a Meal Plan | Recommend a Meal Plan: Yes | Recommend a Meal Plan: No | Total | n |
|---|---|---|---|---|
| Yes | 0.37 | 0.63 | 1 | 157 |
| No | 0.96 | 0.04 | 1 | 53 |

  • Only 37% of those with a meal plan recommend it, while 96% of those without a meal plan recommend it.

  • This large difference between the conditional proportions (0.37 vs. 0.96) indicates an association between the variables.
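The conditional proportions above can be reproduced from the counts. The following is a minimal sketch, assuming the rows of the table are "Have a Meal Plan" and the columns are "Recommend a Meal Plan" (as the 37% and 96% figures imply); the function name `conditional_proportions` is a hypothetical helper, not part of the original notes.

```python
# Counts from the meal plan table.
# Assumption: outer key = has a meal plan, inner key = recommends a meal plan.
counts = {
    "Yes": {"Yes": 58, "No": 99},  # students who have a meal plan
    "No": {"Yes": 51, "No": 2},    # students who do not have a meal plan
}


def conditional_proportions(table):
    """Divide each cell by its row total, so every row sums to 1."""
    result = {}
    for row, cells in table.items():
        row_total = sum(cells.values())
        result[row] = {col: count / row_total for col, count in cells.items()}
    return result


props = conditional_proportions(counts)
for row, cells in props.items():
    rounded = {col: round(p, 2) for col, p in cells.items()}
    print(f"Have a meal plan = {row}: {rounded}")
```

Comparing `props["Yes"]["Yes"]` with `props["No"]["Yes"]` is the comparison the bullets describe: if the two values were close, the table would show little evidence of an association.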

Visualizing Associations: Side-by-Side Bar Plots

  • Side-by-side bar plots display conditional proportions for each category of the explanatory variable.

  • If there is no association, the bars for each group will be similar in height.

Class Exercise: Gender Gap in Party Identification

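| Gender | Party Identification: Democrat | Party Identification: Independent | Party Identification: Republican | Total |
|---|---|---|---|---|
| Male | 299 | 365 | 232 | 896 |
| Female | 422 | 381 | 273 | 1,076 |
| Total | 721 | 746 | 505 | 1,972 |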

  • Identify response and explanatory variables.

  • Calculate joint, marginal, and conditional proportions to analyze the association.
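One way to approach the exercise is sketched below. It assumes gender is treated as the explanatory variable and party identification as the response, which is one reasonable reading of the exercise rather than a stated answer; the variable names are illustrative.

```python
# Counts from the party identification table (rows = gender).
party_counts = {
    "Male": {"Democrat": 299, "Independent": 365, "Republican": 232},
    "Female": {"Democrat": 422, "Independent": 381, "Republican": 273},
}

grand_total = sum(sum(row.values()) for row in party_counts.values())

# Joint proportions: each cell divided by the grand total.
joint = {
    gender: {party: count / grand_total for party, count in row.items()}
    for gender, row in party_counts.items()
}

# Marginal proportions for gender: each row total divided by the grand total.
marginal_gender = {
    gender: sum(row.values()) / grand_total
    for gender, row in party_counts.items()
}

# Conditional proportions of party given gender: each cell divided by its row total.
conditional = {
    gender: {party: count / sum(row.values()) for party, count in row.items()}
    for gender, row in party_counts.items()
}

for gender, row in conditional.items():
    print(gender, {party: round(p, 3) for party, p in row.items()})
```

Comparing the two rows of `conditional` party by party shows where the genders differ most, which is the basis for describing any association.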

Association Between Two Quantitative Variables

Scatterplots

A scatterplot is a graphical display of the relationship between two quantitative variables. The explanatory variable is plotted on the horizontal axis (x), and the response variable on the vertical axis (y).

  • Trend: Linear, curved, clusters, or no pattern.

  • Direction: Positive, negative, or none.

  • Strength: How closely the points fit the trend.

  • Outliers should be noted as they can affect analysis.

Example: There is a strong negative linear association between car weight and miles per gallon (mpg).

Correlation Coefficient (r)

The correlation coefficient measures the strength and direction of the linear association between two quantitative variables.

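Formula: $r = \frac{1}{n-1} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right)$, where $\bar{x}$, $\bar{y}$ are the sample means and $s_x$, $s_y$ the sample standard deviations.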

  • Range: -1 to +1

  • r > 0: Positive association; r < 0: Negative association

  • r close to ±1: Strong linear association; r close to 0: Weak linear association

  • Unitless: changing the units of either variable does not change r

  • Not resistant to outliers

  • Only measures linear relationships

Example: For mpg and weight, r = -0.87 indicates a strong negative linear association.
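As an illustration of how r is computed, here is a minimal sketch that averages the products of z-scores using the standard sample formula with an n - 1 divisor. The data values are made up for illustration and are not the mpg and weight data behind the example above; `correlation` is a hypothetical helper name.

```python
from statistics import mean, stdev


def correlation(xs, ys):
    """Sample correlation: sum of z-score products divided by n - 1."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations (n - 1 divisor)
    z_products = (
        ((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys)
    )
    return sum(z_products) / (n - 1)


# Hypothetical data: car weight (thousands of pounds) and fuel economy (mpg).
weights = [2.0, 2.5, 3.0, 3.5, 4.0]
mpg = [34, 30, 25, 22, 18]

r = correlation(weights, mpg)
print(round(r, 3))

# Changing units (thousands of pounds to pounds) leaves r unchanged.
r_pounds = correlation([w * 1000 for w in weights], mpg)
print(round(r_pounds, 3))
```

The second call illustrates the "unitless" property: rescaling x changes both the deviations and the standard deviation by the same factor, so the z-scores are identical.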

Regression Line

The regression line predicts the value of the response variable y as a linear function of the explanatory variable x.

Equation: $\hat{y} = a + bx$, where $\hat{y}$ is the predicted value of y

  • a: y-intercept

  • b: Slope

Formulas for a and b: $b = r \, \frac{s_y}{s_x}$ and $a = \bar{y} - b\bar{x}$

Example: For mpg and weight,

  • The slope indicates the change in predicted y for a one-unit increase in x.

  • The y-intercept is the predicted value when x = 0 (may not always be meaningful).
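The slope and intercept can be computed from summary statistics using the standard least-squares formulas b = r(s_y / s_x) and a = ȳ - b·x̄. The sketch below assumes those formulas and uses made-up data (not the actual mpg and weight data); `least_squares_line` is a hypothetical helper name.

```python
from statistics import mean, stdev


def least_squares_line(xs, ys):
    """Return (a, b) for the line y_hat = a + b * x."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    # Correlation via the sample covariance divided by both standard deviations.
    r = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / ((n - 1) * s_x * s_y)
    b = r * s_y / s_x      # slope
    a = y_bar - b * x_bar  # y-intercept
    return a, b


# Hypothetical data: car weight (thousands of pounds) and fuel economy (mpg).
weights = [2.0, 2.5, 3.0, 3.5, 4.0]
mpg = [34, 30, 25, 22, 18]

a, b = least_squares_line(weights, mpg)
print(f"predicted mpg = {a:.1f} + ({b:.1f}) * weight")

# Prediction inside the observed range of x (2.0 to 4.0).
predicted_at_3 = a + b * 3.0
print(round(predicted_at_3, 1))
```

For this illustrative data the slope is negative, so each additional thousand pounds lowers the predicted mpg by |b|. The intercept is the prediction at a weight of 0, which lies far outside the observed range and has no practical meaning here.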

Coefficient of Determination (r²)

The squared correlation (r²) measures the proportion of variability in the response variable explained by the linear relationship with the explanatory variable.

Example: If r = -0.87, then r² = 0.7569. This means 75.69% of the variation in mpg is explained by car weight.
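To see why r² can be read as "proportion of variation explained", the sketch below checks numerically that r² equals 1 - SSE/SST for a least-squares line, where SSE is the sum of squared residuals and SST is the total sum of squares of y about its mean. The data are made up for illustration, and the names `sse` and `explained` are assumptions of this sketch.

```python
from statistics import mean

# Arithmetic from the example: squaring r = -0.87.
print(round((-0.87) ** 2, 4))  # 0.7569

# Hypothetical data: car weight (thousands of pounds) and fuel economy (mpg).
xs = [2.0, 2.5, 3.0, 3.5, 4.0]
ys = [34, 30, 25, 22, 18]

x_bar, y_bar = mean(xs), mean(ys)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
s_xx = sum((x - x_bar) ** 2 for x in xs)
s_yy = sum((y - y_bar) ** 2 for y in ys)  # total sum of squares (SST)

r = s_xy / (s_xx * s_yy) ** 0.5
b = s_xy / s_xx        # least-squares slope
a = y_bar - b * x_bar  # least-squares intercept

# Sum of squared residuals around the fitted line (SSE).
sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
explained = 1 - sse / s_yy

print(round(r ** 2, 4), round(explained, 4))
```

The two printed values agree, which is the sense in which r² summarizes how much smaller the prediction errors are when using the regression line instead of the mean of y.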

Important Points in Analyzing Associations

  • Extrapolation: Predicting y for x values outside the observed range is risky and may not be valid.

  • Influential Outliers: Points with extreme x values that do not follow the trend can greatly affect the regression line.

  • Regression Outlier: An observation far from the trend of the data.
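The effect of an influential outlier can be illustrated by fitting the line with and without a single point that has an extreme x value and falls off the trend. This is a sketch with made-up data; the added point (8.0, 30) is hypothetical and chosen only to make the effect visible.

```python
from statistics import mean


def slope(xs, ys):
    """Least-squares slope: sum of cross-deviations over sum of squared x-deviations."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    return s_xy / s_xx


# Hypothetical data following a clear negative trend.
xs = [2.0, 2.5, 3.0, 3.5, 4.0]
ys = [34, 30, 25, 22, 18]

# One extra observation with an extreme x value that does not follow the trend.
xs_with_outlier = xs + [8.0]
ys_with_outlier = ys + [30]

slope_without = slope(xs, ys)
slope_with = slope(xs_with_outlier, ys_with_outlier)

print(round(slope_without, 2), round(slope_with, 2))
```

A single point is enough to pull the slope from steeply negative to nearly flat, which is why observations with extreme x values deserve a separate look before the line is used for prediction.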

Correlation vs. Causation

  • A strong correlation does not imply that one variable causes changes in the other.

  • Correlation only indicates association, not causality.

Lurking Variables

A lurking variable is an unmeasured variable that influences the relationship between the explanatory and response variables.

  • Example: Age can be a lurking variable affecting both height and math score in children.

  • Lurking variables can create spurious associations or mask real ones.

Simpson’s Paradox and Confounding

  • Simpson’s Paradox: The direction of an association between two variables reverses when a third variable is considered.

  • Confounding: Occurs when two explanatory variables are both associated with the response variable and with each other, making it difficult to separate their effects.
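Simpson's Paradox is easiest to see with numbers. The sketch below uses entirely made-up counts for two hypothetical treatments, A and B, applied to "easy" and "hard" cases; none of these figures come from the original notes.

```python
# Hypothetical (successes, trials) for each treatment within each case type.
data = {
    "easy": {"A": (9, 10), "B": (80, 100)},
    "hard": {"A": (30, 100), "B": (2, 10)},
}


def rate(successes, trials):
    return successes / trials


# Success rates within each level of the third variable (case type).
within = {
    case: {t: rate(*counts) for t, counts in treatments.items()}
    for case, treatments in data.items()
}

# Success rates after ignoring case type (pooling the two groups).
overall = {}
for t in ("A", "B"):
    successes = sum(data[case][t][0] for case in data)
    trials = sum(data[case][t][1] for case in data)
    overall[t] = rate(successes, trials)

for case, rates in within.items():
    print(case, {t: round(p, 2) for t, p in rates.items()})
print("overall", {t: round(p, 2) for t, p in overall.items()})
```

Treatment A has the higher success rate within both case types, yet B has the higher rate overall. The reversal arises because A was used mostly on hard cases and B mostly on easy ones, so case type is associated with both the treatment and the outcome.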

Summary Table: Key Concepts in Association Analysis

| Concept | Description | Example |
|---|---|---|
| Contingency Table | Displays frequencies for two categorical variables | Meal plan vs. recommendation |
| Scatterplot | Graphical display for two quantitative variables | Weight vs. mpg |
| Correlation (r) | Measures strength and direction of linear association | r = -0.87 for weight and mpg |
| Regression Line | Predicts y from x using $\hat{y} = a + bx$ | |
| r² | Proportion of variance explained | 0.7569 (75.69%) |
| Lurking Variable | Unmeasured variable affecting association | Age in height/math score |
| Simpson’s Paradox | Association reverses with third variable | Party ID by gender and age |
