
Association Between Variables: Categorical and Quantitative Analysis

Study Guide - Smart Notes

Tailored notes based on your materials, expanded with key definitions, examples, and context.

Chapter 3: Association

Association Between Two Categorical Variables

Understanding the relationship between two categorical variables is fundamental in statistics. An association exists if certain values of one variable are more likely to occur with particular values of the other variable.

  • Response Variable (Dependent Variable): The outcome variable on which comparisons are made.

  • Explanatory Variable (Independent Variable): The variable that explains changes in the response variable.

  • Association: Exists when particular values for one variable are more likely to occur with certain values of the other variable.

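Example: Is there an association between high school GPA and college GPA? College GPA is the response variable, and high school GPA is the explanatory variable. (These two variables are quantitative, but the response and explanatory roles apply in the same way to categorical variables.)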

Contingency Tables

Contingency tables are used to display the frequencies of two categorical variables.

  • Rows represent categories of one variable.

  • Columns represent categories of the other variable.

  • Entries are frequencies (counts).
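A contingency table is built by counting how many observations fall into each combination of categories. The sketch below assumes the raw data arrive as a list of (row category, column category) pairs; the function name `contingency_table` and the five responses are hypothetical, chosen only to illustrate the counting.

```python
from collections import Counter

def contingency_table(pairs):
    """Cross-tabulate (row category, column category) pairs into a nested dict of counts."""
    pairs = list(pairs)  # allow any iterable; we pass over it several times
    counts = Counter(pairs)
    rows = sorted({r for r, _ in pairs})
    cols = sorted({c for _, c in pairs})
    # table[r][c] = number of observations in row category r and column category c
    return {r: {c: counts[(r, c)] for c in cols} for r in rows}

# Hypothetical raw responses: (recommends a meal plan?, has a meal plan?)
responses = [("Yes", "Yes"), ("No", "Yes"), ("Yes", "No"), ("No", "Yes"), ("Yes", "Yes")]
table = contingency_table(responses)
print(table)  # {'No': {'No': 0, 'Yes': 2}, 'Yes': {'No': 1, 'Yes': 2}}
```

Combinations that never occur still appear with a count of 0, so every cell of the table is filled.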

Example Table: Meal Plans in College

| Recommend a Meal Plan | Have a Meal Plan: Yes | Have a Meal Plan: No | Total |
|---|---|---|---|
| Yes | 58 | 51 | 109 |
| No | 99 | 2 | 101 |
| Total | 157 | 53 | 210 |

The percentage of students who would recommend a meal plan is $109/210 \approx 0.519$, or about 51.9%.
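The marginal totals and the overall percentage can be checked directly from the cell counts. This is a minimal sketch; the nested-dict layout (rows = recommend, columns = have a meal plan) is an assumed representation of the table above.

```python
# Cell counts from the meal-plan table:
# outer key = recommends a meal plan, inner key = has a meal plan
counts = {
    "Yes": {"Yes": 58, "No": 51},
    "No": {"Yes": 99, "No": 2},
}

row_totals = {r: sum(cells.values()) for r, cells in counts.items()}
col_totals = {c: sum(counts[r][c] for r in counts) for c in ("Yes", "No")}
grand_total = sum(row_totals.values())

# Marginal proportion: students who recommend a meal plan, out of all students
p_recommend = row_totals["Yes"] / grand_total
print(row_totals, col_totals, grand_total)  # {'Yes': 109, 'No': 101} {'Yes': 157, 'No': 53} 210
print(f"{p_recommend:.1%}")  # 51.9%
```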

Conditional Proportions

Conditional proportions help determine if an association exists by comparing proportions within levels of the explanatory variable.

| Have a Meal Plan | Recommend: Yes | Recommend: No | Total | n |
|---|---|---|---|---|
| Yes | 0.37 | 0.63 | 1 | 157 |
| No | 0.96 | 0.04 | 1 | 53 |

  • Only 37% of those with a meal plan recommend it, while 96% of those without a meal plan recommend it.

  • Large differences in the conditional proportions across levels of the explanatory variable indicate an association.
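The conditional proportions above come from dividing each count by the total for its level of the explanatory variable (having a meal plan). A small sketch, where the labels "Has plan" and "Do not recommend" are hypothetical names for the table's categories:

```python
# Counts grouped by the explanatory variable (meal plan status),
# then by the response (whether the student recommends a meal plan)
by_plan = {
    "Has plan": {"Recommend": 58, "Do not recommend": 99},
    "No plan": {"Recommend": 51, "Do not recommend": 2},
}

def conditional_proportions(table):
    """Divide each row by its own total so that each row sums to 1."""
    result = {}
    for level, cells in table.items():
        n = sum(cells.values())
        result[level] = {k: v / n for k, v in cells.items()}
    return result

props = conditional_proportions(by_plan)
for level, p in props.items():
    print(level, {k: round(v, 2) for k, v in p.items()})
# Has plan {'Recommend': 0.37, 'Do not recommend': 0.63}
# No plan {'Recommend': 0.96, 'Do not recommend': 0.04}
```

Dividing by the row total (157 or 53) rather than the grand total (210) is what makes the two groups comparable despite their different sizes.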

Side-By-Side Bar Plots

Bar plots visually compare conditional proportions across categories. If there is no association, proportions for the response variable are similar across levels of the explanatory variable.

Class Exercise: Gender Gap in Party Identification

Party Identification by Gender

| Gender | Democrat | Independent | Republican | Total |
|---|---|---|---|---|
| Male | 299 | 365 | 232 | 896 |
| Female | 422 | 381 | 273 | 1,076 |
| Total | 721 | 746 | 505 | 1,972 |

  • Calculate proportions for specific combinations (e.g., male and Republican).

  • Find conditional proportions for party identification given gender.

  • Visualize differences using side-by-side bar plots.
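The first two exercise steps can be worked with a few lines of arithmetic. This sketch uses the counts from the table; the joint proportion divides by the grand total, while the conditional proportions divide by each gender's row total.

```python
# Party identification counts by gender, from the class-exercise table
party = {
    "Male": {"Democrat": 299, "Independent": 365, "Republican": 232},
    "Female": {"Democrat": 422, "Independent": 381, "Republican": 273},
}

total = sum(sum(row.values()) for row in party.values())

# Joint proportion: share of ALL respondents who are male AND Republican
p_male_rep = party["Male"]["Republican"] / total

# Conditional proportions of party given gender (each row sums to 1)
cond = {
    g: {p: n / sum(row.values()) for p, n in row.items()}
    for g, row in party.items()
}

print(f"P(male and Republican) = {p_male_rep:.3f}")  # 0.118
for g, props in cond.items():
    print(g, {p: round(v, 3) for p, v in props.items()})
# Male {'Democrat': 0.334, 'Independent': 0.407, 'Republican': 0.259}
# Female {'Democrat': 0.392, 'Independent': 0.354, 'Republican': 0.254}
```

In these proportions, females are more likely than males to identify as Democrat, males are more likely to identify as Independent, and the Republican proportions are nearly equal; a side-by-side bar plot of the two rows shows the same pattern.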

Association Between Two Quantitative Variables

Scatterplots are used to display the association between two quantitative variables. The explanatory variable is plotted on the horizontal axis, and the response variable on the vertical axis.

  • Trend: Linear, curved, clusters, or no pattern.

  • Direction: Positive, negative, or no direction.

  • Strength: How closely points fit the trend.

  • Outliers: Points that deviate from the overall trend.

Example: There is a strong negative linear association between weight and miles per gallon (mpg) of a car.

Scatterplot Creation (TI-Calculator)

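  • Store the explanatory variable values in list L1 and the response variable values in list L2 (STAT > EDIT).

  • Open STAT PLOT (2nd, Y=), turn a plot on, choose the scatterplot type, and set Xlist to L1 and Ylist to L2.

  • Press ZOOM 9 (ZoomStat) to fit the viewing window to the data.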

The Correlation Coefficient, r

The correlation coefficient measures the strength and direction of the linear association between two quantitative variables.

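  • Formula: $r = \frac{1}{n-1}\sum \left(\frac{x-\bar{x}}{s_x}\right)\left(\frac{y-\bar{y}}{s_y}\right)$, where $s_x$ and $s_y$ are the sample standard deviations of $x$ and $y$.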

  • r ranges from -1 to +1.

  • Positive r: positive association; negative r: negative association.

  • r close to ±1: strong linear association; r close to 0: weak linear association.

  • Correlation is unitless and not resistant to outliers.

  • Correlation only measures linear relationships.
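The correlation formula can be sketched as the sum of products of z-scores divided by $n-1$. The function name `correlation` and the data values below are hypothetical; the example also illustrates two of the properties listed above: r does not change when units change, and a perfect but curved relationship can still give r near 0.

```python
from statistics import mean, stdev

def correlation(x, y):
    """Pearson r: sum of z-score products divided by n - 1.

    Raises ZeroDivisionError if either variable is constant (r is undefined then).
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y need the same length, at least 2")
    mx, my = mean(x), mean(y)
    sx, sy = stdev(x), stdev(y)  # sample standard deviations (n - 1 denominator)
    return sum(((xi - mx) / sx) * ((yi - my) / sy) for xi, yi in zip(x, y)) / (n - 1)

# Hypothetical data
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = correlation(x, y)
print(round(r, 4))  # 0.7746

# Unitless: rescaling x (e.g., pounds -> thousands of pounds) leaves r unchanged
print(round(correlation([v / 1000 for v in x], y), 4))  # 0.7746

# Linear only: a perfect U-shaped relationship still gives r essentially 0
u = [-2, -1, 0, 1, 2]
print(round(correlation(u, [v * v for v in u]), 4))
```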

Example: For mpg and weight, $r = -0.87$, which indicates a strong negative linear correlation.

Regression Line

The regression line predicts the value of the response variable as a linear function of the explanatory variable.

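  • Equation: $\hat{y} = a + bx$, where $\hat{y}$ is the predicted value of the response variable.

  • y-intercept (a): $a = \bar{y} - b\bar{x}$

  • Slope (b): $b = r\left(\frac{s_y}{s_x}\right)$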

Example: For the regression of mpg on weight:

  • Slope interpretation: On average, mpg decreases by 5.344 for each 1000 lb increase in weight.

  • y-intercept may not be meaningful if x = 0 is outside the observed range.
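The slope and intercept formulas translate directly into code. This is a sketch on hypothetical data (not the mpg/weight data, whose values are not given here); `regression_line` is an illustrative name, and it reuses the z-score form of r.

```python
from statistics import mean, stdev

def correlation(x, y):
    """Pearson r as the sum of z-score products divided by n - 1."""
    mx, my, sx, sy = mean(x), mean(y), stdev(x), stdev(y)
    return sum(((xi - mx) / sx) * ((yi - my) / sy) for xi, yi in zip(x, y)) / (len(x) - 1)

def regression_line(x, y):
    """Least-squares line y-hat = a + b*x, with b = r * s_y / s_x and a = y-bar - b * x-bar."""
    b = correlation(x, y) * stdev(y) / stdev(x)
    a = mean(y) - b * mean(x)
    return a, b

# Hypothetical data
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
a, b = regression_line(x, y)
print(f"y-hat = {a:.2f} + {b:.2f}x")  # y-hat = 2.20 + 0.60x

# Prediction for an x value inside the observed range (1 to 5)
print(round(a + b * 4, 2))  # 4.6
```

Here the slope reads as "on average, y increases by 0.60 for each 1-unit increase in x", the same way the mpg slope is interpreted above.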

Squared Correlation, $r^2$

The squared correlation measures the proportion of variability in the response variable explained by the linear relationship.

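  • Formula: $r^2 = \frac{\sum (y-\bar{y})^2 - \sum (y-\hat{y})^2}{\sum (y-\bar{y})^2}$, the proportional reduction in squared prediction error when $\hat{y}$ is used instead of $\bar{y}$.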

  • Interpretation: For mpg and weight, $r^2 = 0.7569$, meaning 75.69% of the variation in mpg is explained by its linear relationship with weight.
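As a check on the formula, $r^2$ can be computed as the reduction in squared error from predicting with the regression line instead of $\bar{y}$. The data and the name `r_squared` are hypothetical; for these data r is about 0.7746, and its square gives the same 0.6.

```python
import math
from statistics import mean

def r_squared(x, y):
    """(SST - SSE) / SST for the least-squares line y-hat = a + b*x."""
    mx, my = mean(x), mean(y)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
    a = my - b * mx
    sst = sum((yi - my) ** 2 for yi in y)                        # total variability about y-bar
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))  # variability left around the line
    return (sst - sse) / sst

# Hypothetical data
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(round(r_squared(x, y), 4))  # 0.6

# Going the other way for mpg and weight: r^2 = 0.7569 means |r| = 0.87;
# r itself is negative because the association is negative.
print(round(math.sqrt(0.7569), 2))  # 0.87
```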

Important Points in Analyzing Associations

  • Extrapolation: Predicting y for x values outside the observed range is risky; the relationship may not hold.

  • Influential Outliers: An outlier with an extreme x value that also falls far from the trend of the other points can substantially change the regression line and the correlation.

  • Regression Outlier: An observation far from the trend.

Example: An influential outlier can distort the regression line, while a non-influential outlier may not.
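The difference between an influential and a non-influential outlier shows up clearly when one point is added to a perfectly linear data set. All values here are hypothetical, and `slope_intercept` is an illustrative helper using the least-squares slope $\sum(x-\bar{x})(y-\bar{y}) / \sum(x-\bar{x})^2$.

```python
from statistics import mean

def slope_intercept(x, y):
    """Least-squares (a, b) for y-hat = a + b*x, using b = Sxy / Sxx."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    return my - b * mx, b

# Hypothetical data lying exactly on y = 2x
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]

fits = {
    "original": (x, y),
    # extreme x value and far below the trend: influential
    "influential": (x + [20], y + [5]),
    # x near the middle but far above the trend: regression outlier, not influential
    "non-influential": (x + [3], y + [20]),
}
for label, (xs, ys) in fits.items():
    a, b = slope_intercept(xs, ys)
    print(f"{label}: slope = {b:.3f}, intercept = {a:.3f}")
```

The single point at x = 20 drops the slope from 2 to about 0.023, while the point at x = 3 only raises the intercept and leaves the slope at 2.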

Correlation vs Causation

  • Strong correlation does not imply causation.

  • Correlation indicates association, not a cause-effect relationship.

Lurking Variables

A lurking variable is not included in the analysis but can influence the relationship between variables.

  • Example: Age is a lurking variable affecting both height and math score in children.

  • Lurking variables may be common causes for both explanatory and response variables.

Simpson’s Paradox and Confounding

  • Simpson’s Paradox: The direction of an association between two variables reverses when the data are analyzed separately at each level of a third variable.

  • Confounding: Two explanatory variables are both associated with the response variable and with each other, making it difficult to distinguish their effects.
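Simpson's Paradox can be reproduced with a small table. The counts below are entirely hypothetical: treatment A has the higher success rate within each severity level, but it is given mostly to severe cases, so B looks better when the levels are combined. Severity acts as a lurking variable that is associated with both the treatment received and the outcome.

```python
# Hypothetical counts: (successes, patients) by treatment and case severity
data = {
    "A": {"mild": (18, 20), "severe": (40, 80)},
    "B": {"mild": (64, 80), "severe": (8, 20)},
}

def success_rates(groups):
    """Return (success rate within each level, overall success rate) for one treatment."""
    within = {level: s / n for level, (s, n) in groups.items()}
    overall = sum(s for s, _ in groups.values()) / sum(n for _, n in groups.values())
    return within, overall

rates = {t: success_rates(g) for t, g in data.items()}
for t, (within, overall) in rates.items():
    print(t, within, f"overall: {overall:.2f}")
# A {'mild': 0.9, 'severe': 0.5} overall: 0.58
# B {'mild': 0.8, 'severe': 0.4} overall: 0.72
```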
