Contingency Tables and Categorical Data Analysis
Study Guide - Smart Notes
Contingency Tables
Definition and Structure
A contingency table is a tabular method for displaying the frequency counts of two categorical variables simultaneously. It allows for the analysis of the relationship between these variables by organizing data into rows and columns, each representing a category of one variable.
Rows: Categories of Variable A
Columns: Categories of Variable B
Cells: Counts for each combination of categories
Totals: Marginal totals for each row and column, and a grand total
Example Table Structure:
| | Category B1 | Category B2 | Total |
|---|---|---|---|
| Category A1 | a | b | a + b |
| Category A2 | c | d | c + d |
| Total | a + c | b + d | n = a + b + c + d |
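To see how the cells and totals fit together, here is a minimal Python sketch that tallies a small table from raw observations. The observations, the variable names, and the day/night and cash/credit labels are made up for illustration.

```python
from collections import Counter

# Hypothetical raw observations: (time of day, payment method), one per customer.
observations = [
    ("day", "cash"), ("day", "credit"), ("night", "credit"),
    ("night", "credit"), ("day", "cash"), ("night", "cash"),
]

# Each cell is the count of one (row category, column category) combination.
cells = Counter(observations)

row_labels = sorted({a for a, _ in observations})
col_labels = sorted({b for _, b in observations})

# Marginal totals: sum across each row and down each column.
row_totals = {a: sum(cells[(a, b)] for b in col_labels) for a in row_labels}
col_totals = {b: sum(cells[(a, b)] for a in row_labels) for b in col_labels}
grand_total = sum(cells.values())

# Print the table in the same layout as the structure above.
print("\t".join([""] + col_labels + ["Total"]))
for a in row_labels:
    counts = [str(cells[(a, b)]) for b in col_labels]
    print("\t".join([a] + counts + [str(row_totals[a])]))
print("\t".join(["Total"] + [str(col_totals[b]) for b in col_labels] + [str(grand_total)]))
```

The row totals and the column totals each add up to the same grand total, which is a quick check that a table was tallied correctly.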
Marginal, Joint, and Conditional Distributions
Marginal Distributions
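A marginal distribution summarizes the totals for one variable, regardless of the other variable. It is found by summing across each row (giving the distribution of Variable A) or down each column (giving the distribution of Variable B); dividing each total by the grand total converts the counts to percentages.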
Row totals: Distribution of Variable A
Column totals: Distribution of Variable B
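Formula for Marginal Percentage: $\text{Marginal \%} = \dfrac{\text{row or column total}}{\text{grand total}} \times 100\%$
Example: If 43 out of 111 customers shop during the day: $\dfrac{43}{111} \times 100\% \approx 38.7\%$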
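The same calculation can be sketched in Python. Only the day total (43) and the grand total (111) come from the example above; the individual cell counts, the payment-method columns, and the variable names are assumptions chosen for illustration.

```python
# Rows: time of day; columns: payment method.
# The cell counts are hypothetical; only the day total (43) and the
# grand total (111) come from the example above.
table = {
    "day":   {"cash": 29, "credit": 14},
    "night": {"cash": 30, "credit": 38},
}

grand_total = sum(sum(row.values()) for row in table.values())

# Marginal distribution of time of day: row total / grand total.
row_marginal = {
    label: 100 * sum(row.values()) / grand_total
    for label, row in table.items()
}

# Marginal distribution of payment method: column total / grand total.
columns = ["cash", "credit"]
col_marginal = {
    col: 100 * sum(row[col] for row in table.values()) / grand_total
    for col in columns
}

for label, pct in row_marginal.items():
    print(f"{label}: {pct:.1f}%")
```

Each marginal distribution sums to 100%, because every customer falls into exactly one category of each variable.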
Joint Distributions
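A joint distribution refers to the probability or proportion of observations that fall into both a specific category of Variable A and a specific category of Variable B simultaneously.
Formula: $P(A \text{ and } B) = \dfrac{\text{count in the cell for A and B}}{\text{grand total}}$
Example: Probability of being a sophomore and a biology major: $P(\text{sophomore and biology}) = \dfrac{\text{number of sophomore biology majors}}{n}$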
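As a rough illustration, the joint proportion for every cell can be computed by dividing each count by the grand total. The class-year and major counts below are invented for the sketch and do not come from the source.

```python
# Hypothetical counts of students by class year (rows) and major (columns).
table = {
    "freshman":  {"biology": 12, "chemistry": 8},
    "sophomore": {"biology": 15, "chemistry": 5},
}

grand_total = sum(sum(row.values()) for row in table.values())  # 40 students

# Joint proportion for every cell: cell count / grand total.
joint = {
    (year, major): count / grand_total
    for year, row in table.items()
    for major, count in row.items()
}

print(joint[("sophomore", "biology")])  # 15 / 40 = 0.375
```

The joint proportions over all cells sum to 1, since every student sits in exactly one cell.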
Conditional Distributions
A conditional distribution describes the distribution of one variable within a specific group defined by the other variable. It answers questions like, "What percent of group B has characteristic A?"
Key phrase: “Of the ____ group…”
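Formula: $P(A \mid B) = \dfrac{\text{count in the cell for A and B}}{\text{total for group B}}$
Example: “What % of moderates are vegetarian?” → vegetarian moderates ÷ all moderates
Example: “What % of night customers pay with credit?” → night customers paying with credit ÷ all night customers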
Note: Always divide by the total of the group that defines the condition (row or column total).
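A short Python sketch of the note above: the only thing that changes between conditioning on a row and conditioning on a column is which total goes in the denominator. The counts and the helper names `conditional_given_row` and `conditional_given_col` are hypothetical, with the moderate row chosen so that it gives about 16.7%.

```python
# Hypothetical counts: political view (rows) by diet (columns).
table = {
    "liberal":      {"vegetarian": 12, "not vegetarian": 28},
    "moderate":     {"vegetarian": 5,  "not vegetarian": 25},
    "conservative": {"vegetarian": 3,  "not vegetarian": 27},
}

def conditional_given_row(table, row_label):
    """Distribution of the column variable within one row group, in percent."""
    row = table[row_label]
    row_total = sum(row.values())  # denominator: the conditioning row's total
    return {col: 100 * count / row_total for col, count in row.items()}

def conditional_given_col(table, col_label):
    """Distribution of the row variable within one column group, in percent."""
    col_total = sum(row[col_label] for row in table.values())  # column total
    return {label: 100 * row[col_label] / col_total for label, row in table.items()}

# "Of the moderates, what percent are vegetarian?" -> condition on the row.
moderates = conditional_given_row(table, "moderate")
print(f"Of moderates, {moderates['vegetarian']:.1f}% are vegetarian")

# "Of the vegetarians, what percent are liberal?" -> condition on the column.
vegetarians = conditional_given_col(table, "vegetarian")
print(f"Of vegetarians, {vegetarians['liberal']:.1f}% are liberal")
```

Note that the two questions use the same cell counts but give different answers, because the group that defines the condition is different.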
Comparing Distributions
Summary Table
| Type | Definition | Denominator |
|---|---|---|
| Marginal | Totals for one variable | Grand total (n) |
| Joint | Probability of being in both categories | Grand total (n) |
| Conditional | Distribution of one variable within a category of the other | Row or column total (depending on condition) |
Independence in Contingency Tables
Definition and Test
Two categorical variables are independent if knowing the value of one does not change the distribution of the other. To test for independence, compare the conditional distributions:
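If $P(A \mid B_1) \approx P(A \mid B_2)$, that is, the conditional distribution of A is roughly the same in every group of B, then A and B are likely independent.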
If the conditional percentages differ noticeably, the variables are not independent.
Example:
Day customers: ~33% use credit
Night customers: ~56% use credit
Since these percentages differ by roughly 23 percentage points, payment method and time of day do not appear to be independent.
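The comparison in this example can be sketched in Python. The cell counts are hypothetical, picked only so that they reproduce roughly 33% and 56%; the 5-point threshold is an arbitrary assumption for illustration, not a statistical rule.

```python
# Hypothetical counts chosen to match the percentages in the example above.
table = {
    "day":   {"cash": 29, "credit": 14},
    "night": {"cash": 30, "credit": 38},
}

def percent_within_row(table, row_label, col_label):
    """Conditional percentage: share of one row group that falls in a column."""
    row = table[row_label]
    return 100 * row[col_label] / sum(row.values())

day_credit = percent_within_row(table, "day", "credit")
night_credit = percent_within_row(table, "night", "credit")

# Informal rule of thumb (an assumption, not a formal test): treat the
# variables as roughly independent only if the gap is small.
THRESHOLD = 5.0  # percentage points, chosen arbitrarily for this sketch
gap = abs(day_credit - night_credit)
looks_independent = gap < THRESHOLD

print(f"Day: {day_credit:.1f}% credit, night: {night_credit:.1f}% credit")
print(f"Gap: {gap:.1f} points; looks independent: {looks_independent}")
```

An eyeball comparison like this only suggests an association. Deciding whether a gap is larger than chance alone would produce is the job of a formal test.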
Step-by-Step Problem Solving Template
Identify the type of distribution:
“Of all people” → marginal
“Of liberals…” → conditional
“Liberal & vegetarian” → joint
Find the correct denominator:
Marginal → total n
Conditional → row or column total
Joint → total n
Compute the percentage: $\text{percentage} = \dfrac{\text{count of interest}}{\text{denominator}} \times 100\%$
Interpret clearly:
“Among moderates, 16.7% are vegetarian.”
“Payment method depends on time of day.”
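The template above can be sketched as a single Python function that picks the denominator from the type of distribution. The function name `percentage`, its parameters, and the counts are all hypothetical; the table is constructed so the moderate row matches the 16.7% quoted above.

```python
# Hypothetical counts: political view (rows) by diet (columns).
table = {
    "liberal":      {"vegetarian": 12, "not vegetarian": 28},
    "moderate":     {"vegetarian": 5,  "not vegetarian": 25},
    "conservative": {"vegetarian": 3,  "not vegetarian": 27},
}

def percentage(table, kind, row=None, col=None, given="row"):
    """Follow the template: identify the type, pick the denominator, compute."""
    n = sum(sum(r.values()) for r in table.values())
    if kind == "marginal":
        # One variable only: a row total or a column total over the grand total.
        if row is not None:
            count = sum(table[row].values())
        else:
            count = sum(r[col] for r in table.values())
        denominator = n
    elif kind == "joint":
        # Both categories at once: a single cell over the grand total.
        count = table[row][col]
        denominator = n
    elif kind == "conditional":
        # Within one group: a single cell over that group's total.
        count = table[row][col]
        if given == "row":
            denominator = sum(table[row].values())
        else:
            denominator = sum(r[col] for r in table.values())
    else:
        raise ValueError(f"unknown kind: {kind}")
    return 100 * count / denominator

print(percentage(table, "marginal", col="vegetarian"))                 # of all people
print(percentage(table, "joint", row="liberal", col="vegetarian"))     # liberal & vegetarian
print(round(percentage(table, "conditional", row="moderate", col="vegetarian"), 1))
```

The same cell count appears in both the joint and the conditional branches; only the denominator differs, which is the step most often gotten wrong.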
Practice Problems (Pearson-style)
A table shows political party × pet ownership. Find the marginal distribution of pet ownership.
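In a table of gender × preferred drink, compute the conditional probability of a drink preference given gender, $P(\text{drink} \mid \text{gender})$.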
A table shows time of day × payment method. Are the variables independent?
A table shows class year × major. Find the joint probability of being a sophomore AND a biology major.
Quick Interpretation Phrases
“Of all individuals surveyed…” → marginal
“Among ___ group…” → conditional
“The distributions differ noticeably, so the variables are not independent.”
“The conditional percentages are similar, suggesting independence.”
Additional info:
Contingency tables are foundational for the Chi-Square Test for Independence (see Ch. 25), which formally tests whether two categorical variables are independent.
Understanding marginal, joint, and conditional distributions is essential for interpreting survey data and for further topics in categorical data analysis.