
Contingency Tables and Categorical Data Analysis

Study Guide - Smart Notes


Contingency Tables

Definition and Structure

A contingency table (also called a two-way table) displays the frequency counts of two categorical variables at the same time. The rows represent the categories of one variable and the columns represent the categories of the other. This layout makes it possible to analyze the relationship between the two variables.

  • Rows: Categories of Variable A

  • Columns: Categories of Variable B

  • Cells: Counts for each combination of categories

  • Totals: Marginal totals for each row and column, and a grand total

Example Table Structure:

              | Category B1 | Category B2 | Total
Category A1   | a           | b           | a + b
Category A2   | c           | d           | c + d
Total         | a + c       | b + d       | n
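The structure above can be built from raw paired observations with a short Python sketch. The observation list, the labels A1/A2/B1/B2, and the helper name `contingency_table` are hypothetical. The sketch only shows how the cell counts, marginal totals, and grand total relate to one another.

```python
from collections import Counter

# Hypothetical raw data: each observation is a (Variable A, Variable B) pair.
observations = [
    ("A1", "B1"), ("A1", "B2"), ("A2", "B1"),
    ("A1", "B1"), ("A2", "B2"), ("A2", "B2"),
]


def contingency_table(pairs):
    """Return cell counts, row totals, column totals, and the grand total n."""
    cells = Counter(pairs)  # (a, b) -> count; missing combinations read as 0
    rows = sorted({a for a, _ in pairs})
    cols = sorted({b for _, b in pairs})
    row_totals = {r: sum(cells[(r, c)] for c in cols) for r in rows}  # a + b, c + d
    col_totals = {c: sum(cells[(r, c)] for r in rows) for c in cols}  # a + c, b + d
    n = sum(row_totals.values())
    return cells, row_totals, col_totals, n


cells, row_totals, col_totals, n = contingency_table(observations)
print("      ", "  ".join(col_totals), " Total")
for r, total in row_totals.items():
    print(f"{r:<6}", "  ".join(f"{cells[(r, c)]:>2}" for c in col_totals), f" {total:>3}")
print(f"{'Total':<6}", "  ".join(f"{t:>2}" for t in col_totals.values()), f" {n:>3}")
```

The row totals and the column totals both add up to the same grand total n. This gives a quick check that the table was filled in correctly.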

Marginal, Joint, and Conditional Distributions

Marginal Distributions

A marginal distribution summarizes the totals for one variable, ignoring the other variable. It comes from the margins of the table: summing across each row gives the row totals, and summing down each column gives the column totals.

  • Row totals: Distribution of Variable A

  • Column totals: Distribution of Variable B

Formula for Marginal Percentage:

$$\text{Marginal \%} = \frac{\text{row (or column) total}}{n} \times 100\%$$

Example: If 43 out of 111 customers shop during the day, the marginal percentage of day shoppers is $\frac{43}{111} \times 100\% \approx 38.7\%$.
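Below is a minimal Python sketch of the marginal-percentage calculation. For illustration, it assumes that time of day has only two categories (day and night), so the night count is taken as 111 - 43 = 68. The function name `marginal_distribution` is hypothetical.

```python
def marginal_distribution(totals):
    """Convert row (or column) totals into percentages of the grand total n."""
    n = sum(totals.values())
    return {category: 100 * count / n for category, count in totals.items()}


# 43 of 111 customers shop during the day. Assumption: the only other
# category is "night", so the remaining 111 - 43 = 68 shop at night.
time_of_day_totals = {"day": 43, "night": 111 - 43}
dist = marginal_distribution(time_of_day_totals)
for category, pct in dist.items():
    print(f"{category}: {pct:.1f}%")
# day: 38.7%
# night: 61.3%
```

The percentages in a marginal distribution always add up to 100%, because every observation falls in exactly one category of the variable.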

Joint Distributions

A joint distribution refers to the probability or proportion of observations that fall into both a specific category of Variable A and Variable B simultaneously.

  • Formula: $P(A \text{ and } B) = \dfrac{\text{count in the } (A, B) \text{ cell}}{n}$

  • Example: Probability of being a sophomore and a biology major.
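Below is a minimal Python sketch of the joint-probability formula. The class-year × major counts are invented for illustration, since the guide gives no table. `joint_probability` is a hypothetical helper name.

```python
# Hypothetical class year x major counts (illustrative only, not from the guide).
counts = {
    ("sophomore", "biology"): 12,
    ("sophomore", "other"): 28,
    ("junior", "biology"): 10,
    ("junior", "other"): 50,
}


def joint_probability(table, row, col):
    """P(row AND col): the count in that single cell divided by the grand total n."""
    n = sum(table.values())
    return table.get((row, col), 0) / n


p = joint_probability(counts, "sophomore", "biology")
print(f"P(sophomore and biology) = {p:.2f}")  # 0.12
```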

Conditional Distributions

A conditional distribution describes the distribution of one variable within a specific group defined by the other variable. It answers questions like, "What percent of group B has characteristic A?"

  • Key phrase: “Of the ____ group…”

  • Formula: $P(A \mid B) = \dfrac{\text{count of } A \text{ and } B}{\text{total of group } B}$

  • Example: “What % of moderates are vegetarian?”

  • Example: “What % of night customers pay with credit?”

Note: Always divide by the total of the group that defines the condition (row or column total).
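Below is a minimal Python sketch of a conditional percentage that follows the note above: the denominator is the total of the conditioning group. The time-of-day × payment-method counts are hypothetical. They were chosen only to be consistent with 43 day and 68 night customers; the guide does not give the actual table. The `condition_on` parameter is an assumption of this sketch.

```python
# Hypothetical time-of-day x payment-method counts (illustration only).
counts = {
    ("day", "credit"): 14, ("day", "cash"): 29,
    ("night", "credit"): 38, ("night", "cash"): 30,
}


def conditional_percentage(table, given, target, condition_on="row"):
    """Percent of the `given` group that falls in the `target` category.

    The denominator is the total of the conditioning group (a row or a
    column total), never the grand total n.
    """
    if condition_on == "row":
        group_total = sum(c for (r, _), c in table.items() if r == given)
        cell = table.get((given, target), 0)
    else:  # condition on a column category instead
        group_total = sum(c for (_, col), c in table.items() if col == given)
        cell = table.get((target, given), 0)
    return 100 * cell / group_total


night_credit = conditional_percentage(counts, "night", "credit")
print(f"Of night customers, {night_credit:.1f}% pay with credit")  # 55.9%
credit_night = conditional_percentage(counts, "credit", "night", condition_on="column")
print(f"Of credit users, {credit_night:.1f}% shop at night")  # 73.1%
```

The two questions use the same cell (night and credit) but different denominators (68 night customers vs. 52 credit users). This is why identifying the condition matters.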

Comparing Distributions

Summary Table

Type        | Definition                                                   | Denominator
Marginal    | Totals for one variable                                      | Grand total (n)
Joint       | Probability of being in both categories                      | Grand total (n)
Conditional | Distribution of one variable within a category of the other | Row or column total (depending on condition)

Independence in Contingency Tables

Definition and Test

Two categorical variables are independent if knowing the value of one does not change the distribution of the other. To check for independence informally, compare the conditional distributions:

  • If the conditional percentages are about the same for every group, for example $P(B_1 \mid A_1) \approx P(B_1 \mid A_2)$, then A and B are likely independent.

  • If the conditional percentages differ noticeably, the variables are not independent.

Example:

  • Day customers: ~33% use credit

  • Night customers: ~56% use credit

  • Since these percentages are not close, payment method and time of day are not independent.
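The comparison above can be sketched in Python. The cell counts are hypothetical. They were picked only so that 43 of 111 customers are day shoppers and the credit percentages round to about 33% and 56%; the guide does not give the actual table. The 5-percentage-point `TOLERANCE` is also an assumed rule of thumb for what counts as "close", not a standard cutoff. The chi-square test mentioned at the end of this guide is the formal version of this check.

```python
# Hypothetical time-of-day x payment-method counts (illustration only).
counts = {
    ("day", "credit"): 14, ("day", "cash"): 29,
    ("night", "credit"): 38, ("night", "cash"): 30,
}


def conditional_distribution(table, given_row):
    """Percent of the given row group falling in each column category."""
    row_total = sum(c for (r, _), c in table.items() if r == given_row)
    return {col: 100 * c / row_total for (r, col), c in table.items() if r == given_row}


day = conditional_distribution(counts, "day")
night = conditional_distribution(counts, "night")

# Largest gap between the two conditional distributions, in percentage points.
gap = max(abs(day[col] - night[col]) for col in day)

TOLERANCE = 5.0  # assumed rule of thumb, not a formal significance test
verdict = "likely independent" if gap <= TOLERANCE else "not independent"
print(f"credit: day {day['credit']:.0f}%, night {night['credit']:.0f}%")
print(verdict)
```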

Step-by-Step Problem Solving Template

  1. Identify the type of distribution:

    • “Of all people” → marginal

    • “Of liberals…” → conditional

    • “Liberal & vegetarian” → joint

  2. Find the correct denominator:

    • Marginal → total n

    • Conditional → row or column total

    • Joint → total n

  3. Compute the percentage: $\dfrac{\text{count of interest}}{\text{denominator}} \times 100\%$

  4. Interpret clearly:

    • “Among moderates, 16.7% are vegetarian.”

    • “Payment method depends on time of day.”
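Steps 1 to 3 of the template can be sketched as a single Python function that chooses the denominator from the question type. The `kind` labels, the helper names, and the time-of-day × payment counts are all hypothetical and serve only as an illustration.

```python
# Hypothetical time-of-day x payment-method counts (illustration only).
counts = {
    ("day", "credit"): 14, ("day", "cash"): 29,
    ("night", "credit"): 38, ("night", "cash"): 30,
}


def row_total(table, row):
    return sum(c for (r, _), c in table.items() if r == row)


def col_total(table, col):
    return sum(c for (_, k), c in table.items() if k == col)


def percentage(table, kind, row=None, col=None):
    """Pick the denominator from the question type, then compute the percent.

    kind is one of "marginal", "joint", "given_row", "given_col"
    (hypothetical labels used only in this sketch).
    """
    n = sum(table.values())
    if kind == "marginal":     # "Of all people..." -> denominator n
        numerator = row_total(table, row) if row is not None else col_total(table, col)
        denominator = n
    elif kind == "joint":      # "X & Y" -> a single cell over n
        numerator, denominator = table.get((row, col), 0), n
    elif kind == "given_row":  # "Of the <row> group..." -> row total
        numerator, denominator = table.get((row, col), 0), row_total(table, row)
    elif kind == "given_col":  # "Of the <col> group..." -> column total
        numerator, denominator = table.get((row, col), 0), col_total(table, col)
    else:
        raise ValueError(f"unknown kind: {kind!r}")
    return 100 * numerator / denominator


print(f"{percentage(counts, 'marginal', row='day'):.1f}%")            # 38.7%
print(f"{percentage(counts, 'joint', 'night', 'credit'):.1f}%")      # 34.2%
print(f"{percentage(counts, 'given_row', 'night', 'credit'):.1f}%")  # 55.9%
print(f"{percentage(counts, 'given_col', 'night', 'credit'):.1f}%")  # 73.1%
```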

Practice Problems (Pearson-style)

  • A table shows political party × pet ownership. Find the marginal distribution of pet ownership.

  • In a table of gender × preferred drink, compute .

  • A table shows time of day × payment method. Are the variables independent?

  • A table shows class year × major. Find the joint probability of being a sophomore AND a biology major.

Quick Interpretation Phrases

  • “Of all individuals surveyed…” → marginal

  • “Among ___ group…” → conditional

  • “The distributions differ noticeably, so the variables are not independent.”

  • “The conditional percentages are similar, suggesting independence.”

Additional info:

  • Contingency tables are foundational for the Chi-Square Test for Independence (see Ch. 25), which formally tests whether two categorical variables are independent.

  • Understanding marginal, joint, and conditional distributions is essential for interpreting survey data and for further topics in categorical data analysis.
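As a preview of the formal test, here is a minimal Python sketch of the chi-square statistic for independence. Each expected count is (row total × column total) / n, and the statistic sums (observed - expected)² / expected over all cells. The time-of-day × payment counts are hypothetical, not real data. The sketch stops at the statistic and its degrees of freedom. Turning these into a p-value requires the chi-square distribution. For example, `scipy.stats.chi2_contingency` does this, and by default it also applies Yates' continuity correction to 2 × 2 tables.

```python
# Hypothetical time-of-day x payment-method counts (illustration only).
counts = {
    ("day", "credit"): 14, ("day", "cash"): 29,
    ("night", "credit"): 38, ("night", "cash"): 30,
}


def chi_square_statistic(table):
    """Return (chi-square statistic, degrees of freedom) for a two-way table."""
    rows = sorted({r for r, _ in table})
    cols = sorted({c for _, c in table})
    n = sum(table.values())
    row_tot = {r: sum(table.get((r, c), 0) for c in cols) for r in rows}
    col_tot = {c: sum(table.get((r, c), 0) for r in rows) for c in cols}
    stat = 0.0
    for r in rows:
        for c in cols:
            expected = row_tot[r] * col_tot[c] / n  # count expected if independent
            stat += (table.get((r, c), 0) - expected) ** 2 / expected
    df = (len(rows) - 1) * (len(cols) - 1)
    return stat, df


stat, df = chi_square_statistic(counts)
print(f"chi-square = {stat:.2f}, df = {df}")  # chi-square = 5.76, df = 1
```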
