Displaying and Describing Categorical Data: Contingency Tables, Conditional Distributions, and Graphical Summaries

Displaying and Describing Categorical Data

Contingency Tables and the Titanic Data

Contingency tables are a fundamental tool in statistics for summarizing the relationship between two categorical variables. They allow us to examine how individuals are distributed across combinations of categories, and are especially useful for exploring associations and dependencies between variables.

  • Contingency Table: A two-way table that displays the joint frequency distribution of two categorical variables. Each cell shows the count for one specific combination of categories.

  • Example: In the Titanic dataset, the two variables are Class (First, Second, Third, Crew) and Survival Status (Alive, Dead).

Table: Contingency Table of Ticket Class and Survival

Survival \ Class   First   Second   Third   Crew   Total
Alive                203      118     178    212     711
Dead                 122      167     528    673    1490
Total                325      285     706    885    2201

Figure: Contingency table of ticket class and survival

Figure: Contingency table highlighting crew deaths
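
The totals in the margins of the table can be recomputed from the eight cell counts. The following is a minimal sketch in plain Python; the nested-dictionary layout and the variable names are assumptions chosen for illustration, not part of the original material.

```python
# Cell counts from the Titanic table: survival status -> class -> count.
counts = {
    "Alive": {"First": 203, "Second": 118, "Third": 178, "Crew": 212},
    "Dead":  {"First": 122, "Second": 167, "Third": 528, "Crew": 673},
}
classes = ["First", "Second", "Third", "Crew"]

# Row totals: sum across classes for each survival status.
row_totals = {status: sum(row.values()) for status, row in counts.items()}

# Column totals: sum across survival statuses for each class.
col_totals = {c: sum(counts[s][c] for s in counts) for c in classes}

# Grand total: everyone aboard.
grand_total = sum(row_totals.values())

print(row_totals)
print(col_totals)
print(grand_total)
```

The row totals, column totals, and grand total should match the Total row and Total column of the table.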

Marginal and Joint Distributions

Marginal distributions are the totals for each category of a single variable, ignoring the other variable. Joint distributions show the proportion or percentage for each cell relative to the overall total.

  • Marginal Distribution: The distribution of either variable alone, found in the margins (totals) of the table.

  • Joint Distribution: The proportion of the grand total that falls in each cell (e.g., the percentage of all people aboard who were in First Class and survived).

Table: Joint Distribution (% of Total)

Survival \ Class    First   Second    Third     Crew      All
Alive                9.22     5.36     8.09     9.63    32.30
Dead                 5.54     7.59    23.99    30.58    67.70
All                 14.77    12.95    32.08    40.21   100.00

Figure: Tabulated statistics: Class, Survival Status

Figure: Tabulated statistics: Survival Status, Class
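
Each joint percentage is a cell count divided by the grand total, and each marginal percentage is a row or column total divided by the grand total. A short sketch of that arithmetic, assuming the counts are stored in a nested dictionary (the helper name `pct` is hypothetical):

```python
# Cell counts from the Titanic table: survival status -> class -> count.
counts = {
    "Alive": {"First": 203, "Second": 118, "Third": 178, "Crew": 212},
    "Dead":  {"First": 122, "Second": 167, "Third": 528, "Crew": 673},
}
grand_total = sum(sum(row.values()) for row in counts.values())

def pct(part, whole):
    """Return part as a percentage of whole, rounded to two decimals."""
    return round(100 * part / whole, 2)

# Joint distribution: every cell relative to the grand total.
joint = {
    status: {cls: pct(n, grand_total) for cls, n in row.items()}
    for status, row in counts.items()
}

# Marginal distribution of survival status (row totals / grand total).
marginal_status = {
    status: pct(sum(row.values()), grand_total) for status, row in counts.items()
}

# Marginal distribution of class (column totals / grand total).
marginal_class = {
    cls: pct(sum(counts[s][cls] for s in counts), grand_total)
    for cls in counts["Alive"]
}

print(joint)
print(marginal_status)
print(marginal_class)
```

The values in `joint` correspond to the interior cells of the joint distribution table, and the two marginal dictionaries correspond to its All column and All row.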

Conditional Distributions

Conditional distributions show the distribution of one variable for individuals who satisfy a condition on another variable. For example, the distribution of class among survivors, or the survival rate within each class.

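  • Row Percentages: Each cell as a percentage of its row total. Here they show the distribution of class among those who survived or perished.

  • Column Percentages: Each cell as a percentage of its column total. Here they show the survival rate within each class.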

Table: Class Conditional on Survival Status (Row Percentages)

Survival \ Class    First   Second    Third     Crew    Total
Alive               28.6%    16.6%    25.0%    29.8%     100%
Dead                 8.2%    11.2%    35.4%    45.2%     100%

Figure: Conditional distribution of ticket class, conditional on having survived

Figure: Conditional distribution of ticket class, conditional on having perished

Table: Survival Status Conditional on Class (Column Percentages)

Survival \ Class     First    Second     Third      Crew       All
Alive               62.46%    41.40%    25.21%    23.95%    32.30%
Dead                37.54%    58.60%    74.79%    76.05%    67.70%
All                100.00%   100.00%   100.00%   100.00%   100.00%

Figure: Survival Status conditional on Class
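
The two conditional tables differ only in the denominator: row percentages divide each cell by its row total, while column percentages divide each cell by its column total. The sketch below recomputes both from the raw counts; the data layout and variable names are assumptions made for illustration.

```python
# Cell counts from the Titanic table: survival status -> class -> count.
counts = {
    "Alive": {"First": 203, "Second": 118, "Third": 178, "Crew": 212},
    "Dead":  {"First": 122, "Second": 167, "Third": 528, "Crew": 673},
}
classes = ["First", "Second", "Third", "Crew"]

# Row percentages: distribution of class, conditional on survival status.
row_pct = {}
for status, row in counts.items():
    row_total = sum(row.values())
    row_pct[status] = {c: round(100 * n / row_total, 1) for c, n in row.items()}

# Column percentages: distribution of survival status, conditional on class.
col_pct = {}
for c in classes:
    col_total = sum(counts[s][c] for s in counts)
    col_pct[c] = {s: round(100 * counts[s][c] / col_total, 2) for s in counts}

print(row_pct)
print(col_pct)
```

Reading `col_pct` across classes shows the association directly: the survival rate falls from about 62% in First Class to about 24% among the crew.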

Graphical Summaries: Pitfalls and Best Practices

Visual displays are essential for understanding categorical data, but they must be constructed carefully to avoid misleading impressions.

  • Area Principle: The area occupied by a part of a graph should correspond to the magnitude of the value it represents. Violating this principle can distort perception.

  • Example: The Titanic ship graphic visually exaggerates the number of crew members by using area (ship size) rather than length or height proportional to counts.

Figure: Misleading Titanic ship graphic
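
To see why area can mislead, consider a hypothetical pictogram in which both the length and the height of each ship are scaled by the ratio of the counts. This is an assumption made purely for illustration; the source does not say how the graphic was actually drawn. The sketch compares the ratio of counts with the resulting ratio of areas for crew versus First Class.

```python
# Class totals from the Titanic contingency table.
first_class_total = 325
crew_total = 885

# Ratio a reader should perceive: crew relative to First Class.
count_ratio = crew_total / first_class_total

# Hypothetical pictogram: width AND height are both scaled by count_ratio,
# so the drawn area grows with the square of the ratio.
area_ratio = count_ratio ** 2

print(f"count ratio: {count_ratio:.2f}")
print(f"area ratio:  {area_ratio:.2f}")
```

Under that assumption the crew image would cover roughly 7.4 times the area of the First Class image, although the crew is only about 2.7 times as numerous.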

Pie charts and bar charts are common for displaying categorical data. Pie charts show parts of a whole, while bar charts are better for comparing counts or percentages across categories.

Figure: Pie charts of Titanic class distribution
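
A bar chart satisfies the area principle when all bars share the same width, because each bar's area is then proportional to its length. As a rough sketch, the text-based chart below scales each bar's length to the class totals; the function name `bar_lengths` and the 40-character maximum width are arbitrary choices for illustration.

```python
# Class totals from the Titanic contingency table.
class_totals = {"First": 325, "Second": 285, "Third": 706, "Crew": 885}

def bar_lengths(totals, max_width=40):
    """Scale counts so the largest bar is max_width characters long."""
    largest = max(totals.values())
    return {name: round(max_width * n / largest) for name, n in totals.items()}

lengths = bar_lengths(class_totals)
for name, n in class_totals.items():
    # Every bar is one text row high, so length alone carries the count.
    print(f"{name:<7}{'#' * lengths[name]} {n}")
```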

Statistical Software: Cross Tabulation and Chi-Square

Statistical software such as Minitab can be used to generate contingency tables, calculate row and column percentages, and perform chi-square tests for association between categorical variables.

  • Cross Tabulation Dialogs: Allow users to specify which variables are rows and columns, and to select display options such as counts, row percents, column percents, or total percents.

Figure: Cross Tabulation and Chi-Square dialog (total percents)

Figure: Cross Tabulation and Chi-Square dialog (row percents)

Figure: Cross Tabulation and Chi-Square dialog (column percents)
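
The source does not show the software's chi-square output, but the statistic can be sketched by hand. The example below uses the standard Pearson chi-square formula, in which each expected count is (row total × column total) / grand total; it is an illustration in plain Python, not a reproduction of Minitab's procedure, and the variable names are assumptions.

```python
# Observed counts: rows are Alive, Dead; columns are First, Second, Third, Crew.
observed = [
    [203, 118, 178, 212],
    [122, 167, 528, 673],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

chi2 = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        # Expected count if class and survival were independent.
        expected = row_totals[i] * col_totals[j] / n
        chi2 += (obs - expected) ** 2 / expected

# Degrees of freedom: (rows - 1) * (columns - 1).
df = (len(observed) - 1) * (len(observed[0]) - 1)

print(f"chi-square = {chi2:.1f}, df = {df}")
```

For these counts the statistic is about 190.4 on 3 degrees of freedom. A value that large relative to its degrees of freedom indicates that survival and class are strongly associated.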

Summary Table: Types of Percentages in Contingency Tables

Type                Interpretation                                                                         When to Use
Row Percentages     Distribution of categories within each row (e.g., class among survivors)               Comparing composition within a group
Column Percentages  Distribution of categories within each column (e.g., survival rate within each class)  Comparing outcomes across groups
Total Percentages   Each cell as a percentage of the grand total                                           Assessing overall proportions

Key Points and Best Practices

  • Always check the area principle when constructing visual displays.

  • Use bar charts for comparing counts or percentages; use pie charts for showing parts of a whole.

  • Contingency tables are essential for exploring relationships between two categorical variables.

  • Conditional distributions (row or column percentages) reveal associations and dependencies.

  • Statistical software can automate calculations and provide options for different types of percentages.
