
Displaying Categorical Data in Statistics


Chapter 2: Displaying Categorical Data

Displays for Categorical Variables

Categorical data consist of variables whose values place each individual into one of several distinct groups or categories. Displaying and summarizing categorical data properly is essential for seeing patterns and relationships in the data.

  • Categorical variables are those that represent groupings, such as city of residence, diet type, or disease status.

  • The key to displaying categorical data is to group similar items together for clarity and analysis.

Frequency Tables / Relative Frequency Tables

A frequency table lists all categories of a categorical variable along with their counts (frequencies). A relative frequency table expresses these counts as percentages or proportions, making it easier to compare categories.

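  • For non-overlapping categories that cover every case, the percentages sum to 100% (up to rounding).

  • Relative frequency is calculated as: $\text{relative frequency} = \dfrac{\text{frequency of the category}}{\text{total number of observations}}$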
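
The relative frequency calculation can be sketched in a few lines of Python. This is a minimal illustration, not part of the original notes: the function name `relative_frequencies` and the commute counts are hypothetical, invented only to show the arithmetic.

```python
def relative_frequencies(counts):
    """Convert a dict of category counts into proportions that sum to 1."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("cannot compute relative frequencies of an empty table")
    return {category: count / total for category, count in counts.items()}


# Hypothetical counts, invented for illustration only (not from the notes).
commute_counts = {"Bus": 50, "Train": 30, "Ferry": 20}
commute_shares = relative_frequencies(commute_counts)

for category, share in commute_shares.items():
    print(f"{category}: {share:.1%}")  # e.g. "Bus: 50.0%"
```

Because the categories do not overlap and together cover all 100 hypothetical observations, the resulting proportions add up to 1 (100%).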

Example: City of Residence of Transit Riders

The following table shows the share of riders who used transit in the past 30 days, by region of residence:

| Region | Proportion of past 30-day users (Rider Share) |
|---|---|
| Burnaby/New West | 18% |
| Richmond/South Delta | 7% |
| Surrey/North Delta/White Rock/Langley | 19% |
| Vancouver | 39% |
| Northeast Sector (Coquitlam/Port Coquitlam/Port Moody/Pitt Meadows/Maple Ridge/Anmore & Belcarra) | 8% |
| North Vancouver | 7% |
| West Vancouver | 2% |
| Total | 100% |

Bar Charts

A bar chart is a graphical display for categorical data, where each category is represented by a rectangular bar. The height of the bar corresponds to the frequency or relative frequency of the category.

  • All bars have the same width.

  • Bar charts are useful for comparing the sizes of different categories.

Example: Bar Chart of City of Residence

The bar chart visually compares the proportions of transit riders from different cities, making it easy to see which city has the highest or lowest share.
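
As a rough stand-in for the chart, the rider shares from the table can be drawn as a text bar chart. The sketch below uses only the Python standard library; the helper `text_bar_chart` is a hypothetical name, the bars are drawn horizontally for convenience, and the long Northeast Sector label is shortened.

```python
# Rider shares (percent) taken from the table in these notes.
rider_share = {
    "Burnaby/New West": 18,
    "Richmond/South Delta": 7,
    "Surrey/North Delta/White Rock/Langley": 19,
    "Vancouver": 39,
    "Northeast Sector": 8,
    "North Vancouver": 7,
    "West Vancouver": 2,
}


def text_bar_chart(shares, symbol="#"):
    """Return one line per category; bar length equals the percentage."""
    label_width = max(len(label) for label in shares)
    lines = []
    for label, percent in shares.items():
        bar = symbol * percent  # every bar has the same thickness (one line)
        lines.append(f"{label:<{label_width}} | {bar} {percent}%")
    return lines


chart_lines = text_bar_chart(rider_share)
print("\n".join(chart_lines))

# The longest and shortest bars identify the largest and smallest shares.
largest = max(rider_share, key=rider_share.get)
smallest = min(rider_share, key=rider_share.get)
print(largest, smallest)  # Vancouver West Vancouver
```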

Pie Charts

A pie chart displays categories as slices of a circle, with the area of each slice proportional to the fraction of the whole for that category.

  • Pie charts are useful for showing the relative proportions of categories within a whole.

Example: Pie Chart of City of Residence

The pie chart provides a visual representation of the distribution of transit riders by city, with each slice corresponding to a city's share.
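
Since each slice is proportional to its category's fraction of the whole, the central angle of a slice is that fraction times 360 degrees. The following sketch computes the angles for the rider-share table; the function name `slice_angles` is an assumption made for illustration, and the Northeast Sector label is shortened.

```python
# Rider shares (percent) taken from the table in these notes.
rider_share = {
    "Burnaby/New West": 18,
    "Richmond/South Delta": 7,
    "Surrey/North Delta/White Rock/Langley": 19,
    "Vancouver": 39,
    "Northeast Sector": 8,
    "North Vancouver": 7,
    "West Vancouver": 2,
}


def slice_angles(shares):
    """Map each category to the central angle (in degrees) of its pie slice."""
    total = sum(shares.values())
    return {label: 360 * value / total for label, value in shares.items()}


angles = slice_angles(rider_share)
for label, angle in angles.items():
    print(f"{label}: {angle:.1f} degrees")
```

For example, Vancouver's 39% share corresponds to a slice of 0.39 × 360 = 140.4 degrees, and the seven angles together make up the full 360 degrees.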

Contingency Tables

Contingency tables (also called cross-tabulations) are used to display the relationship between two categorical variables. Each cell in the table shows the frequency or percentage for a combination of categories.

  • Row and column totals give the marginal distributions of the two variables.

  • The grand total represents the overall sample size.

Example: Diet Type and Heart Disease

| Diet type | Having heart disease? Yes | Having heart disease? No | Total |
|---|---|---|---|
| High cholesterol diet | 11 | 4 | 15 |
| Low cholesterol diet | 2 | 6 | 8 |
| Total | 13 | 10 | 23 |
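
A contingency table like this one is produced by counting how many individuals fall into each combination of categories. The sketch below shows one way to do that cross-tabulation with `collections.Counter`. The per-person records are an assumption: they are rebuilt from the cell counts purely for illustration, since the notes report only the aggregated table.

```python
from collections import Counter

# One (diet, heart disease) record per person, reconstructed from the cell
# counts in the table. The individual records are hypothetical; only the
# aggregated counts come from the notes.
records = (
    [("High cholesterol diet", "Yes")] * 11
    + [("High cholesterol diet", "No")] * 4
    + [("Low cholesterol diet", "Yes")] * 2
    + [("Low cholesterol diet", "No")] * 6
)

cell_counts = Counter(records)                                # interior cells
row_totals = Counter(diet for diet, _ in records)             # right margin
column_totals = Counter(status for _, status in records)      # bottom margin
grand_total = len(records)                                    # sample size

print(cell_counts[("High cholesterol diet", "Yes")])  # 11
print(row_totals["Low cholesterol diet"])             # 8
print(column_totals["No"])                            # 10
print(grand_total)                                    # 23
```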

Marginal Distributions

Marginal distributions summarize the frequency or proportion of one variable, ignoring the other variable. They are found in the margins (totals) of a contingency table.

  • Example: Proportion of individuals not having coronary heart disease: $\frac{10}{23} \approx 43.5\%$

  • Example: Proportion of individuals having a high cholesterol diet: $\frac{15}{23} \approx 65.2\%$
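
The two marginal proportions can be checked by dividing each margin total by the grand total. This is a minimal sketch; the nested-dictionary layout and the variable names are choices made here for illustration, not something the notes prescribe.

```python
# Cell counts from the diet / heart disease contingency table.
table = {
    "High cholesterol diet": {"Yes": 11, "No": 4},
    "Low cholesterol diet": {"Yes": 2, "No": 6},
}

grand_total = sum(sum(row.values()) for row in table.values())  # 23

# Marginal distribution of diet type: row totals divided by the grand total.
diet_marginal = {
    diet: sum(row.values()) / grand_total for diet, row in table.items()
}

# Marginal distribution of heart disease: column totals over the grand total.
disease_marginal = {
    status: sum(row[status] for row in table.values()) / grand_total
    for status in ("Yes", "No")
}

print(f"No heart disease: {disease_marginal['No']:.1%}")                    # 43.5%
print(f"High cholesterol diet: {diet_marginal['High cholesterol diet']:.1%}")  # 65.2%
```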

Conditional Distributions

Conditional distributions show the distribution of one variable for a fixed value of the other variable. This is useful for exploring relationships between variables.

  • Example: Percentage of individuals with a low cholesterol diet who have heart disease: $\frac{2}{8} = 25\%$

  • Example: Percentage of individuals with a high cholesterol diet who have heart disease: $\frac{11}{15} \approx 73.3\%$

  • Example: Proportion of individuals with heart disease who have a high cholesterol diet: $\frac{11}{13} \approx 84.6\%$
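
The difference between conditioning on a row and conditioning on a column is easy to see in code: the denominator changes from a row total to a column total. The sketch below reproduces the three examples; the helper names `conditional_on_row` and `conditional_on_column` are hypothetical.

```python
# Cell counts from the diet / heart disease contingency table.
table = {
    "High cholesterol diet": {"Yes": 11, "No": 4},
    "Low cholesterol diet": {"Yes": 2, "No": 6},
}


def conditional_on_row(table, row):
    """Distribution of the column variable among individuals in one row."""
    counts = table[row]
    row_total = sum(counts.values())
    return {column: count / row_total for column, count in counts.items()}


def conditional_on_column(table, column):
    """Distribution of the row variable among individuals in one column."""
    column_total = sum(counts[column] for counts in table.values())
    return {row: counts[column] / column_total for row, counts in table.items()}


low_diet = conditional_on_row(table, "Low cholesterol diet")
high_diet = conditional_on_row(table, "High cholesterol diet")
with_disease = conditional_on_column(table, "Yes")

print(f"{low_diet['Yes']:.1%}")                          # 25.0%
print(f"{high_diet['Yes']:.1%}")                         # 73.3%
print(f"{with_disease['High cholesterol diet']:.1%}")    # 84.6%
```

Note that 73.3% (heart disease given a high cholesterol diet) and 84.6% (high cholesterol diet given heart disease) use the same cell count, 11, but different denominators (15 versus 13), so they answer different questions.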

Interpreting Independence

If two categorical variables are independent, the conditional distribution of one variable is the same (in sample data, roughly the same) for every category of the other. In the example above, the percentage with heart disease is much higher among those with a high cholesterol diet (73.3%) than among those with a low cholesterol diet (25%), suggesting an association between diet type and heart disease.
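
This comparison can be made explicit by placing each conditional heart disease rate next to the overall (marginal) rate. The sketch below is a descriptive check only, under the assumption that a simple gap in percentage points is an adequate summary; it is not a formal test of independence, and the variable names are illustrative.

```python
# Cell counts from the diet / heart disease contingency table.
table = {
    "High cholesterol diet": {"Yes": 11, "No": 4},
    "Low cholesterol diet": {"Yes": 2, "No": 6},
}

# Conditional heart disease rate within each diet group.
heart_disease_rate = {
    diet: counts["Yes"] / sum(counts.values()) for diet, counts in table.items()
}

# Marginal heart disease rate, ignoring diet type.
total_yes = sum(counts["Yes"] for counts in table.values())
grand_total = sum(sum(counts.values()) for counts in table.values())
overall_rate = total_yes / grand_total

# Under independence, every group rate would sit close to the overall rate,
# so the gap between the groups would be close to zero.
gap = max(heart_disease_rate.values()) - min(heart_disease_rate.values())

for diet, rate in heart_disease_rate.items():
    print(f"{diet}: {rate:.1%} (overall {overall_rate:.1%})")
print(f"Gap between groups: {gap * 100:.1f} percentage points")
```

Here the group rates (73.3% and 25.0%) fall well on either side of the overall rate (56.5%), which is the pattern the notes describe as suggesting an association.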

Summary Table: Marginal and Conditional Distributions

| Question | Distribution Type | Calculation | Result |
|---|---|---|---|
| Proportion not having heart disease | Marginal | 10/23 | 43.5% |
| Proportion having high cholesterol diet | Marginal | 15/23 | 65.2% |
| Proportion with low cholesterol diet who have heart disease | Conditional | 2/8 | 25% |
| Proportion with high cholesterol diet who have heart disease | Conditional | 11/15 | 73.3% |
| Proportion with heart disease who have high cholesterol diet | Conditional | 11/13 | 84.6% |

Key Terms

  • Frequency Table: Table showing counts for each category.

  • Relative Frequency Table: Table showing percentages or proportions for each category.

  • Bar Chart: Graphical display using equal-width bars whose heights represent frequencies or relative frequencies.

  • Pie Chart: Circular chart divided into slices representing proportions.

  • Contingency Table: Table showing frequencies for combinations of two categorical variables.

  • Marginal Distribution: Distribution of one variable, ignoring the other.

  • Conditional Distribution: Distribution of one variable for a fixed value of the other variable.

  • Independence: Two variables are independent if the conditional distributions are the same across categories.

Example Applications

  • Comparing proportions of transit riders by city to allocate resources.

  • Analyzing the relationship between diet type and heart disease to inform public health recommendations.

Additional info: These notes are based on lecture slides for a college-level statistics course, focusing on the graphical and tabular display of categorical data and interpretation of relationships between categorical variables.
