Displaying Categorical Data in Statistics
Chapter 2: Displaying Categorical Data
Displays for Categorical Variables
Categorical data record which of several distinct groups, or categories, each individual belongs to. Displaying and summarizing such data carefully is essential for seeing patterns within a variable and relationships between variables.
Categorical variables place individuals into groups; examples include city of residence, diet type, and disease status.
To display categorical data, first group the individuals by category and count how many fall into each one.
Frequency Tables / Relative Frequency Tables
A frequency table lists all categories of a categorical variable along with their counts (frequencies). A relative frequency table expresses these counts as percentages or proportions, making it easier to compare categories.
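When the categories do not overlap and together include every observation, their percentages add up to 100% (apart from rounding error).
Relative frequency is calculated as:
$$\text{relative frequency} = \frac{\text{count in the category}}{\text{total number of observations}}$$
Multiply by 100 to express a relative frequency as a percentage.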
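Relative frequency is a category's count divided by the total number of observations. As a minimal Python sketch of building both kinds of table from raw category labels, the code below uses the diet types from the diet and heart disease example later in these notes (15 high cholesterol, 8 low cholesterol). The helper names `frequency_table` and `relative_frequency_table` are hypothetical, chosen for illustration; only `collections.Counter` comes from the standard library.

```python
from collections import Counter


def frequency_table(values):
    """Count how many observations fall into each category."""
    return dict(Counter(values))


def relative_frequency_table(values):
    """Divide each category's count by the total number of observations."""
    counts = Counter(values)
    total = sum(counts.values())
    return {category: count / total for category, count in counts.items()}


# Diet types of the 23 individuals in the diet and heart disease example:
# 15 on a high cholesterol diet and 8 on a low cholesterol diet.
diets = ["High cholesterol"] * 15 + ["Low cholesterol"] * 8

print(frequency_table(diets))  # {'High cholesterol': 15, 'Low cholesterol': 8}
for category, share in relative_frequency_table(diets).items():
    print(f"{category}: {share:.1%}")  # prints 65.2% and 34.8%
```

Because every observation lands in exactly one category, the relative frequencies sum to 1 (100%).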
Example: City of Residence of Transit Riders
The following table shows the share of past-30-day transit users who live in each region:
| Region | Share of past-30-day users (rider share) |
|---|---|
| Burnaby/New West | 18% |
| Richmond/South Delta | 7% |
| Surrey/North Delta/White Rock/Langley | 19% |
| Vancouver | 39% |
| Northeast Sector (Coquitlam/Port Coquitlam/Port Moody/Pitt Meadows/Maple Ridge/Anmore & Belcarra) | 8% |
| North Vancouver | 7% |
| West Vancouver | 2% |
| Total | 100% |
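To check that these non-overlapping regions account for 100% of riders, here is a minimal Python sketch that stores the table in a dictionary, sums the shares, and lists the regions from largest to smallest share. The `rider_share` name is an illustrative choice, and the key "Northeast Sector" abbreviates that row's full list of municipalities.

```python
# Rider shares (%) from the table above; "Northeast Sector" abbreviates the
# full list of municipalities in that row.
rider_share = {
    "Burnaby/New West": 18,
    "Richmond/South Delta": 7,
    "Surrey/North Delta/White Rock/Langley": 19,
    "Vancouver": 39,
    "Northeast Sector": 8,
    "North Vancouver": 7,
    "West Vancouver": 2,
}

# Each rider lives in exactly one region, so the shares should total 100%.
total = sum(rider_share.values())
print(f"Total: {total}%")  # Total: 100%

# Sorting from largest to smallest share makes the regions easier to compare.
for region, share in sorted(rider_share.items(), key=lambda item: item[1], reverse=True):
    print(f"{region:<40}{share:>4}%")
```

With rounded percentages the total can drift slightly from 100%; here it comes out exactly.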
Bar Charts
A bar chart is a graphical display for categorical data, where each category is represented by a rectangular bar. The height of the bar corresponds to the frequency or relative frequency of the category.
All bars have the same width, so only the height (and therefore the area) of each bar reflects its category's count.
Bar charts are useful for comparing the sizes of different categories.
Example: Bar Chart of City of Residence
The bar chart visually compares the proportions of transit riders from different regions, making it easy to see that Vancouver has the largest share (39%) and West Vancouver the smallest (2%).
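Without a plotting library, a bar chart can be approximated in plain text. The sketch below draws horizontal bars, one `#` per percentage point, for the rider-share table; `text_bar_chart` is a hypothetical helper written for this illustration, not a library function. Because every bar is one character tall, bar length plays the role that bar height plays in a drawn chart.

```python
def text_bar_chart(shares):
    """Return one text line per category, with a bar of '#' characters.

    Each '#' stands for one percentage point and every bar is one character
    tall, so bar length is proportional to the category's share.
    """
    width = max(len(label) for label in shares)
    return [f"{label:<{width}} | {'#' * share} {share}%" for label, share in shares.items()]


# Rider shares (%) from the table; "Northeast Sector" abbreviates that row's label.
rider_share = {
    "Burnaby/New West": 18,
    "Richmond/South Delta": 7,
    "Surrey/North Delta/White Rock/Langley": 19,
    "Vancouver": 39,
    "Northeast Sector": 8,
    "North Vancouver": 7,
    "West Vancouver": 2,
}

chart = text_bar_chart(rider_share)
for line in chart:
    print(line)
```

The Vancouver bar is clearly the longest and the West Vancouver bar the shortest.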
Pie Charts
A pie chart displays categories as slices of a circle, with the area of each slice proportional to the fraction of the whole for that category.
Pie charts are useful for showing the relative proportions of categories within a whole.
Example: Pie Chart of City of Residence
The pie chart provides a visual representation of the distribution of transit riders by city, with each slice corresponding to a city's share.
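Since a slice's area is proportional to its central angle, a category holding p% of the whole gets p% of the full 360 degrees. A minimal Python sketch of that conversion for the rider-share table follows; `slice_angles` is a hypothetical helper, and dividing by the actual total (rather than assuming exactly 100) keeps the angles summing to 360 even if rounded shares do not total 100%.

```python
def slice_angles(shares):
    """Central angle, in degrees, of each pie slice.

    Slice area is proportional to the angle, so each category gets the
    fraction share / total of the full 360 degrees.
    """
    total = sum(shares.values())
    return {label: 360 * share / total for label, share in shares.items()}


# Rider shares (%) from the table; "Northeast Sector" abbreviates that row's label.
rider_share = {
    "Burnaby/New West": 18,
    "Richmond/South Delta": 7,
    "Surrey/North Delta/White Rock/Langley": 19,
    "Vancouver": 39,
    "Northeast Sector": 8,
    "North Vancouver": 7,
    "West Vancouver": 2,
}

angles = slice_angles(rider_share)
for label, angle in angles.items():
    print(f"{label:<40}{angle:6.1f} degrees")
```

For example, Vancouver's 39% share becomes a 140.4 degree slice.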
Contingency Tables
Contingency tables (also called cross-tabulations) are used to display the relationship between two categorical variables. Each cell in the table shows the frequency or percentage for a combination of categories.
The row totals and column totals, found in the margins of the table, give the marginal distributions.
The grand total, in the bottom-right cell, is the overall sample size.
Example: Diet Type and Heart Disease
| Diet type | Heart disease: Yes | Heart disease: No | Total |
|---|---|---|---|
| High cholesterol diet | 11 | 4 | 15 |
| Low cholesterol diet | 2 | 6 | 8 |
| Total | 13 | 10 | 23 |
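A contingency table is built by counting individual records by their combination of categories. The sketch below rebuilds one record per person from the cell counts in this table (11, 4, 2, 6), counts the joint combinations with `collections.Counter`, and derives the row totals, column totals, and grand total. The record layout `(diet, heart_disease)` and the short labels "High"/"Low" are illustrative assumptions.

```python
from collections import Counter

# One (diet, heart_disease) record per individual, rebuilt from the cell
# counts in the diet and heart disease table.
records = (
    [("High", "Yes")] * 11 + [("High", "No")] * 4
    + [("Low", "Yes")] * 2 + [("Low", "No")] * 6
)

cells = Counter(records)                                 # joint (cell) counts
row_totals = Counter(diet for diet, _ in records)       # diet margin
col_totals = Counter(disease for _, disease in records)  # heart disease margin
grand_total = len(records)                               # overall sample size

print(f"{'Diet':<6}{'Yes':>5}{'No':>5}{'Total':>7}")
for diet in ("High", "Low"):
    yes, no = cells[(diet, "Yes")], cells[(diet, "No")]
    print(f"{diet:<6}{yes:>5}{no:>5}{row_totals[diet]:>7}")
print(f"{'Total':<6}{col_totals['Yes']:>5}{col_totals['No']:>5}{grand_total:>7}")
```

The printed layout reproduces the table, with the totals in the margins.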
Marginal Distributions
Marginal distributions summarize the frequency or proportion of one variable, ignoring the other variable. They are found in the margins (totals) of a contingency table.
Example: Proportion of individuals not having coronary heart disease:
$$\frac{10}{23} \approx 0.435 = 43.5\%$$
Example: Proportion of individuals having a high cholesterol diet:
$$\frac{15}{23} \approx 0.652 = 65.2\%$$
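Both marginal proportions in these examples divide a margin total by the grand total. A minimal Python sketch, assuming the diet and heart disease table is stored as a nested dictionary (rows are diet types, columns are heart disease Yes/No; the layout is an illustrative choice):

```python
# Cell counts from the diet and heart disease table:
# rows = diet type, columns = heart disease (Yes, No).
table = {
    "High cholesterol": {"Yes": 11, "No": 4},
    "Low cholesterol": {"Yes": 2, "No": 6},
}

grand_total = sum(sum(row.values()) for row in table.values())  # 23

# Marginal distribution of diet: each row total divided by the grand total.
diet_marginal = {diet: sum(row.values()) / grand_total for diet, row in table.items()}

# Marginal distribution of heart disease: each column total divided by the grand total.
disease_marginal = {
    outcome: sum(row[outcome] for row in table.values()) / grand_total
    for outcome in ("Yes", "No")
}

print(f"No heart disease:      {disease_marginal['No']:.1%}")                # 43.5%
print(f"High cholesterol diet: {diet_marginal['High cholesterol']:.1%}")     # 65.2%
```

Each marginal distribution ignores the other variable entirely, so its proportions sum to 1.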
Conditional Distributions
Conditional distributions show the distribution of one variable for a fixed value of the other variable. This is useful for exploring relationships between variables.
Example: Percentage of individuals with a low cholesterol diet who have heart disease:
$$\frac{2}{8} = 0.25 = 25\%$$
Example: Percentage of individuals with a high cholesterol diet who have heart disease:
$$\frac{11}{15} \approx 0.733 = 73.3\%$$
Example: Proportion of individuals with heart disease who have a high cholesterol diet:
$$\frac{11}{13} \approx 0.846 = 84.6\%$$
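What distinguishes these conditional examples is the denominator: conditioning on a diet type divides by that row's total, while conditioning on having heart disease divides by the "Yes" column total. The Python sketch below makes that explicit; `conditional_on_row` and `conditional_on_column` are hypothetical helpers, and the nested-dictionary layout of the table is an illustrative assumption.

```python
# Cell counts from the diet and heart disease table:
# rows = diet type, columns = heart disease (Yes, No).
table = {
    "High cholesterol": {"Yes": 11, "No": 4},
    "Low cholesterol": {"Yes": 2, "No": 6},
}


def conditional_on_row(table, row):
    """Distribution of the column variable among individuals in one row."""
    counts = table[row]
    row_total = sum(counts.values())
    return {col: n / row_total for col, n in counts.items()}


def conditional_on_column(table, col):
    """Distribution of the row variable among individuals in one column."""
    col_total = sum(counts[col] for counts in table.values())
    return {row: counts[col] / col_total for row, counts in table.items()}


low = conditional_on_row(table, "Low cholesterol")
high = conditional_on_row(table, "High cholesterol")
with_disease = conditional_on_column(table, "Yes")

print(f"Low diet, has heart disease:   {low['Yes']:.1%}")                         # 25.0%
print(f"High diet, has heart disease:  {high['Yes']:.1%}")                        # 73.3%
print(f"Heart disease, high diet:      {with_disease['High cholesterol']:.1%}")  # 84.6%
```

Swapping which variable is held fixed changes the question being answered, which is why 73.3% and 84.6% differ even though both use the same cell count of 11.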
Interpreting Independence
If two categorical variables are independent, the conditional distribution of one variable is the same for every category of the other; in sample data, we look for conditional distributions that are roughly the same. In the example above, the percentage with heart disease is much higher among those on a high cholesterol diet (73.3%) than among those on a low cholesterol diet (25%), which suggests an association between diet type and heart disease.
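Under independence, every diet group would show the same heart disease rate, and that common rate would then equal the overall (marginal) rate of 13/23. The minimal Python sketch below compares each group's conditional rate with the marginal rate and reports the gap between groups. It only compares percentages; it does not decide how large a gap must be before it counts as evidence of association.

```python
# Cell counts from the diet and heart disease table:
# rows = diet type, columns = heart disease (Yes, No).
table = {
    "High cholesterol": {"Yes": 11, "No": 4},
    "Low cholesterol": {"Yes": 2, "No": 6},
}

grand_total = sum(sum(row.values()) for row in table.values())
overall_yes = sum(row["Yes"] for row in table.values()) / grand_total  # marginal rate

# Conditional heart disease rate within each diet group.
rates = {diet: counts["Yes"] / sum(counts.values()) for diet, counts in table.items()}
for diet, rate in rates.items():
    print(f"{diet}: {rate:.1%} with heart disease (overall: {overall_yes:.1%})")

# If the variables were independent, the rates would all be (roughly) equal.
gap = max(rates.values()) - min(rates.values())
print(f"Gap between diet groups: {gap:.1%}")  # 48.3%
```

Both group rates (73.3% and 25%) sit far from the overall 56.5%, in opposite directions, consistent with the association described above.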
Summary Table: Marginal and Conditional Distributions
| Question | Distribution type | Calculation | Result |
|---|---|---|---|
| Proportion not having heart disease | Marginal | 10/23 | 43.5% |
| Proportion having high cholesterol diet | Marginal | 15/23 | 65.2% |
| Proportion with low cholesterol diet who have heart disease | Conditional | 2/8 | 25% |
| Proportion with high cholesterol diet who have heart disease | Conditional | 11/15 | 73.3% |
| Proportion with heart disease who have high cholesterol diet | Conditional | 11/13 | 84.6% |
Key Terms
Frequency Table: Table showing counts for each category.
Relative Frequency Table: Table showing percentages or proportions for each category.
Bar Chart: Graphical display using bars to represent frequencies.
Pie Chart: Circular chart divided into slices representing proportions.
Contingency Table: Table showing frequencies for combinations of two categorical variables.
Marginal Distribution: Distribution of one variable, ignoring the other.
Conditional Distribution: Distribution of one variable for a fixed value of the other variable.
Independence: Two variables are independent if the conditional distributions are the same across categories.
Example Applications
Comparing proportions of transit riders by city to allocate resources.
Analyzing the relationship between diet type and heart disease to inform public health recommendations.
Additional info: These notes are based on lecture slides for a college-level statistics course, focusing on the graphical and tabular display of categorical data and interpretation of relationships between categorical variables.