Lesson 3.2

Study Guide - Smart Notes

Chapter 2: Exploring Data with Graphs and Numerical Summaries

Section 2.2: Describing Data Using Graphical Summaries

This section introduces graphical methods for summarizing and interpreting data, focusing on both categorical and quantitative variables. Understanding these visual tools is essential for identifying patterns, distributions, and anomalies in statistical data.

Distribution

Definition and Purpose

Distribution: A distribution describes the possible values a variable can take and the frequency or relative frequency of those values.
Graphical summaries and frequency tables are used to visually organize collected data.
Example: Recording the number of push-ups each of five friends can do yields a frequency table showing how often each value occurs.
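
The push-up example can be sketched in a few lines of Python. This is a minimal illustration, not part of the original lesson: the five counts are made-up values, and the helper name frequency_table is hypothetical.

```python
from collections import Counter

# Hypothetical push-up counts for five friends (illustrative values only).
pushups = [20, 25, 20, 30, 25]

def frequency_table(values):
    """Return (value, frequency, relative frequency) rows, sorted by value."""
    counts = Counter(values)
    n = len(values)
    return [(v, c, c / n) for v, c in sorted(counts.items())]

print("Value  Frequency  Relative frequency")
for value, freq, rel in frequency_table(pushups):
    print(f"{value:>5}  {freq:>9}  {rel:>18.2f}")
```

Each row pairs a distinct value with how often it occurs, which is the information a distribution describes.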

Graphs for Categorical Data

Pie Charts

Pie charts are used to summarize categorical variables by representing each category as a proportional slice of a circle.

Pie Chart: Each slice's size is proportional to the percentage of observations in that category.
Useful for visualizing the composition of a dataset by category.
Example: COVID-19 deaths in Canada by province, where each province's share is shown as a slice.

Figures: Generic pie chart; pie chart of COVID-19 deaths in Canada.
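
To make "proportional slice" concrete, the sketch below converts category counts into percentages and central angles (a full circle is 360 degrees). The category names and counts are placeholders, not real COVID-19 figures, and pie_slices is a hypothetical helper name.

```python
def pie_slices(counts):
    """Map each category to (percentage of observations, slice angle in degrees)."""
    total = sum(counts.values())
    return {cat: (100 * c / total, 360 * c / total) for cat, c in counts.items()}

# Placeholder category counts (assumed for illustration).
counts = {"A": 50, "B": 30, "C": 20}

for cat, (pct, angle) in pie_slices(counts).items():
    print(f"{cat}: {pct:.1f}% of observations, {angle:.1f} degree slice")
```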

Bar Graphs

Bar graphs display a vertical bar for each category, with the height representing counts (frequencies) or percentages (relative frequencies).

Bar Graph: Makes it easier to compare categories than a pie chart does.
When the categories are ordered by frequency, from highest to lowest, the bar graph is called a Pareto chart.
Example: COVID-19 deaths in Canada by province, shown as bars for each province.

Figures: Generic bar graph; bar graph of COVID-19 deaths in Canada.
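
A rough sketch of the Pareto ordering: sort the categories by count, then draw one bar per category. The bars here are drawn horizontally as text for simplicity, whereas the bar graphs described above use vertical bars. The counts are placeholders and both function names are hypothetical.

```python
def pareto_order(counts):
    """Sort categories from most to least frequent, as in a Pareto chart."""
    return sorted(counts.items(), key=lambda item: item[1], reverse=True)

def text_bar_graph(ordered, width=40):
    """Render one text bar per category, scaled so the largest count fills `width`."""
    largest = max(count for _, count in ordered)
    lines = []
    for category, count in ordered:
        bar = "#" * round(width * count / largest)
        lines.append(f"{category:<10} {bar} {count}")
    return lines

# Placeholder category counts (assumed for illustration).
counts = {"X": 3, "Y": 9, "Z": 5}
print("\n".join(text_bar_graph(pareto_order(counts))))
```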

Class Exercise Example

Bar graphs can be used to summarize class standing data, such as the number of students in each year.

Figure: Bar graph of class standing.

Real-World Applications

Bar graphs and pie charts are commonly used to display real-world data, such as leading causes of death or poverty rates before and after government intervention.

Figures: Table of leading causes of death in Canada; bar graph of deaths in Canada by cause; pie chart of deaths in Canada by cause; bar chart of child poverty rates before and after intervention.

Graphs for Quantitative Data

Dot Plots

Dot plots are used for small datasets, showing a dot for each observation placed above its value on a number line.

Retains individual data values.
Useful for visualizing the distribution of a quantitative variable.
Example: Sodium content in cereals.

Figures: Generic dot plot; dot plot for sodium in cereals.
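
The idea of stacking one dot per observation can be sketched as a text plot. This simplified version spaces the distinct values evenly rather than placing them to scale on a number line. The sodium values are made up for illustration, and dot_plot is a hypothetical helper.

```python
from collections import Counter

def dot_plot(values):
    """Return text rows that stack one dot per observation above each distinct value."""
    counts = Counter(values)
    xs = sorted(counts)
    height = max(counts.values())
    rows = []
    for level in range(height, 0, -1):
        marks = ["." if counts[x] >= level else " " for x in xs]
        rows.append(" ".join(f"{m:>4}" for m in marks))
    # The last row labels the axis with the distinct values.
    rows.append(" ".join(f"{x:>4}" for x in xs))
    return rows

# Made-up sodium values in mg (not the lesson's cereal data).
sodium = [0, 70, 70, 140, 140, 140, 200]
print("\n".join(dot_plot(sodium)))
```

Because every observation appears as its own dot, the individual data values can still be read off the plot.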

Stem-and-Leaf Plots

Stem-and-leaf plots separate each observation into a stem (first part of the number) and a leaf (last digit), retaining individual values and showing distribution shape.

Stems are listed vertically, leaves are placed horizontally.
Useful for small to moderate datasets.
Example: Sodium content in cereals.

Figure: Generic stem-and-leaf plot.
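
A minimal sketch of the stem-and-leaf construction, assuming non-negative whole-number data: the last digit becomes the leaf and the remaining leading digits become the stem. The sample values are hypothetical, as is the function name stem_and_leaf.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Build stem-and-leaf rows: stem = all but the last digit, leaf = last digit."""
    leaves = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        leaves[stem].append(leaf)
    lines = []
    # List every stem in range, including empty ones, so gaps stay visible.
    for stem in range(min(leaves), max(leaves) + 1):
        digits = "".join(str(d) for d in leaves.get(stem, []))
        lines.append(f"{stem:>3} | {digits}")
    return lines

# Hypothetical two-digit observations.
data = [12, 15, 21, 24, 24, 30, 38, 51]
print("\n".join(stem_and_leaf(data)))
```

Reading a row such as "2 | 144" recovers the observations 21, 24, and 24, so no individual value is lost.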

Histograms

Histograms use bars to portray the frequencies or relative frequencies of outcomes for a quantitative variable. They are most useful for large datasets.

Data range is divided into intervals of equal width.
Bars are drawn over each interval, with height equal to frequency or percentage.
Example: Sodium content in cereals.

Figures: Histogram of TV watching hours; histogram for sodium in cereals.
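
The binning step behind a histogram can be sketched as follows. The function bin_counts and its parameters are hypothetical names, the data values are made up, and observations outside the covered range are simply ignored in this sketch.

```python
def bin_counts(values, start, width, n_bins):
    """Count observations in n_bins equal-width intervals beginning at `start`.

    Interval i covers start + i*width up to, but not including, start + (i+1)*width.
    """
    counts = [0] * n_bins
    for v in values:
        i = int((v - start) // width)
        if 0 <= i < n_bins:
            counts[i] += 1
    return counts

# Made-up observations, binned into eight intervals of width 40 starting at 0.
values = [0, 39, 40, 130, 290, 319]
for i, c in enumerate(bin_counts(values, start=0, width=40, n_bins=8)):
    print(f"{i * 40:>3} to {i * 40 + 39:<3} {'#' * c} {c}")
```

The height of each bar is the count for that interval; dividing by the number of observations would give relative frequencies instead.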

Frequency Table Example

Frequency tables summarize quantitative data by intervals, showing frequency, proportion, and percentage for each interval.

Interval	Frequency	Proportion	Percentage
0 to 39	1	0.05	5%
40 to 79	1	0.05	5%
80 to 119	0	0.00	0%
120 to 159	4	0.20	20%
160 to 199	3	0.15	15%
200 to 239	7	0.35	35%
240 to 279	2	0.10	10%
280 to 319	2	0.10	10%
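
As a quick check of the arithmetic, the proportion and percentage columns follow directly from the frequencies: each proportion is the interval's frequency divided by the total number of observations (20 here). The sketch below recomputes them from the frequency column of the table; the variable and function names are illustrative.

```python
# Interval labels and frequencies copied from the table above.
intervals = ["0 to 39", "40 to 79", "80 to 119", "120 to 159",
             "160 to 199", "200 to 239", "240 to 279", "280 to 319"]
frequencies = [1, 1, 0, 4, 3, 7, 2, 2]

def proportions(freqs):
    """Divide each frequency by the total number of observations."""
    total = sum(freqs)
    return [f / total for f in freqs]

for label, f, p in zip(intervals, frequencies, proportions(frequencies)):
    print(f"{label:<12}{f:>3}{p:>8.2f}{p:>7.0%}")
```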

Figure: Histogram skewed to the left.

Interpreting Histograms

Key Features: Center, Spread, and Shape

Center: Often measured by the median, the value with 50% of the data below it and 50% above it.
Spread: Indicates how much the data vary around the center.
Shape: Can be symmetric, skewed to the left, or skewed to the right.
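
A small sketch of summarizing center and spread numerically. The median comes from Python's standard statistics module. Using the range (maximum minus minimum) as the measure of spread is an assumption made here for simplicity; the notes above do not name a specific measure. The data values are hypothetical.

```python
from statistics import median

def center_and_spread(values):
    """Return (median, range) as simple measures of center and spread."""
    return median(values), max(values) - min(values)

# Hypothetical observations.
data = [3, 1, 4, 1, 5]
center, spread = center_and_spread(data)
print(f"median = {center}, range = {spread}")
```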

Figures: Histogram skewed to the right; symmetric histogram.

Skewness

Skewed to the left: Left tail is longer than the right tail.
Skewed to the right: Right tail is longer than the left tail.
Symmetric: The left and right sides of the distribution are roughly mirror images of each other.

Figures: Life span, skewed to the left; income, skewed to the right.
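
One rough way to compare tail lengths from histogram counts is to measure how many intervals lie between the tallest bar and the lowest and highest non-empty bars. This is only a crude heuristic assumed for illustration, not a formal skewness measure, and the function names are hypothetical. The frequencies are the ones from the sodium frequency table earlier in this section.

```python
def tail_lengths(freqs):
    """Return (left, right): intervals from the tallest bar to each non-empty end."""
    nonempty = [i for i, f in enumerate(freqs) if f > 0]
    peak = freqs.index(max(freqs))  # first tallest bar if there are ties
    return peak - nonempty[0], nonempty[-1] - peak

def describe_shape(freqs):
    """Label the shape by comparing the two tail lengths (heuristic only)."""
    left, right = tail_lengths(freqs)
    if left > right:
        return "skewed to the left"
    if right > left:
        return "skewed to the right"
    return "tails of equal length"

sodium_freqs = [1, 1, 0, 4, 3, 7, 2, 2]
print(tail_lengths(sodium_freqs), describe_shape(sodium_freqs))
```

Equal tail lengths do not by themselves guarantee symmetry, so the last label is deliberately cautious.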

Types of Mounds

Unimodal: One peak in the distribution.
Bimodal: Two peaks in the distribution.

Figure: Unimodal and bimodal histograms.
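
Counting peaks in a list of histogram counts can be sketched as below. This is a simplistic illustration under stated assumptions: a peak is any bar strictly taller than both neighbors, so small random bumps are counted as peaks and flat-topped mounds are missed. In practice, modality is judged from the overall shape. The counts and the name count_peaks are hypothetical.

```python
def count_peaks(freqs):
    """Count bars strictly taller than both neighbors (missing neighbors count as 0)."""
    padded = [0] + list(freqs) + [0]
    return sum(
        1
        for i in range(1, len(padded) - 1)
        if padded[i - 1] < padded[i] > padded[i + 1]
    )

# Hypothetical histogram counts.
print(count_peaks([1, 3, 6, 3, 1]))     # one mound
print(count_peaks([1, 5, 2, 1, 6, 2]))  # two mounds
```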

Outliers

An outlier is an observation that falls far from the rest of the data. Outliers should be investigated to determine their cause.

Figure: Histogram with an outlier.
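
The notes describe an outlier only informally, as a value far from the rest of the data. One common convention, assumed here and not stated in the lesson, flags values more than 1.5 times the interquartile range (IQR) below the first quartile or above the third quartile. The sketch uses statistics.quantiles from the Python standard library (3.8 or later) with its default quartile method; the data and the name find_outliers are hypothetical.

```python
from statistics import quantiles

def find_outliers(values, k=1.5):
    """Return values lying more than k * IQR outside the quartiles (assumed rule)."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical observations with one value far from the rest.
data = [10, 12, 13, 14, 15, 16, 18, 95]
print(find_outliers(data))
```

A flagged value is a candidate for investigation, not something to delete automatically.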

Time Plots

Definition and Use

Time Plot: Used for displaying a time series, plotting each observation against the time it was measured.
Points are usually connected to show trends over time.
Example: Number of people worldwide using the Internet from 1995 to 2001.

Figure: Time plot of Internet use.
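
The data behind a time plot is a list of (time, value) pairs taken in time order. The sketch below sorts the observations by time and computes the change between consecutive points, which is the trend that the connecting line segments show. The numbers are made up for illustration and are not real Internet usage figures; time_series_changes is a hypothetical helper.

```python
def time_series_changes(observations):
    """Sort (time, value) pairs by time and return the change between consecutive points."""
    ordered = sorted(observations)
    return [(t2, v2 - v1) for (_, v1), (t2, v2) in zip(ordered, ordered[1:])]

# Made-up (year, value) observations, deliberately out of order.
series = [(1997, 30), (1995, 10), (1996, 18)]
for year, change in time_series_changes(series):
    print(f"{year}: change of {change:+} from the previous year")
```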

Lesson Summary

Descriptive statistics summarize data using graphical and numerical methods.
Categorical variables: Use pie charts and bar graphs.
Quantitative variables: Use histograms, stem-and-leaf plots, dot plots, and box plots.
Key features to identify: shape (symmetric or skewed), center, spread, and outliers.
Outliers: Extreme values that deviate from the bulk of observations.