Exploring Data with Tables and Graphs: Study Notes

Introduction

Organizing and summarizing data is a fundamental part of statistics. Tables and graphs reveal patterns, trends, and relationships in data, making findings easier to interpret and communicate. This chapter covers essential tools for exploring data: frequency distributions, dotplots, stemplots, time-series graphs, bar graphs, Pareto charts, pie charts, and frequency polygons, along with common ways in which graphs can deceive.

Frequency Distributions for Organizing and Summarizing Data

Definition and Purpose

Frequency distribution: A table that lists data values, either individually or grouped into classes, together with the frequency (count) of each.
Helps organize raw data into a more comprehensible format, showing how data values are distributed.
Can be used for both categorical and quantitative data.
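
As a rough illustration of how such a table can be built, the sketch below groups the 16 DVD player prices from the dotplot example in these notes into classes. The class width of 10 and the first lower class limit of 190 are assumptions chosen for illustration, and frequency_distribution is a hypothetical helper name, not something defined in the source material.

```python
from collections import Counter

# DVD player prices (dollars) from the dotplot example in these notes.
prices = [210, 219, 214, 197, 224, 219, 199, 199,
          208, 209, 215, 199, 212, 212, 219, 210]

def frequency_distribution(values, start, width):
    """Count the values in each class [lower, lower + width - 1].

    Assumes integer data with every value >= start.
    """
    counts = Counter((v - start) // width for v in values)
    table = []
    for k in range(max(counts) + 1):
        lower = start + k * width
        table.append((lower, lower + width - 1, counts.get(k, 0)))
    return table

# Assumed classes: width 10, first lower class limit 190.
table = frequency_distribution(prices, start=190, width=10)
for lower, upper, freq in table:
    print(f"{lower}-{upper}\t{freq}")
```

With these assumed classes, the frequencies add up to 16, the number of prices in the data set.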

Graphs that Enlighten: Quantitative Data

Dotplots

A dotplot is a simple graph for displaying quantitative data. Each data value is represented by a dot placed above a horizontal scale. Dots representing equal values are stacked vertically.

Displays the shape of the data distribution.
Retains original data values, making it possible to reconstruct the data set from the plot.

Example: Prices (in dollars) of 16 DVD players: 210, 219, 214, 197, 224, 219, 199, 199, 208, 209, 215, 199, 212, 212, 219, 210. The dotplot stacks dots above each price value, showing the frequency of each price.
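
A minimal text-only sketch of this dotplot follows. It is rotated a quarter turn so that it prints as plain text: each price gets its own row and the dots extend to the right instead of stacking upward. The function name text_dotplot is a hypothetical helper for illustration.

```python
from collections import Counter

# DVD player prices (dollars) from the example above.
prices = [210, 219, 214, 197, 224, 219, 199, 199,
          208, 209, 215, 199, 212, 212, 219, 210]

def text_dotplot(values):
    """Return one line per distinct value, with one dot per occurrence."""
    counts = Counter(values)
    return [f"{value} | " + "." * counts[value] for value in sorted(counts)]

lines = text_dotplot(prices)
for line in lines:
    print(line)
```

Because every dot corresponds to one data value, counting the dots on each row recovers the original data set.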

Stemplots (Stem-and-Leaf Plots)

A stemplot (or stem-and-leaf plot) is a graphical method for displaying quantitative data. Each data value is split into a "stem" (all but the final digit) and a "leaf" (the final digit).

Shows the shape of the data distribution.
Retains original data values; when the leaves in each row are arranged in increasing order, the plot also shows the data in sorted order.
Useful for small to moderate-sized data sets.

Example: Days-to-maturity for 40 short-term investments:

Stems	Leaves
3	8 6 9
4	7
5	7 1 6 3 5 1 0 5
6	2 4 7 3 6 4 0 9 8 5
7	0 5 1 0 9 8 0
8	5 9 1 7 0 3 6
9	9 5 8

Each row shows the stem (e.g., 6 for 60s) and the leaves (e.g., 2, 4, 7 for 62, 64, 67).

Stemplots can use one or two lines per stem for greater detail. With two lines per stem, leaves 0 to 4 go on the first line and leaves 5 to 9 on the second.
Can help identify whether a distribution is symmetric, right-skewed, or left-skewed.

Example: Cholesterol levels for 20 patients:

Stems	Leaves
19	9
20	0 2 3 7 8 8 9
21	0 0 0 0 2 3 4 5 7 8 8
22	1
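
The splitting rule described in this section (stem = all but the final digit, leaf = the final digit) can be sketched in a few lines. The data values below are read back from the cholesterol stemplot above. The function name stemplot is a hypothetical helper, and the sketch assumes non-negative integers and one line per stem.

```python
from collections import defaultdict

def stemplot(values):
    """Return (stem, sorted leaves) rows; the leaf is the final digit."""
    rows = defaultdict(list)
    for v in values:
        stem, leaf = divmod(v, 10)  # e.g. 207 -> stem 20, leaf 7
        rows[stem].append(leaf)
    # Stems with no leaves are skipped in this simple sketch.
    return [(stem, sorted(rows[stem])) for stem in sorted(rows)]

# Cholesterol levels read back from the stemplot above (20 patients).
cholesterol = [199, 200, 202, 203, 207, 208, 208, 209, 210, 210,
               210, 210, 212, 213, 214, 215, 217, 218, 218, 221]

for stem, leaves in stemplot(cholesterol):
    print(stem, " ".join(str(leaf) for leaf in leaves), sep="\t")
```

Sorting the leaves within each row is a design choice here; it makes the plot double as a sorted listing of the data.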

Time-Series Graphs

A time-series graph displays quantitative data collected at different points in time (e.g., monthly, yearly). The horizontal axis represents time, and the vertical axis represents the variable of interest.

Reveals trends, cycles, or patterns over time.
Useful for analyzing changes and forecasting future values.

Example: Law enforcement fatalities per year plotted from 1985 to 2015, showing fluctuations and trends over time.

Graphs that Enlighten: Qualitative/Categorical Data

Bar Graphs

A bar graph uses bars of equal width to show the frequency of each category of categorical (qualitative) data. The bars may or may not be separated by small gaps.

Shows the relative distribution of categorical data.
Makes it easier to compare different categories.

Example: Political party affiliations of students in a statistics class, with each bar representing a party and its frequency.

Pareto Charts

A Pareto chart is a bar graph for categorical data, with bars arranged in descending order of frequency.

Highlights the most important categories.
Draws attention to categories with the highest frequencies.

Example: Types of stolen boats, with the most frequently stolen type on the left.
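
The only step that distinguishes a Pareto chart from an ordinary bar graph is the ordering of the bars, which can be sketched as below. The category names and counts are hypothetical placeholders for illustration only (the notes do not give the stolen-boat counts), and pareto_order is an assumed helper name.

```python
def pareto_order(frequencies):
    """Return (category, frequency) pairs in descending order of frequency."""
    return sorted(frequencies.items(), key=lambda item: item[1], reverse=True)

# Hypothetical counts for illustration only; not data from the notes.
stolen = {"Type A": 12, "Type B": 30, "Type C": 8}

ordered = pareto_order(stolen)
for category, freq in ordered:
    print(f"{category}\t{freq}")
```

The first pair in the result would be drawn as the leftmost bar.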

Pie Charts

A pie chart depicts categorical data as slices of a circle, with each slice proportional to the frequency of the category.

Shows the distribution of categorical data in a familiar, visual format.

Example: Proportion of different types of stolen boats represented as slices of a pie.
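
To make "proportional to the frequency" concrete, the sketch below converts category counts into central angles, giving each slice the same share of the 360 degrees that its category has of the total count. The counts are hypothetical placeholders (the notes do not give the actual figures), and pie_slices is an assumed helper name.

```python
def pie_slices(frequencies):
    """Map each category to its slice angle in degrees."""
    total = sum(frequencies.values())
    return {category: 360 * freq / total
            for category, freq in frequencies.items()}

# Hypothetical counts for illustration only; not data from the notes.
stolen = {"Type A": 12, "Type B": 30, "Type C": 8}

angles = pie_slices(stolen)
for category, angle in angles.items():
    print(f"{category}\t{angle:.1f} degrees")
```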

Frequency Polygons

Definition and Features

A frequency polygon is a graph that uses line segments to connect points located directly above class midpoints, with the height of each point equal to the class frequency. It is similar to a histogram but uses lines instead of bars.

Shows the shape of the data distribution.
Can be used to compare two or more distributions on the same graph.

Relative frequency polygon: Uses proportions or percentages for the vertical scale instead of raw frequencies.
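
The points that a relative frequency polygon connects can be computed as in the sketch below. The class table is an assumption made for illustration: it tallies the 16 DVD player prices from the dotplot example into classes of width 10 starting at 190. The name polygon_points is a hypothetical helper.

```python
def polygon_points(class_table):
    """Return (class midpoint, relative frequency) for each class.

    class_table is a list of (lower_limit, upper_limit, frequency).
    """
    total = sum(freq for _, _, freq in class_table)
    return [((lower + upper) / 2, freq / total)
            for lower, upper, freq in class_table]

# Assumed grouping of the 16 DVD player prices (class width 10).
classes = [(190, 199, 4), (200, 209, 2), (210, 219, 9), (220, 229, 1)]

points = polygon_points(classes)
for midpoint, rel_freq in points:
    print(f"{midpoint}\t{rel_freq:.4f}")
```

Using relative frequencies rather than raw counts is what allows two data sets of different sizes to be compared on the same graph.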

Graphs That Deceive

Nonzero Vertical Axis

Some graphs use a vertical axis that does not start at zero, exaggerating differences between groups.

Always check if the vertical axis starts at zero to avoid misinterpretation.
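
The size of the exaggeration can be estimated by comparing drawn bar heights, which are measured from wherever the vertical axis starts. The sketch below uses hypothetical values (55 and 50, with an axis starting at 45) purely for illustration. The name apparent_ratio is an assumed helper.

```python
def apparent_ratio(a, b, axis_start=0.0):
    """Ratio of drawn bar heights for values a and b.

    Bar heights are measured from axis_start, so a nonzero
    axis_start changes the ratio that the eye perceives.
    """
    return (a - axis_start) / (b - axis_start)

# Hypothetical values for illustration only.
honest = apparent_ratio(55, 50)                    # axis starts at 0
truncated = apparent_ratio(55, 50, axis_start=45)  # axis starts at 45

print(f"axis from 0:  first bar is {honest:.2f} times as tall")
print(f"axis from 45: first bar is {truncated:.2f} times as tall")
```

In this hypothetical case a 10% difference in the values is drawn as one bar twice as tall as the other.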

Pictographs

Pictographs use images or drawings to represent data. They can be misleading because:

Two-dimensional or three-dimensional images can exaggerate differences.
Doubling the sides of a square increases its area by a factor of four; doubling the sides of a cube increases its volume by a factor of eight.

Example: Using stacks of coins to represent budget amounts can distort the true differences between values.
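
The scaling arithmetic behind this distortion can be sketched as follows: enlarging every side of an image by some factor multiplies its area by the square of that factor and its volume by the cube. Both function names are hypothetical helpers for illustration.

```python
def size_factor(side_factor, dimensions):
    """Factor by which area (2) or volume (3) grows when every side is scaled."""
    return side_factor ** dimensions

def honest_side_factor(value_ratio, dimensions):
    """Side scaling that makes area or volume match the true ratio of values."""
    return value_ratio ** (1 / dimensions)

print(size_factor(2, 2))  # doubling the sides of a square: area factor
print(size_factor(2, 3))  # doubling the sides of a cube: volume factor
print(round(honest_side_factor(2, 2), 3))  # side scaling for a true 2x area
```

Under this reasoning, a square meant to show a value twice as large would need its sides scaled by about 1.414, not 2.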

Concluding Thoughts

For small data sets (20 values or fewer), use a table instead of a graph.
Graphs should focus on the true nature of the data, not on distracting design elements.
Do not distort data; construct graphs to reveal the true nature of the data.
Most of the ink in a graph should be used for data, not for decorative elements.

Additional info: These principles are based on Edward Tufte's guidelines for effective data visualization.