Organizing Quantitative Data: Histograms, Dotplots, and Stem-and-Leaf Displays

Chapter 2: Organizing Data

Introduction

Organizing quantitative data is a fundamental step in statistical analysis. Effective data organization allows for clear visualization, summarization, and interpretation of data distributions. This chapter introduces three primary methods for organizing quantitative data: histograms, dotplots, and stem-and-leaf displays.

Organizing Quantitative Data - Histograms

Definition and Purpose

A histogram is a graphical representation of the distribution of numerical data, where data are grouped into ranges (classes) and the frequency of data within each range is depicted by the height of the bar.

  • Class width: The range of values in each class interval. All classes should have the same width.

  • Frequency histogram: Shows the number of data points in each class.

  • Relative-frequency histogram: Shows the proportion of data points in each class.

Example: Weight of 37 Males (Aged 18-24)

Given a sample of 37 weights, a class width of 20 is used to construct frequency and relative-frequency histograms.

Weight             Frequency    Relative Frequency
120-under 140          3            0.0810811
140-under 160          9            0.2432432
160-under 180         14            0.3783784
180-under 200          7            0.1891892
200-under 220          3            0.0810811
220-under 240          0            0.0000000
240-under 260          0            0.0000000
260-under 280          1            0.0270270
Total                 37            1.0000000

  • Class cutpoints label the bars; each observation must belong to one, and only one, class.

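  • Frequency and relative-frequency histograms of the same data have the same shape. Only the vertical scale differs, because every relative frequency is the class frequency divided by the same total.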

Formulas

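  • Relative Frequency: $\text{relative frequency} = \dfrac{\text{frequency of the class}}{\text{total number of observations}}$. For the 160-under 180 class above, this is $14/37 \approx 0.3783784$.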
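Relative frequency is a class's frequency divided by the total number of observations. As a minimal sketch, the relative-frequency column of the weight table can be recomputed from its frequency column. The class frequencies are copied from the table; the variable names are illustrative only.

```python
# Class frequencies copied from the weight table (class width 20).
frequencies = {
    "120-under 140": 3,
    "140-under 160": 9,
    "160-under 180": 14,
    "180-under 200": 7,
    "200-under 220": 3,
    "220-under 240": 0,
    "240-under 260": 0,
    "260-under 280": 1,
}

# Total number of observations in the sample.
total = sum(frequencies.values())

# Relative frequency = class frequency / total number of observations.
relative = {label: count / total for label, count in frequencies.items()}

for label, count in frequencies.items():
    print(f"{label:<15}{count:>4}{relative[label]:>12.7f}")
```

Because every class is divided by the same total, the relative frequencies sum to 1 (up to rounding).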

Summary of Histogram Construction

  • The number of classes should be small enough to provide an effective summary but large enough to reveal relevant characteristics of the data.

  • All classes should have the same width.

  • Each observation must belong to one, and only one, class.
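The "one, and only one, class" rule can be sketched in code by treating each class as a half-open interval: the lower cutpoint is included and the upper cutpoint is excluded, matching labels such as "140-under 160". The function names and the sample weights below are hypothetical, chosen only to illustrate the rule.

```python
def class_index(value, start, width):
    """Return the 0-based index of the class [lower, lower + width) containing value."""
    if value < start:
        raise ValueError("value lies below the first class")
    return int((value - start) // width)


def class_label(index, start, width):
    """Build a label such as '140-under 160' for the class at the given index."""
    lower = start + index * width
    return f"{lower}-under {lower + width}"


# Hypothetical weights (not the 37-value sample from the example).
sample = [129.2, 140, 159.9, 160, 278.8]

counts = {}
for weight in sample:
    label = class_label(class_index(weight, start=120, width=20), start=120, width=20)
    counts[label] = counts.get(label, 0) + 1

print(counts)
```

A value that sits exactly on a cutpoint, such as 140 or 160, falls in the class that begins there, so no observation is counted twice or left out.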

Organizing Quantitative Data - Dotplots

Definition and Purpose

A dotplot is a simple graphical display where each data value is represented by a dot placed above its value on a number line. Dotplots are useful for small datasets and for visualizing the distribution and clustering of data points.

  • Each observation is plotted as a dot at an appropriate place above a horizontal axis.

  • Multiple identical values are stacked vertically.
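The stacking rule can be sketched by counting how many times each value occurs: that count is the height of the stack of dots above the value. The prices below are hypothetical, and the sketch prints a rotated dotplot (axis running down the page) to stay short; `dot_stacks` is an illustrative name, not a library function.

```python
from collections import Counter


def dot_stacks(values, step):
    """Return (axis position, stack height) pairs from min to max in steps of `step`.

    Assumes integer values that fall exactly on the axis positions.
    Positions with no observations get height 0, so gaps stay visible.
    """
    counts = Counter(values)
    return [(pos, counts[pos]) for pos in range(min(values), max(values) + 1, step)]


# Hypothetical prices in dollars (not the 16 prices from the example).
prices = [210, 210, 220, 240, 240, 240]

# Rotated dotplot: one row per axis position, one "o" per observation.
for pos, height in dot_stacks(prices, step=10):
    print(f"{pos:>4} | " + "o" * height)
```

Keeping empty axis positions in the output is what makes gaps and isolated values easy to spot.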

Example: Prices of DVD Players

Given a sample of 16 prices, the dotplot visually displays the distribution of prices.

  • Dotplots are especially useful for identifying clusters, gaps, and outliers in small datasets.

Applications

  • Dotplots are commonly used in exploratory data analysis and for comparing small groups.

Organizing Quantitative Data - Stem-and-Leaf Displays

Definition and Purpose

A stem-and-leaf display (or stemplot) is a method of displaying quantitative data in which each data value is split into a "stem" (all but the rightmost digit) and a "leaf" (the rightmost digit). This method preserves the original data values while showing the distribution.

  • For a three-digit value such as 207, the stem is 20 and the leaf is 7.

  • Stemplots can use one line per stem or two lines per stem for greater detail.

Example: Cholesterol Levels of 20 Patients

Sample data are organized into a stem-and-leaf diagram:

Stem | Leaf (one line per stem)
  19 | 9
  20 | 0 2 3 7 8 8 9
  21 | 0 0 0 2 3 4 5 7 8 8
  22 | 1
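As a rough sketch of how such a display is built, each value can be split with integer division: the quotient by 10 is the stem and the remainder is the leaf. The values below are the ones that can be read back from the display itself; the function name is illustrative.

```python
def stem_and_leaf(values):
    """Map each stem (all but the rightmost digit) to its sorted leaves (rightmost digits).

    Assumes non-negative integers. Stems with no leaves are kept as empty rows.
    """
    stems = [v // 10 for v in values]
    rows = {stem: [] for stem in range(min(stems), max(stems) + 1)}
    for v in sorted(values):
        rows[v // 10].append(v % 10)
    return rows


# Values read back from the one-line-per-stem display.
levels = [199, 200, 202, 203, 207, 208, 208, 209, 210, 210,
          210, 212, 213, 214, 215, 217, 218, 218, 221]

for stem, leaves in stem_and_leaf(levels).items():
    print(f"{stem:>3} | {' '.join(str(leaf) for leaf in leaves)}")
```

Because the leaves are the actual final digits, every original value can be recovered by joining a stem with one of its leaves.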

Using two lines per stem, the first line contains leaves 0-4, and the second line contains leaves 5-9:

Stem | Leaf (0-4)     | Leaf (5-9)
  19 |                | 9
  20 | 0 2 3          | 7 8 8 9
  21 | 0 0 0 2 3 4    | 5 7 8 8
  22 | 1              |
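The two-lines-per-stem rule can be sketched the same way: after splitting a value into stem and leaf, leaves 0-4 go to the first line and leaves 5-9 to the second. The values are those readable from the display; `split_stem_and_leaf` is an illustrative name.

```python
def split_stem_and_leaf(values):
    """Map each stem to a pair (leaves 0-4, leaves 5-9), each sorted ascending.

    Assumes non-negative integers.
    """
    rows = {}
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        low, high = rows.setdefault(stem, ([], []))
        (low if leaf <= 4 else high).append(leaf)
    return rows


# Values read back from the stem-and-leaf display.
levels = [199, 200, 202, 203, 207, 208, 208, 209, 210, 210,
          210, 212, 213, 214, 215, 217, 218, 218, 221]

for stem, (low, high) in split_stem_and_leaf(levels).items():
    print(f"{stem:>3} | {' '.join(map(str, low)):<12} | {' '.join(map(str, high))}")
```

Splitting each stem spreads a crowded row, such as stem 21, over two lines, which shows more detail about where the values concentrate.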

Advantages of Stem-and-Leaf Displays

  • Preserve the original data values.

  • Show the shape of the distribution.

  • Allow quick identification of the mode, clusters, and outliers.

Summary of Data Organization Methods

  • Histograms are best for large datasets and provide a clear view of the distribution shape.

  • Dotplots are ideal for small datasets and allow for easy identification of individual values.

  • Stem-and-leaf displays combine the advantages of both, preserving data values and showing distribution.

Additional info: The original notes also include brief R code examples for generating these plots. R is a statistical software environment commonly used for data visualization.
