Descriptive Measures: Five-Number Summary, Boxplot, and Data Variation

Chapter 3: Descriptive Measures

Introduction

This chapter introduces key descriptive statistics used to summarize and visualize data distributions, focusing on the five-number summary, boxplots, and measures of variation. These tools are essential for understanding the center, spread, and shape of data sets in statistics.

Measures of Variation

Standard Deviation and the 3-Standard-Deviations Rule

Standard deviation quantifies the amount of variation or dispersion in a data set. The 3-standard-deviations rule states that almost all observations in any data set lie within three standard deviations to either side of the mean (by Chebyshev's rule, at least 8/9, or about 89%, of them).

  • Standard deviation ($s$): Measures roughly how far, on average, the observations lie from the mean.

  • 3-standard-deviations rule: Almost all observations fall within $\bar{x} \pm 3s$, that is, between $\bar{x} - 3s$ and $\bar{x} + 3s$.

  • Example: Two dotplots with the same mean ($\bar{x}$) but different standard deviations ($s$) illustrate that a larger standard deviation indicates more variation.

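Formula: $s = \sqrt{\dfrac{\sum (x_i - \bar{x})^2}{n - 1}}$, where $\bar{x}$ is the sample mean and $n$ is the number of observations.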
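
To make the idea concrete, here is a minimal Python sketch of the sample standard deviation (dividing by n - 1) and a check of the 3-standard-deviations rule. The two data sets `narrow` and `wide` and the helper `sample_std` are hypothetical illustrations, not taken from the dotplots in the source material.

```python
import math
import statistics

def sample_std(values):
    """Sample standard deviation: square root of the summed squared
    deviations from the mean, divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))

# Hypothetical data sets (not from the text): same mean, different spread.
narrow = [8, 9, 10, 11, 12]
wide = [2, 6, 10, 14, 18]

for name, data in (("narrow", narrow), ("wide", wide)):
    mean = statistics.mean(data)
    s = sample_std(data)
    # Count the observations inside the interval mean - 3s to mean + 3s.
    inside = sum(mean - 3 * s <= x <= mean + 3 * s for x in data)
    print(f"{name}: mean={mean}, s={s:.2f}, within 3 SDs: {inside}/{len(data)}")
```

Both sets share the same mean, but `wide` has the larger standard deviation because its values sit farther from that mean.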

Quartiles and Five-Number Summary

Definitions and Calculation Steps

The five-number summary provides a concise description of a data set's distribution using five key values: minimum, first quartile ($Q_1$), median ($Q_2$), third quartile ($Q_3$), and maximum.

  • Order the data set in increasing order.

  • Median ($Q_2$): The middle value of the ordered data set (the mean of the two middle values when the number of observations is even).

  • Divide the data into two halves. If the number of observations is odd, include the median in both halves.

  • First quartile ($Q_1$): Median of the lower half.

  • Third quartile ($Q_3$): Median of the upper half.

  • Interquartile range (IQR): $\text{IQR} = Q_3 - Q_1$

  • Five-number summary: min, $Q_1$, $Q_2$, $Q_3$, max

Example: For a data set of weekly TV-viewing hours, the ordered values are:

5, 15, 16, 20, 21, 25, 26, 27, 30, 30, 31, 32, 32, 34, 35, 38, 38, 41, 43, 66

Calculated quartiles: $Q_1 = 23$, $Q_2 = 30.5$, $Q_3 = 36.5$, $\text{IQR} = 36.5 - 23 = 13.5$
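
The calculation steps can be sketched in Python as follows. This is one possible implementation of the median-of-halves method described in this section; the function name `five_number_summary` is an illustrative choice, and the data are the TV-viewing hours listed in the example.

```python
from statistics import median

def five_number_summary(values):
    """Return (min, Q1, Q2, Q3, max) using the median-of-halves method."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    if n % 2 == 0:
        lower, upper = data[:half], data[half:]
    else:
        # Odd n: the median is included in both halves.
        lower, upper = data[:half + 1], data[half:]
    return data[0], median(lower), median(data), median(upper), data[-1]

# Weekly TV-viewing hours from the example.
tv_hours = [5, 15, 16, 20, 21, 25, 26, 27, 30, 30,
            31, 32, 32, 34, 35, 38, 38, 41, 43, 66]

minimum, q1, q2, q3, maximum = five_number_summary(tv_hours)
iqr = q3 - q1
print("Five-number summary:", minimum, q1, q2, q3, maximum)
print("IQR:", iqr)
```

Statistical software may use other quartile conventions, so its output can differ slightly from this method on some data sets.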

Boxplots

Construction and Interpretation

Boxplots are graphical representations of the five-number summary, showing the center, spread, and potential outliers in a data set.

  • The box spans from $Q_1$ to $Q_3$, with a line at the median ($Q_2$).

  • Whiskers extend to the most extreme data points within the lower and upper limits.

  • Lower limit: $Q_1 - 1.5 \cdot \text{IQR}$

  • Upper limit: $Q_3 + 1.5 \cdot \text{IQR}$

  • Points outside these limits are considered potential outliers.

  • Adjacent values: Most extreme observations within the lower and upper limits.

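Example Calculation (TV-viewing data, with $Q_1 = 23$, $Q_3 = 36.5$, and $\text{IQR} = 13.5$):

  • Lower limit: $23 - 1.5 \cdot 13.5 = 2.75$

  • Upper limit: $36.5 + 1.5 \cdot 13.5 = 56.75$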
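
As a rough sketch, the limits, potential outliers, and adjacent values for the TV-viewing data can be computed in Python. The helper names `quartiles` and `boxplot_limits` are illustrative, and quartiles are found with the median-of-halves method from this chapter.

```python
from statistics import median

def quartiles(values):
    """Q1, Q2, Q3 by the median-of-halves method (median in both halves if n is odd)."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half] if n % 2 == 0 else data[:half + 1]
    upper = data[half:]
    return median(lower), median(data), median(upper)

def boxplot_limits(values):
    """Lower/upper limits, potential outliers, and adjacent values."""
    q1, _, q3 = quartiles(values)
    iqr = q3 - q1
    lower_limit = q1 - 1.5 * iqr
    upper_limit = q3 + 1.5 * iqr
    inside = [x for x in values if lower_limit <= x <= upper_limit]
    outliers = sorted(x for x in values if x < lower_limit or x > upper_limit)
    return {
        "lower_limit": lower_limit,
        "upper_limit": upper_limit,
        "outliers": outliers,
        # Adjacent values: most extreme observations still inside the limits.
        "adjacent": (min(inside), max(inside)),
    }

tv_hours = [5, 15, 16, 20, 21, 25, 26, 27, 30, 30,
            31, 32, 32, 34, 35, 38, 38, 41, 43, 66]

result = boxplot_limits(tv_hours)
print(result)
```

For these data, the observation 66 lies above the upper limit, so it is a potential outlier, and the whiskers end at the adjacent values 5 and 43.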

Boxplot Shapes and Data Skewness

Symmetry and Skewness

Boxplots can reveal the skewness of a data distribution:

  • Symmetric: Box and whiskers are balanced; median is centered.

  • Right-skewed: Right whisker is longer; median closer to $Q_1$.

  • Left-skewed: Left whisker is longer; median closer to $Q_3$.

Resistant measures (such as the median and IQR) are not sensitive to a few extreme values, which makes them better suited than the mean and standard deviation for skewed data or data with outliers.
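
A small Python experiment can illustrate resistance. In this sketch, the largest TV-viewing value is changed from 66 to 166; that altered value is a hypothetical modification made only for illustration, and `iqr` is an illustrative helper using the median-of-halves quartiles.

```python
from statistics import mean, median, stdev

def iqr(values):
    """Interquartile range using median-of-halves quartiles."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half] if n % 2 == 0 else data[:half + 1]
    upper = data[half:]
    return median(upper) - median(lower)

tv_hours = [5, 15, 16, 20, 21, 25, 26, 27, 30, 30,
            31, 32, 32, 34, 35, 38, 38, 41, 43, 66]

# Hypothetical change: make the largest observation far more extreme.
altered = tv_hours[:-1] + [166]

for label, data in (("original", tv_hours), ("altered", altered)):
    print(f"{label}: median={median(data)}, IQR={iqr(data)}, "
          f"mean={mean(data):.2f}, s={stdev(data):.2f}")
```

The median and IQR are identical for both versions of the data, while the mean and standard deviation both increase.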

Application: Income Distribution Across Continents

Boxplot Visualization of GDP per Capita

Boxplots can be used to compare distributions across groups, such as GDP per capita across continents. The example uses R code and the gapminder dataset to visualize and summarize income data.

  • Boxplots display the spread and center of GDP per capita for each continent.

  • Outliers are easily identified as points outside the whiskers.

  • Summary statistics (min, $Q_1$, median, $Q_3$, max) are calculated for each group.

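| Continent | Median GDP per Capita | Q1 | Q3 | Min | Max |
|---|---|---|---|---|---|
| Africa | 1279 | 779 | 2797 | 277 | 13291 |
| Americas | 6937 | 4211 | 12736 | 1201 | 42951 |
| Asia | 4471 | 1962 | 11977 | 601 | 39724 |
| Europe | 33691 | 12081 | 36126 | 5937 | 49357 |
| Oceania | 32612 | 23109 | 33694 | 23109 | 34435 |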

Additional info: Table values inferred from the R output and boxplot visualization.
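
As an illustration of how the tabulated summaries can be read, the following Python sketch (Python rather than the R code the example refers to) computes the IQR and the lower and upper limits for each continent from the table values alone. The dictionary `table` and the helper `describe` are illustrative names. Because the inputs are the inferred table values rather than the raw gapminder data, the results are only as reliable as the table.

```python
# (median, Q1, Q3, min, max) per continent, copied from the table.
table = {
    "Africa":   (1279, 779, 2797, 277, 13291),
    "Americas": (6937, 4211, 12736, 1201, 42951),
    "Asia":     (4471, 1962, 11977, 601, 39724),
    "Europe":   (33691, 12081, 36126, 5937, 49357),
    "Oceania":  (32612, 23109, 33694, 23109, 34435),
}

def describe(med, q1, q3, lo, hi):
    """IQR, limits, and simple outlier/skew indicators from summary values."""
    iqr = q3 - q1
    lower_limit = q1 - 1.5 * iqr
    upper_limit = q3 + 1.5 * iqr
    return {
        "iqr": iqr,
        "lower_limit": lower_limit,
        "upper_limit": upper_limit,
        # If the min or max lies beyond a limit, at least one potential outlier exists.
        "low_outliers": lo < lower_limit,
        "high_outliers": hi > upper_limit,
        # Median nearer Q1 than Q3 hints at right skew within the box.
        "median_nearer_q1": (med - q1) < (q3 - med),
    }

results = {name: describe(*row) for name, row in table.items()}
for name, info in results.items():
    print(name, info)
```

With these table values, the maximum exceeds the upper limit for Africa, the Americas, and Asia, so each of those groups would show at least one high potential outlier in its boxplot.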

Summary

  • The five-number summary and boxplots are powerful tools for summarizing and visualizing data distributions.

  • Standard deviation and IQR measure variation; median and quartiles describe the center and spread.

  • Boxplots help identify skewness and outliers, and are especially useful for comparing groups.

  • Resistant measures (median, IQR) are preferred for skewed data or data with outliers.
