Chapter 3: Numerically Summarizing Data – Study Guide

Numerically Summarizing Data

Measures of Central Tendency

Measures of central tendency are used to describe the center or typical value of a dataset. The most common measures include the mean, median, and mode.

  • Mean: The arithmetic average of all values in the dataset. It is sensitive to extreme values (outliers).

  • Median: The middle value when the data are ordered from smallest to largest. If the number of observations is even, the median is the average of the two middle values.

  • Mode: The value that appears most frequently in the dataset. A dataset can have one mode, several modes, or no mode if no value repeats.

Example: For the dataset {2, 4, 4, 6, 8}, the mean is 4.8, the median is 4, and the mode is 4.
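The example above can be checked with a few lines of code. This is a minimal sketch using Python's standard `statistics` module; the variable names are illustrative choices, not part of the original notes.

```python
import statistics

data = [2, 4, 4, 6, 8]

mean = statistics.mean(data)      # sum of values / number of values = 24 / 5
median = statistics.median(data)  # middle value of the sorted data
mode = statistics.mode(data)      # most frequently occurring value

print(mean, median, mode)
```

If a dataset might have several modes, `statistics.multimode` returns all of them as a list.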

Measures of Dispersion

Dispersion measures describe the spread or variability of the data. Common measures include range, variance, and standard deviation.

  • Range: The difference between the largest and smallest values in the dataset.

  • Variance: The average of the squared differences from the mean. For a sample, the sum of squared differences is divided by n - 1 rather than n.

  • Standard Deviation: The square root of the variance. It is widely used because it is expressed in the same units as the data.

Example: For the dataset {2, 4, 4, 6, 8}, the range is 8 - 2 = 6.
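The three dispersion measures can be computed for the same dataset as follows. This is a sketch that assumes Python's standard `statistics` module, where `variance` and `stdev` use the sample (n - 1) form and `pvariance` uses the population (n) form.

```python
import statistics

data = [2, 4, 4, 6, 8]

data_range = max(data) - min(data)           # largest minus smallest: 8 - 2
sample_var = statistics.variance(data)       # sum of squared deviations / (n - 1)
population_var = statistics.pvariance(data)  # sum of squared deviations / n
sample_sd = statistics.stdev(data)           # square root of the sample variance

print(data_range, sample_var, population_var, round(sample_sd, 4))
```

The sample variance is larger than the population variance because the same sum of squared deviations (20.8) is divided by 4 instead of 5.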

Key Formulas

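  • Mean: $\bar{x} = \frac{\sum x_i}{n}$

  • Sample Variance: $s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$

  • Sample Standard Deviation: $s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}$

  • Median: Middle value when data are ordered

  • Range: $R = x_{\max} - x_{\min}$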
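As a worked illustration (added here, not taken from the original notes), the standard sample formulas can be applied step by step to the dataset {2, 4, 4, 6, 8} used in the earlier examples, with n = 5. The symbols $\bar{x}$, $s^2$, $s$, and $R$ are the usual textbook notation and are assumed here.

```latex
\begin{align*}
\bar{x} &= \frac{\sum x_i}{n} = \frac{2 + 4 + 4 + 6 + 8}{5} = \frac{24}{5} = 4.8 \\
s^2 &= \frac{\sum (x_i - \bar{x})^2}{n - 1}
     = \frac{(-2.8)^2 + (-0.8)^2 + (-0.8)^2 + (1.2)^2 + (3.2)^2}{4}
     = \frac{20.8}{4} = 5.2 \\
s &= \sqrt{s^2} = \sqrt{5.2} \approx 2.28 \\
R &= x_{\max} - x_{\min} = 8 - 2 = 6
\end{align*}
```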

Empirical Rule (68-95-99.7 Rule)

The empirical rule applies to data sets with a normal (bell-shaped) distribution. It states:

  • Approximately 68% of data falls within 1 standard deviation of the mean.

  • Approximately 95% of data falls within 2 standard deviations of the mean.

  • Approximately 99.7% of data falls within 3 standard deviations of the mean.

Example: If the mean is 100 and the standard deviation is 15, then about 68% of values are between 85 and 115.
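The intervals and percentages in the rule can be reproduced numerically. The sketch below assumes an exactly normal distribution and uses `statistics.NormalDist` from the Python standard library; the mean of 100 and standard deviation of 15 come from the example above.

```python
from statistics import NormalDist

mean, sd = 100, 15
dist = NormalDist(mu=mean, sigma=sd)

coverage = {}
for k in (1, 2, 3):
    low, high = mean - k * sd, mean + k * sd
    # Probability that a normal value lies within k standard deviations of the mean
    coverage[k] = dist.cdf(high) - dist.cdf(low)
    print(f"within {k} SD: {low} to {high}, about {coverage[k]:.1%}")
```

For real data that are only roughly bell-shaped, the observed percentages will differ somewhat from these theoretical values.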

[Figure: Empirical Rule bell curve diagram]

Percentiles and Quartiles

Percentiles and quartiles are measures that divide the data into equal parts. The median is the 50th percentile, the first quartile (Q1) is the 25th percentile, and the third quartile (Q3) is the 75th percentile.

  • Percentile: The value below which a given percentage of observations fall.

  • Quartiles: Q1 (25th percentile), Q2 (50th percentile, median), Q3 (75th percentile).

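Example: In an ordered dataset of 100 values, about 25 of the values fall below the 25th percentile, so it lies at or near the 25th ordered value. The exact value depends on the interpolation convention used.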
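To see how the convention affects the result, the sketch below computes the 25th percentile of a hypothetical dataset (the integers 1 through 100, chosen only for illustration) with the two interpolation methods offered by `statistics.quantiles`.

```python
import statistics

data = list(range(1, 101))  # hypothetical dataset: the integers 1 through 100

# quantiles(..., n=100) returns the 99 cut points P1..P99; index 24 is P25
p25_exclusive = statistics.quantiles(data, n=100, method="exclusive")[24]
p25_inclusive = statistics.quantiles(data, n=100, method="inclusive")[24]

print(p25_exclusive, p25_inclusive)
```

Both results fall between the 25th and 26th ordered values, but they are not identical, which is why software packages and textbooks can report slightly different percentiles for the same data.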

Five-Number Summary

The five-number summary provides a quick overview of the distribution of a dataset. It consists of:

  • Minimum

  • First Quartile (Q1)

  • Median (Q2)

  • Third Quartile (Q3)

  • Maximum

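Example: For the dataset {2, 4, 4, 6, 8}, the five-number summary is: 2 (min), 4 (Q1), 4 (median), 6 (Q3), 8 (max). Here Q1 and Q3 are the medians of the lower and upper halves with the overall median included in each half; if the median is excluded from the halves, Q1 = 3 and Q3 = 7.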
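A five-number summary can be sketched in code as shown below. The helper `five_number_summary` is a hypothetical name introduced for illustration; it relies on `statistics.quantiles`, whose `method` argument selects the quartile convention.

```python
import statistics

def five_number_summary(values, method="exclusive"):
    """Return (min, Q1, median, Q3, max) using statistics.quantiles."""
    ordered = sorted(values)
    q1, q2, q3 = statistics.quantiles(ordered, n=4, method=method)
    return ordered[0], q1, q2, q3, ordered[-1]

data = [2, 4, 4, 6, 8]
inclusive = five_number_summary(data, method="inclusive")
exclusive = five_number_summary(data, method="exclusive")

print(inclusive)  # quartiles 4, 4, 6
print(exclusive)  # quartiles 3, 4, 7
```

For this dataset the two conventions disagree on Q1 and Q3, so it is worth checking which one a given textbook or tool uses.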

Comparing Measures

Different measures of central tendency and dispersion are appropriate depending on the shape and characteristics of the data.

  • Mean vs. Median: The mean is affected by outliers, while the median is more robust.

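  • Standard Deviation vs. Range: The standard deviation uses every observation, while the range depends only on the two most extreme values, so the standard deviation conveys more about how the data are spread around the mean. Both are affected by outliers.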
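The effect of an outlier on these measures can be illustrated with a small experiment. In this sketch, the second dataset is hypothetical: it is the earlier example dataset with its largest value replaced by an extreme one.

```python
import statistics

original = [2, 4, 4, 6, 8]
with_outlier = [2, 4, 4, 6, 80]  # hypothetical: largest value replaced by an extreme one

summary = {}
for label, values in (("original", original), ("with outlier", with_outlier)):
    summary[label] = {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "range": max(values) - min(values),
        "sd": statistics.stdev(values),  # sample standard deviation
    }
    print(label, summary[label])
```

The median stays at 4 in both cases, while the mean, range, and standard deviation all increase sharply once the outlier is present.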

Summary Table: Measures of Central Tendency and Dispersion

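Measure | Definition | Formula
Mean | Arithmetic average | $\bar{x} = \frac{\sum x_i}{n}$
Median | Middle value | -
Mode | Most frequent value | -
Range | Difference between max and min | $R = x_{\max} - x_{\min}$
Variance | Average squared deviation from mean | $s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$ (sample)
Standard Deviation | Square root of variance | $s = \sqrt{s^2}$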

Additional info: Some explanations and examples were expanded for clarity and completeness based on standard statistics curriculum.
