Chapter 3: Numerically Summarizing Data – Study Guide

Measures of Central Tendency

Arithmetic Mean

The arithmetic mean is a measure of central tendency that represents the average value of a variable in a data set. It is calculated by summing all values and dividing by the number of observations.

  • Population Mean (μ): Uses all individuals in a population and is considered a parameter.

  • Sample Mean (\(\bar{x}\)): Uses a subset (sample) of the population and is considered a statistic.

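  • Formula: \(\mu = \frac{\sum x_i}{N}\) for a population of N values, and \(\bar{x} = \frac{\sum x_i}{n}\) for a sample of n values.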

  • Example: Exam scores of 10 students can be used to compute both population and sample means.

Table of student exam scores

To find the sample mean, select a random sample and apply the formula above.

Random sample selection on calculator
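As a minimal sketch of the difference between the two means, the Python snippet below treats ten made-up exam scores as the whole population and then draws a simple random sample of four. The scores, the names `scores` and `sample`, the sample size, and the seed are all assumptions chosen for illustration; they are not the values from the table above.

```python
import random
from statistics import mean

# Hypothetical exam scores for a population of N = 10 students
# (illustrative values only, not the data from the study guide).
scores = [82, 77, 90, 71, 62, 68, 74, 84, 94, 88]

# Population mean (a parameter): sum of all values divided by N.
mu = sum(scores) / len(scores)

# Sample mean (a statistic): draw n = 4 scores at random, without replacement.
rng = random.Random(1)            # fixed seed so the sample is reproducible
sample = rng.sample(scores, 4)
x_bar = mean(sample)

print(f"population mean = {mu}")
print(f"sample = {sample}, sample mean = {x_bar}")
```

A different seed gives a different sample and usually a different sample mean, while the population mean stays fixed.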

Median

The median is the value that lies in the middle of the data when arranged in ascending order. It divides the data into two equal halves.

  • Steps to Find Median:

  • Arrange data in ascending order.

  • Determine the number of observations, n.

  • If n is odd, the median is the value at position \(\frac{n+1}{2}\).

  • If n is even, the median is the mean of the values at positions \(\frac{n}{2}\) and \(\frac{n}{2} + 1\).

  • Example: Median length of songs released in the 1970s.

Table of song lengths
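The steps for finding the median can be sketched in Python roughly as follows. The function name `median` and the song lengths are hypothetical and used only for illustration; the standard library's `statistics.median` does the same job.

```python
def median(values):
    """Return the median using the ordered-position rule."""
    ordered = sorted(values)          # step 1: arrange in ascending order
    n = len(ordered)                  # step 2: number of observations
    if n == 0:
        raise ValueError("median of empty data is undefined")
    mid = n // 2
    if n % 2 == 1:
        # odd n: position (n + 1) / 2, which is index `mid` counting from 0
        return ordered[mid]
    # even n: mean of positions n/2 and n/2 + 1 (indices mid - 1 and mid)
    return (ordered[mid - 1] + ordered[mid]) / 2

# Hypothetical song lengths in seconds (illustrative values only).
print(median([179, 201, 206, 208, 217, 222, 240, 257, 284]))  # odd n
print(median([3, 1, 4, 2]))                                    # even n
```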

Resistance of Statistics

A statistic is resistant if extreme values (outliers) do not affect its value substantially. The median is resistant, while the mean is not.
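A quick way to see resistance is to add one extreme value to a small data set and recompute both statistics. The call lengths below are hypothetical values chosen for illustration.

```python
from statistics import mean, median

# Hypothetical call lengths in minutes (illustrative values only).
calls = [1, 2, 2, 3, 3, 4, 5]
calls_with_outlier = calls + [60]   # add a single extreme value

print(mean(calls), median(calls))
print(mean(calls_with_outlier), median(calls_with_outlier))
```

With these numbers the mean rises from about 2.9 to 10, while the median stays at 3.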

  • Example: Comparing mean and median for cell phone call lengths.

Table of cell phone call lengths

Dot plot and summary statistics for call lengths

  • For skewed distributions, the median is generally the better measure of central tendency, because the mean is pulled toward the tail.

Comparison of mean and median in different distributions

  • Example: Birth weights of babies. The mean and median are close, which is consistent with a roughly symmetric, bell-shaped distribution.

Table of birth weights

Summary statistics for birth weights

Histogram of birth weights with mean and median

Mode

The mode is the most frequent observation in a data set. Data can have no mode, one mode, or multiple modes.

  • Example: Number of O-ring failures on space shuttle flights.

  • Mode is 0, as it occurs most frequently.
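The three cases (no mode, one mode, several modes) can be sketched as below. The helper `modes` is a hypothetical name, the failure counts are made-up values, and the rule "no mode when no value repeats" is the convention assumed here.

```python
from collections import Counter

def modes(values):
    """Return a list of modes; an empty list means the data have no mode."""
    counts = Counter(values)
    if not counts:
        return []
    highest = max(counts.values())
    if highest == 1:
        return []                 # every value occurs once: no mode
    return [v for v, c in counts.items() if c == highest]

# Hypothetical O-ring failure counts per flight (illustrative values only).
print(modes([0, 0, 1, 0, 2, 0, 1, 0]))   # one mode
print(modes([1, 1, 2, 2, 3]))            # two modes
print(modes([4, 5, 6]))                  # no mode
```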

Comparison Table: Measures of Central Tendency

| Measure | Computation | Interpretation | When to Use |
|---|---|---|---|
| Mean | Population: \(\mu = \frac{\sum x_i}{N}\); Sample: \(\bar{x} = \frac{\sum x_i}{n}\) | Center of gravity | Quantitative data, roughly symmetric distribution |
| Median | Arrange data in ascending order and divide in half | Divides the bottom 50% from the top 50% | Quantitative data, skewed distribution |
| Mode | Tally the most frequent observation | Most frequent observation | Qualitative data, or when the most frequent observation is desired |

Measures of Dispersion

Range

The range is the difference between the largest and smallest data values.

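  • Formula: \(R = \text{largest data value} - \text{smallest data value}\)

  • Example: For the exam scores, the range is the highest score minus the lowest score, measured in points.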

Standard Deviation

The standard deviation measures the spread of data values around the mean.

  • Population Standard Deviation (σ): \(\sigma = \sqrt{\frac{\sum (x_i - \mu)^2}{N}}\)

Computational formula: \(\sigma = \sqrt{\frac{\sum x_i^2 - \frac{\left(\sum x_i\right)^2}{N}}{N}}\)

  • Example: Calculating standard deviation for exam scores.

Table of deviations and squared deviations

Table of scores and squared scores

  • Sample Standard Deviation (s): \(s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}\)

Computational formula: \(s = \sqrt{\frac{\sum x_i^2 - \frac{\left(\sum x_i\right)^2}{n}}{n - 1}}\)

  • Example: Calculating sample standard deviation for a random sample.

Table of sample deviations and squared deviations

Table of sample scores and squared scores

  • Comparison: Standard deviation is larger for University A (16.1) than for University B (8.4), indicating more dispersion in University A.

Summary statistics for University A and B
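The following Python sketch computes the population and sample standard deviations from their definitions, plus the sample version from the computational (shortcut) formula. The function names and the data are hypothetical; the sample functions assume at least two observations.

```python
from math import sqrt

def population_sd(values):
    """Definitional formula: sqrt(sum((x - mu)^2) / N)."""
    n = len(values)
    mu = sum(values) / n
    return sqrt(sum((x - mu) ** 2 for x in values) / n)

def sample_sd(values):
    """Definitional formula: sqrt(sum((x - x_bar)^2) / (n - 1))."""
    n = len(values)
    x_bar = sum(values) / n
    return sqrt(sum((x - x_bar) ** 2 for x in values) / (n - 1))

def sample_sd_computational(values):
    """Computational formula: sqrt((sum(x^2) - (sum(x))^2 / n) / (n - 1))."""
    n = len(values)
    total = sum(values)
    total_sq = sum(x * x for x in values)
    return sqrt((total_sq - total ** 2 / n) / (n - 1))

data = [2, 4, 4, 4, 5, 5, 7, 9]        # hypothetical values, mean 5
print(population_sd(data))             # 2.0
print(sample_sd(data))
print(sample_sd_computational(data))
```

The two sample versions agree up to floating-point rounding; the sample value is slightly larger than the population value because it divides by n − 1 instead of N.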

Variance

The variance is the square of the standard deviation.

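  • Population Variance: \(\sigma^2 = \frac{\sum (x_i - \mu)^2}{N}\)

  • Sample Variance: \(s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}\)

  • Example: Squaring a standard deviation gives the corresponding variance. If the population standard deviation is \(\sigma\), then the population variance is \(\sigma^2\); if the sample standard deviation is \(s\), then the sample variance is \(s^2\).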
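As a small check of the variance and standard deviation relationship, the snippet below uses Python's `statistics` module on a hypothetical data set: `pvariance`/`pstdev` divide by N, while `variance`/`stdev` divide by n − 1.

```python
from statistics import pstdev, pvariance, stdev, variance

data = [3, 5, 7, 9, 11]     # hypothetical values for illustration

# Population: the variance equals sigma squared.
print(pvariance(data), pstdev(data) ** 2)

# Sample: the variance equals s squared (divisor n - 1).
print(variance(data), stdev(data) ** 2)
```

Each line prints two values that match up to floating-point rounding.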

Empirical Rule (Bell-Shaped Distributions)

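For a roughly bell-shaped (approximately normal) distribution, the Empirical Rule gives the approximate share of the data that lies within a given distance of the mean:

  • Approximately 68% of the data lies within 1 standard deviation of the mean

  • Approximately 95% lies within 2 standard deviations

  • Approximately 99.7% lies within 3 standard deviations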

Empirical Rule diagram

Empirical Rule applied to IQ scores
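To turn the Empirical Rule into concrete intervals, a sketch like the one below computes mean ± k standard deviations for k = 1, 2, 3. The function name is hypothetical, and the IQ scale (mean 100, standard deviation 15) is an assumed, commonly quoted scale rather than a value taken from the figure.

```python
def empirical_rule_intervals(mean, sd):
    """Return {k: (low, high, approximate share)} for k = 1, 2, 3."""
    shares = {1: 0.68, 2: 0.95, 3: 0.997}
    return {k: (mean - k * sd, mean + k * sd, share)
            for k, share in shares.items()}

# Assumed IQ scale: mean 100, standard deviation 15.
for k, (low, high, share) in empirical_rule_intervals(100, 15).items():
    print(f"about {share:.1%} of scores lie between {low} and {high}")
```

These percentages are approximations that apply only when the distribution is roughly bell-shaped.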

Chebyshev’s Inequality

Chebyshev’s Inequality applies to any data set, regardless of the shape of its distribution. It states that at least \(\left(1 - \frac{1}{k^2}\right) \cdot 100\%\) of the data lies within k standard deviations of the mean, for any \(k > 1\).
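The bound is easy to evaluate for specific values of k. The helper below is a minimal sketch with a hypothetical name; it returns the minimum fraction 1 − 1/k² and rejects k ≤ 1, where the inequality says nothing useful.

```python
def chebyshev_min_fraction(k):
    """Minimum fraction of data within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("Chebyshev's Inequality requires k > 1")
    return 1 - 1 / k ** 2

for k in (1.5, 2, 3):
    print(f"k = {k}: at least {chebyshev_min_fraction(k):.1%}")
```

For k = 2 the guarantee is at least 75%, and for k = 3 it is about 88.9%. Both are weaker than the 95% and 99.7% of the Empirical Rule because Chebyshev's bound makes no assumption about shape.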

Grouped Data: Central Tendency and Dispersion

Mean from Grouped Data

When only grouped data (frequency distributions) are available, the mean can be approximated using class midpoints and frequencies.

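Population mean: \(\mu = \frac{\sum x_i f_i}{\sum f_i}\); sample mean: \(\bar{x} = \frac{\sum x_i f_i}{\sum f_i}\), where \(x_i\) is the midpoint of the ith class and \(f_i\) is its frequency.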

Standard Deviation from Grouped Data

Standard deviation can also be approximated from grouped data using midpoints and frequencies.

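Population: \(\sigma = \sqrt{\frac{\sum (x_i - \mu)^2 f_i}{\sum f_i}}\); sample: \(s = \sqrt{\frac{\sum (x_i - \bar{x})^2 f_i}{\left(\sum f_i\right) - 1}}\), with \(x_i\) the class midpoint and \(f_i\) the class frequency.

Computational formula (sample): \(s = \sqrt{\frac{\sum x_i^2 f_i - \frac{\left(\sum x_i f_i\right)^2}{\sum f_i}}{\left(\sum f_i\right) - 1}}\)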

  • Example: Parking and camera violation fines in NYC.

Table of fines, midpoints, frequencies, and deviations
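The grouped-data approximations can be sketched as follows: each class is represented by its midpoint, weighted by its frequency. The function name, the midpoints, and the frequencies are hypothetical; they are not the NYC fine data from the table.

```python
from math import sqrt

def grouped_mean_sd(midpoints, freqs, sample=True):
    """Approximate the mean and standard deviation from a frequency table."""
    n = sum(freqs)                                   # total observations
    mean = sum(x * f for x, f in zip(midpoints, freqs)) / n
    ss = sum((x - mean) ** 2 * f for x, f in zip(midpoints, freqs))
    divisor = n - 1 if sample else n                 # n - 1 for a sample
    return mean, sqrt(ss / divisor)

# Hypothetical classes with midpoints 25, 75, 125 and frequencies 2, 5, 3.
midpoints = [25, 75, 125]
freqs = [2, 5, 3]

print(grouped_mean_sd(midpoints, freqs, sample=False))  # (80.0, 35.0)
print(grouped_mean_sd(midpoints, freqs, sample=True))
```

The results are approximations because every observation in a class is treated as if it sat at the class midpoint.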

Measures of Position and Outliers

z-Scores

A z-score measures how many standard deviations a data value is from the mean.

  • Population z-score: \(z = \frac{x - \mu}{\sigma}\)

  • Sample z-score: \(z = \frac{x - \bar{x}}{s}\)
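A z-score is a one-line computation. The sketch below uses a hypothetical helper `z_score` and made-up values (mean 70, standard deviation 10); the same function serves for population or sample z-scores depending on which mean and standard deviation are passed in.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (x - mean) / sd

print(z_score(85, 70, 10))   # 1.5  (1.5 standard deviations above the mean)
print(z_score(55, 70, 10))   # -1.5 (1.5 standard deviations below the mean)
```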

Percentiles

The kth percentile is the value below which k percent of the data falls.

Percentile diagram

Quartiles

Quartiles divide data into four equal parts:

  • Q1: 25th percentile

  • Q2: 50th percentile (median)

  • Q3: 75th percentile

Quartile diagram

Interquartile Range (IQR)

The interquartile range is the range of the middle 50% of the data: \(\text{IQR} = Q_3 - Q_1\).

Outliers

To check for outliers:

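  • Compute lower fence: \(\text{Lower fence} = Q_1 - 1.5(\text{IQR})\)

  • Compute upper fence: \(\text{Upper fence} = Q_3 + 1.5(\text{IQR})\)

  • Values below the lower fence or above the upper fence are outliers.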
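The outlier check can be sketched in Python as below. Quartiles are computed here as the medians of the lower and upper halves of the sorted data, which is one common hand-calculation convention and an assumption of this sketch; software packages may use slightly different rules. The function names and data are hypothetical, and at least two observations are assumed.

```python
from statistics import median

def quartiles(values):
    """Q1, Q2, Q3 as medians of the lower half, all data, and upper half."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    # for odd n, leave the middle value out of both halves
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    return median(lower), median(ordered), median(upper)

def find_outliers(values):
    """Return (lower fence, upper fence, list of outliers)."""
    q1, _, q3 = quartiles(values)
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    outliers = [x for x in values if x < lower_fence or x > upper_fence]
    return lower_fence, upper_fence, outliers

data = [1, 3, 4, 5, 5, 6, 7, 8, 9, 30]   # hypothetical values
print(quartiles(data))        # (4, 5.5, 8)
print(find_outliers(data))    # (-2.0, 14.0, [30])
```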

Five-Number Summary and Boxplots

Five-Number Summary

The five-number summary consists of: minimum, Q1, median, Q3, maximum.

Boxplots

Boxplots visually display the five-number summary and identify outliers.

  • Draw a box from Q1 to Q3, with a line at the median.

  • Whiskers extend to the smallest and largest values within the fences.

  • Outliers are marked with asterisks.

Boxplot example
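The numbers a boxplot displays can be gathered without drawing anything. The sketch below is a hypothetical helper that returns the five-number summary, the whisker ends (the smallest and largest values inside the fences), and the outliers. It assumes the median-of-halves convention for quartiles and made-up data.

```python
from statistics import median

def boxplot_stats(values):
    """Collect the quantities needed to draw a boxplot by hand."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    q1 = median(ordered[:half])
    q3 = median(ordered[half + 1:] if n % 2 else ordered[half:])
    iqr = q3 - q1
    lower_fence, upper_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [x for x in ordered if lower_fence <= x <= upper_fence]
    return {
        "five_number_summary": (ordered[0], q1, median(ordered), q3, ordered[-1]),
        "whiskers": (inside[0], inside[-1]),      # ends of the whiskers
        "outliers": [x for x in ordered if x < lower_fence or x > upper_fence],
    }

data = [1, 10, 11, 12, 13, 14, 15, 16, 17, 18, 40]   # hypothetical values
for name, value in boxplot_stats(data).items():
    print(name, value)
```

For this data the box runs from 11 to 17 with the median line at 14, the whiskers stop at 10 and 18, and the values 1 and 40 would be marked as outliers.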

  • Boxplots and quartiles can be used to describe the shape of a distribution.

Boxplots for different distribution shapes
