Chapter 3: Numerically Summarizing Data – Study Guide

Measures of Central Tendency

Arithmetic Mean

The arithmetic mean is a measure of central tendency that represents the average value of a variable in a data set. It is calculated by summing all values and dividing by the number of observations.

Population Mean (μ): Uses all individuals in a population and is considered a parameter.
Sample Mean (\(\bar{x}\)): Uses a subset (sample) of the population and is considered a statistic.
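Formula:

\(\mu = \dfrac{x_1 + x_2 + \cdots + x_N}{N} = \dfrac{\sum x_i}{N}\) (population of size N)
\(\bar{x} = \dfrac{x_1 + x_2 + \cdots + x_n}{n} = \dfrac{\sum x_i}{n}\) (sample of size n)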

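Example: If the exam scores of 10 students are treated as the population, their mean is the population mean μ; the mean of a random sample drawn from those scores is a sample mean \(\bar{x}\).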

Table of student exam scores

To find the sample mean, select a random sample and apply the formula above.

Random sample selection on calculator
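The two formulas differ only in what they average over. The sketch below computes a population mean and then the mean of a simple random sample drawn from it, assuming hypothetical scores (not the data from the table above) and using Python's `random.sample` in place of a calculator's random-sample feature; the helper name `mean` is illustrative.

```python
import random

def mean(values):
    """Arithmetic mean: sum of the values divided by the number of values."""
    return sum(values) / len(values)

# Hypothetical exam scores treated as the entire population (N = 10).
population = [70, 85, 92, 64, 78, 88, 73, 95, 81, 74]
mu = mean(population)                   # population mean: a parameter

random.seed(1)                          # fixed seed so the sample is reproducible
sample = random.sample(population, 4)   # simple random sample of n = 4, without replacement
x_bar = mean(sample)                    # sample mean: a statistic

print(f"mu = {mu}")
print(f"sample = {sample}, x_bar = {x_bar}")
```

Here μ is 80.0; \(\bar{x}\) depends on which four scores are drawn, which is exactly why it is a statistic rather than a parameter.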

Median

The median is the value that lies in the middle of the data when arranged in ascending order. It divides the data into two equal halves.

Steps to Find the Median:
Arrange the data in ascending order.
Determine the number of observations, n.
If n is odd, the median is the value at position \(\frac{n+1}{2}\).
If n is even, the median is the mean of the values at positions \(\frac{n}{2}\) and \(\frac{n}{2} + 1\).
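The steps above translate directly into code. This is a minimal sketch with a hypothetical helper `median`; note that the positions in the steps are 1-based, while Python list indices start at 0.

```python
def median(values):
    data = sorted(values)          # Step 1: arrange in ascending order
    n = len(data)                  # Step 2: number of observations
    if n == 0:
        raise ValueError("median requires at least one observation")
    mid = n // 2
    if n % 2 == 1:
        # n odd: position (n + 1)/2 (1-based) is index n // 2 (0-based)
        return data[mid]
    # n even: positions n/2 and n/2 + 1 (1-based) are indices mid - 1 and mid
    return (data[mid - 1] + data[mid]) / 2

print(median([3, 9, 1, 7, 5]))   # odd n  → 5
print(median([4, 8, 2, 6]))      # even n → 5.0
```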

Example: Median length of songs released in the 1970s.

Table of song lengths

Resistance of Statistics

A statistic is resistant if extreme values (outliers) do not affect its value substantially. The median is resistant, while the mean is not.

Example: Comparing mean and median for cell phone call lengths.

Table of cell phone call lengths
Dot plot and summary statistics for call lengths

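Because the median is resistant, it is generally the better measure of central tendency for skewed distributions. In a right-skewed distribution the mean is typically pulled above the median; in a left-skewed distribution it is typically pulled below.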

Comparison of mean and median in different distributions
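To see resistance numerically, the sketch below adds one extreme value to a small data set and compares how far the mean and median move. The call lengths are hypothetical (in minutes), not the values from the table above; `statistics.mean` and `statistics.median` are from Python's standard library.

```python
from statistics import mean, median

# Hypothetical call lengths in minutes.
calls = [1, 2, 2, 3, 4, 5]
with_outlier = calls + [60]      # add a single extreme value

print(mean(calls), median(calls))                  # before the outlier
print(mean(with_outlier), median(with_outlier))    # after the outlier
```

The mean jumps from about 2.83 to 11, while the median only moves from 2.5 to 3: the median is resistant, the mean is not.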

Example: Birth weights of babies. The mean and median are close, suggesting a roughly symmetric (bell-shaped) distribution.

Table of birth weights
Summary statistics for birth weights
Histogram of birth weights with mean and median

Mode

The mode is the most frequent observation in a data set. Data can have no mode, one mode, or multiple modes.

Example: Number of O-ring failures on space shuttle flights.
The mode is 0 because it occurs most frequently.
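Because a data set can have no mode, one mode, or several, a mode routine should return a list. The sketch below is one way to do this with `collections.Counter`; the helper name `modes` and the sample data are hypothetical, and it assumes the common textbook convention that data in which no value repeats have no mode.

```python
from collections import Counter

def modes(values):
    """Return every most-frequent value; [] when no value occurs more than once."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    if top == 1:
        return []              # no value repeats: no mode (assumed convention)
    return sorted(v for v, c in counts.items() if c == top)

print(modes([0, 1, 0, 2, 0, 1]))     # one mode   → [0]
print(modes([1, 2, 3]))              # no mode    → []
print(modes(["A", "B", "A", "B"]))   # two modes, qualitative data → ['A', 'B']
```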

Comparison Table: Measures of Central Tendency

Measure	Computation	Interpretation	When to Use
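Mean	Population: \(\mu = \frac{\sum x_i}{N}\); Sample: \(\bar{x} = \frac{\sum x_i}{n}\)	Center of gravity	Quantitative data with a symmetric distribution
Median	Arrange data in ascending order and divide it in half	Divides the bottom 50% of the data from the top 50%	Quantitative data with a skewed distribution
Mode	Tally the data to find the most frequent observation	Most frequent observation	Qualitative data, or when the most frequent observation is of interest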


Measures of Dispersion

Range

The range is the difference between the largest and smallest data values.

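Formula: \(R = \text{largest data value} - \text{smallest data value}\)
Example: For the exam scores, the range is the highest score minus the lowest score, measured in points.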
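The range needs only the two extreme values. A one-line sketch, using a hypothetical helper `data_range` and hypothetical scores:

```python
def data_range(values):
    """Range = largest data value minus smallest data value."""
    return max(values) - min(values)

scores = [70, 85, 92, 64, 78, 88, 73, 95, 81, 74]   # hypothetical exam scores
print(data_range(scores))   # 95 - 64 → 31
```

Because it depends only on the two most extreme observations, the range is not resistant.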

Standard Deviation

The standard deviation measures the spread of data values around the mean.

Population Standard Deviation (σ):

\(\sigma = \sqrt{\dfrac{\sum (x_i - \mu)^2}{N}}\)

Computational formula:

\(\sigma = \sqrt{\dfrac{\sum x_i^2 - \frac{\left(\sum x_i\right)^2}{N}}{N}}\)

Example: Calculating standard deviation for exam scores.

Table of deviations and squared deviations
Table of scores and squared scores
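The definitional and computational formulas are algebraically equivalent. The sketch below computes σ both ways for hypothetical scores (not the table's data) and cross-checks against `statistics.pstdev`; the two function names are illustrative.

```python
import math
import statistics

def pop_sd_definition(values):
    """sigma = sqrt( sum((x - mu)^2) / N )"""
    N = len(values)
    mu = sum(values) / N
    return math.sqrt(sum((x - mu) ** 2 for x in values) / N)

def pop_sd_computational(values):
    """sigma = sqrt( (sum(x^2) - (sum x)^2 / N) / N )"""
    N = len(values)
    sum_x = sum(values)
    sum_x2 = sum(x * x for x in values)
    return math.sqrt((sum_x2 - sum_x ** 2 / N) / N)

# Hypothetical population of 10 exam scores.
scores = [70, 85, 92, 64, 78, 88, 73, 95, 81, 74]

print(pop_sd_definition(scores))
print(pop_sd_computational(scores))
print(statistics.pstdev(scores))   # standard-library cross-check
```

All three print about 9.508: the squared deviations sum to 904, so σ² = 904 / 10 = 90.4.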

Sample Standard Deviation (s):

\(s = \sqrt{\dfrac{\sum (x_i - \bar{x})^2}{n - 1}}\)

Computational formula:

\(s = \sqrt{\dfrac{\sum x_i^2 - \frac{\left(\sum x_i\right)^2}{n}}{n - 1}}\)

Example: Calculating sample standard deviation for a random sample.

Table of sample deviations and squared deviations
Table of sample scores and squared scores
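The sample version divides by n − 1 instead of N and uses \(\bar{x}\) in place of μ. A sketch with a hypothetical sample of four scores and illustrative function names, checked against `statistics.stdev` (which also uses the n − 1 divisor):

```python
import math
import statistics

def sample_sd(values):
    """s = sqrt( sum((x - x_bar)^2) / (n - 1) )"""
    n = len(values)
    if n < 2:
        raise ValueError("sample standard deviation needs at least two values")
    x_bar = sum(values) / n
    return math.sqrt(sum((x - x_bar) ** 2 for x in values) / (n - 1))

def sample_sd_computational(values):
    """s = sqrt( (sum(x^2) - (sum x)^2 / n) / (n - 1) )"""
    n = len(values)
    sum_x = sum(values)
    sum_x2 = sum(x * x for x in values)
    return math.sqrt((sum_x2 - sum_x ** 2 / n) / (n - 1))

sample = [85, 64, 95, 74]   # hypothetical random sample of exam scores

print(sample_sd(sample))
print(sample_sd_computational(sample))
print(statistics.stdev(sample))   # standard-library cross-check
```

Each prints about 13.43 (squared deviations sum to 541, and 541 / 3 ≈ 180.33). With real data, the computational formula can lose floating-point precision when the values are large relative to their spread, so software usually prefers the definitional form.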

Comparison: Standard deviation is larger for University A (16.1) than for University B (8.4), indicating more dispersion in University A.

Summary statistics for University A and B

Variance

The variance is the square of the standard deviation.

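Population Variance: \(\sigma^2 = \dfrac{\sum (x_i - \mu)^2}{N}\)
Sample Variance: \(s^2 = \dfrac{\sum (x_i - \bar{x})^2}{n - 1}\)
Example: Squaring a standard deviation gives the corresponding variance, and taking the square root of a variance recovers the standard deviation.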
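A quick check that variance is the square of the standard deviation, using Python's `statistics` module on hypothetical data. `pvariance`/`pstdev` use the N divisor and `variance`/`stdev` use n − 1.

```python
import statistics

scores = [70, 85, 92, 64, 78, 88, 73, 95, 81, 74]   # hypothetical population
sample = [85, 64, 95, 74]                           # hypothetical sample from it

sigma = statistics.pstdev(scores)    # population standard deviation
s = statistics.stdev(sample)         # sample standard deviation

# Variance equals the squared standard deviation (up to floating-point rounding).
print(statistics.pvariance(scores), sigma ** 2)   # population variance, about 90.4
print(statistics.variance(sample), s ** 2)        # sample variance, about 180.33
```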

Empirical Rule (Bell-Shaped Distributions)

The Empirical Rule describes the spread of data in a bell-shaped (normal) distribution:

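Approximately 68% of the data lie within 1 standard deviation of the mean (between μ − σ and μ + σ)
Approximately 95% lie within 2 standard deviations (between μ − 2σ and μ + 2σ)
Approximately 99.7% lie within 3 standard deviations (between μ − 3σ and μ + 3σ)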

Empirical Rule diagram
Empirical Rule applied to IQ scores
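Applying the rule amounts to computing μ ± kσ for k = 1, 2, 3. The sketch below does this for an assumed bell-shaped variable with mean 100 and standard deviation 15 (illustrative values, not taken from the figure) and, for comparison, prints the exact proportions under a normal model using `statistics.NormalDist`. The helper `empirical_intervals` is hypothetical.

```python
from statistics import NormalDist

def empirical_intervals(mean, sd):
    """Intervals mean +/- k*sd for k = 1, 2, 3, paired with the Empirical Rule percentages."""
    rule = {1: 68, 2: 95, 3: 99.7}
    return {k: (mean - k * sd, mean + k * sd, pct) for k, pct in rule.items()}

# Assumed bell-shaped variable: mean 100, standard deviation 15.
for k, (low, high, pct) in empirical_intervals(100, 15).items():
    # Exact proportion within k SDs for a perfectly normal distribution.
    exact = NormalDist().cdf(k) - NormalDist().cdf(-k)
    print(f"within {k} SD: {low} to {high}; rule: about {pct}%, normal model: {exact:.2%}")
```

The normal model gives about 68.27%, 95.45%, and 99.73%, which the Empirical Rule rounds to 68%, 95%, and 99.7%.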

Chebyshev’s Inequality

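Chebyshev’s Inequality applies to any data set, regardless of shape. It states that at least \(\left(1 - \frac{1}{k^2}\right) \times 100\%\) of the data lie within k standard deviations of the mean, for any k > 1.

For example, with k = 2, at least \(1 - \frac{1}{4} = 75\%\) of the data lie within 2 standard deviations of the mean.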
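The bound is easy to tabulate for several values of k. A short sketch with a hypothetical helper `chebyshev_min_fraction`:

```python
def chebyshev_min_fraction(k):
    """Lower bound on the fraction of data within k standard deviations of the mean (k > 1)."""
    if k <= 1:
        raise ValueError("Chebyshev's Inequality is stated for k > 1")
    return 1 - 1 / k ** 2

for k in (1.5, 2, 3):
    print(f"k = {k}: at least {chebyshev_min_fraction(k):.1%} of the data")
```

This prints 55.6%, 75.0%, and 88.9%. These guarantees are weaker than the Empirical Rule's (for example, 75% versus about 95% at k = 2) because Chebyshev's Inequality makes no assumption about the shape of the distribution.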