Measures of Dispersion: Range, Variance, Standard Deviation, Empirical Rule, and Chebyshev’s Inequality

Measures of Dispersion

Overview

Measures of dispersion quantify the spread or variability of a data set. They are essential for understanding how data values differ from the center (mean or median) and from each other. The main measures include range, variance, standard deviation, and rules for describing data spread such as the Empirical Rule and Chebyshev’s Inequality.

Range
Standard deviation and Variance
Empirical Rule for bell-shaped (normal) data
Chebyshev’s Inequality for any data set

Comparing Centers and Variation

Data sets may have the same center (mean or median) but different levels of variation. Understanding dispersion helps distinguish between stable and variable data.

Same center, different variation: Data sets with identical means can have very different spreads.
Different centers, same variation: Data sets may be centered at different values but have similar variability.
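
The "same center, different variation" case can be checked numerically. The sketch below uses the two patients' pulse rates from the range example in this guide. Treating each set of three readings as a complete population (so the standard deviation divides by N) is an assumption made only for illustration.

```python
from statistics import mean, pstdev

# Pulse rates taken from the guide's range example.
patient_a = [72, 74, 76]
patient_b = [59, 71, 92]

for name, rates in (("Patient A", patient_a), ("Patient B", patient_b)):
    # pstdev treats the values as a complete population (divides by N).
    print(f"{name}: mean = {mean(rates)}, SD = {pstdev(rates):.2f}")
```

Both patients have a mean of 74, yet Patient B's standard deviation is several times larger, which is exactly the distinction a measure of center alone cannot show.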

Range

Definition and Calculation

The range is the simplest measure of variability, defined as the difference between the largest and smallest observations in a data set.

Formula: $\text{Range} = x_{\max} - x_{\min}$ (largest value minus smallest value)
Arrange data in ascending order before calculating the range.

Example: Pulse Rates

Patient A: 72, 74, 76 → Range = 76 - 72 = 4
Patient B: 59, 71, 92 → Range = 92 - 59 = 33

The range is not a robust measure because:

It uses only two data points.
It ignores the distribution of the rest of the data.
It is sensitive to outliers.

Example: Student Scores

Student	Score
Michelle	82
Ryanne	77
Bilal	91
Pam	71
Jennifer	62
Dave	94
Joel	64
Sam	84
Justine	70
Juan	88

Range = 94 - 62 = 32
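
As a quick check of the arithmetic, the range calculation can be sketched in a few lines. The helper name `data_range` is hypothetical; the data are the student scores and pulse rates listed in this guide.

```python
def data_range(values):
    """Return the largest value minus the smallest value."""
    return max(values) - min(values)

# Student scores from the table above.
scores = {
    "Michelle": 82, "Ryanne": 77, "Bilal": 91, "Pam": 71, "Jennifer": 62,
    "Dave": 94, "Joel": 64, "Sam": 84, "Justine": 70, "Juan": 88,
}

print(data_range(list(scores.values())))  # 32
print(data_range([72, 74, 76]))           # Patient A: 4
print(data_range([59, 71, 92]))           # Patient B: 33
```

Note that no sorting is needed in code, because `max` and `min` scan the data directly.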

Variance and Standard Deviation

Population Variance

Population variance measures the average squared deviation from the mean for all data points in a population.

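Symbol: $\sigma^2$
Formula: $\sigma^2 = \dfrac{\sum (x_i - \mu)^2}{N}$, where $\mu$ is the population mean and $N$ is the population size
Alternate formula: $\sigma^2 = \dfrac{\sum x_i^2 - \frac{\left(\sum x_i\right)^2}{N}}{N}$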

Population Standard Deviation

Standard deviation is the square root of the variance and represents the typical deviation from the mean.

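Symbol: $\sigma$
Formula: $\sigma = \sqrt{\dfrac{\sum (x_i - \mu)^2}{N}}$
Computational formula: $\sigma = \sqrt{\dfrac{\sum x_i^2 - \frac{\left(\sum x_i\right)^2}{N}}{N}}$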
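
The definitional form (average squared deviation from the mean) and the computational form (based on the sum of squares and the square of the sum) should give the same population variance. The following is a minimal sketch that checks this on the ten student scores from the range example. Treating those ten scores as a complete population is an assumption made for illustration, and the function names are hypothetical.

```python
from math import isclose, sqrt
from statistics import pstdev, pvariance

def pop_variance_definitional(xs):
    """Average squared deviation from the population mean."""
    n = len(xs)
    mu = sum(xs) / n
    return sum((x - mu) ** 2 for x in xs) / n

def pop_variance_computational(xs):
    """(sum of squares - (sum)^2 / N) / N, algebraically the same quantity."""
    n = len(xs)
    return (sum(x * x for x in xs) - sum(xs) ** 2 / n) / n

# Student scores from the guide, assumed here to be the whole population.
scores = [82, 77, 91, 71, 62, 94, 64, 84, 70, 88]

var_def = pop_variance_definitional(scores)
var_comp = pop_variance_computational(scores)
print(round(var_def, 2), round(var_comp, 2))  # both 114.21
print(round(sqrt(var_def), 2))                # population SD, about 10.69
```

The standard library's `statistics.pvariance` and `statistics.pstdev` compute the same population quantities and can serve as a cross-check.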

Sample Standard Deviation

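For a sample, the sample standard deviation is calculated using $n - 1$ in the denominator instead of $n$. With this divisor, the sample variance $s^2$ is an unbiased estimate of the population variance $\sigma^2$.

Symbol: $s$
Formula: $s = \sqrt{\dfrac{\sum (x_i - \bar{x})^2}{n - 1}}$, where $\bar{x}$ is the sample mean and $n$ is the sample size
The sample variance is $s^2$.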
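
To make the role of the n - 1 divisor concrete, here is a short sketch of the sample variance and sample standard deviation. It reuses the ten student scores from the range example, this time assumed to be a sample drawn from a larger group; the function name `sample_variance` is hypothetical.

```python
from math import sqrt
from statistics import stdev, variance

def sample_variance(xs):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(xs)
    if n < 2:
        raise ValueError("at least two observations are required")
    x_bar = sum(xs) / n
    return sum((x - x_bar) ** 2 for x in xs) / (n - 1)

# Student scores from the guide, assumed here to be a sample.
scores = [82, 77, 91, 71, 62, 94, 64, 84, 70, 88]

s_squared = sample_variance(scores)
s = sqrt(s_squared)
print(round(s_squared, 1), round(s, 2))  # 126.9 and about 11.27
```

`statistics.variance` and `statistics.stdev` use the same n - 1 divisor, so they should agree with these values.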

Degrees of Freedom

The term degrees of freedom refers to the number of independent values that can vary in the calculation of a statistic. For sample variance, it is $n - 1$ because the last value is determined by the requirement that the sum of deviations from the mean equals zero.

Using $n - 1$ in the denominator corrects for bias in estimating population variance from a sample.
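
One way to see the bias correction is to enumerate every possible sample from a tiny population and average the two candidate estimators. The sketch below assumes a hypothetical population of three values and samples of size 2 drawn with replacement; it is an illustration of the idea, not a proof.

```python
from itertools import product
from statistics import pvariance

population = [2, 4, 6]  # hypothetical population, chosen only for illustration
n = 2                   # sample size

def sum_sq_dev(sample):
    """Sum of squared deviations from the sample mean."""
    x_bar = sum(sample) / len(sample)
    return sum((x - x_bar) ** 2 for x in sample)

# All 9 equally likely ordered samples of size 2, drawn with replacement.
samples = list(product(population, repeat=n))

avg_div_n = sum(sum_sq_dev(s) / n for s in samples) / len(samples)
avg_div_n_minus_1 = sum(sum_sq_dev(s) / (n - 1) for s in samples) / len(samples)
sigma_sq = pvariance(population)

print(f"population variance:          {sigma_sq:.4f}")
print(f"average estimate using n:     {avg_div_n:.4f}")
print(f"average estimate using n - 1: {avg_div_n_minus_1:.4f}")
```

Averaged over all samples, dividing by n - 1 reproduces the population variance, while dividing by n comes out too small.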

Why Standard Deviation?

Standard deviation is preferred because it uses all data points and provides a measure of typical deviation from the mean. It is crucial for:

Identifying unusual observations (typically those more than 2 standard deviations from the mean).
Comparing variability between populations: larger standard deviation means greater spread.
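
The "more than 2 standard deviations from the mean" guideline can be sketched as a small filter. The function name `unusual_observations` and the `readings` data are hypothetical, and using the population standard deviation is an assumption; a sample standard deviation could be substituted.

```python
from statistics import mean, pstdev

def unusual_observations(values, threshold=2.0):
    """Return values lying more than `threshold` SDs from the mean."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # no spread at all, so nothing can be unusual
    return [x for x in values if abs(x - mu) / sigma > threshold]

readings = [10, 12, 11, 13, 12, 11, 30]  # hypothetical data with one extreme value
print(unusual_observations(readings))    # [30]

# The student scores from the guide contain no value beyond 2 SDs.
scores = [82, 77, 91, 71, 62, 94, 64, 84, 70, 88]
print(unusual_observations(scores))      # []
```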

Empirical Rule for Normal Distributions

Definition

The Empirical Rule applies to data sets that are approximately bell-shaped (normal distribution). It states:

About 68% of data lie within 1 standard deviation of the mean.
About 95% lie within 2 standard deviations.
About 99.7% lie within 3 standard deviations.

Graphically, the normal curve is divided into regions by standard deviations from the mean, with specific percentages in each region.
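
The 68%, 95%, and 99.7% figures are rounded areas under the normal curve. A minimal sketch, assuming an exactly normal distribution and using Python's standard `statistics.NormalDist`, recovers them:

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1

def proportion_within(k):
    """Area under the normal curve within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} SD: {proportion_within(k):.4f}")
```

The results are roughly 0.6827, 0.9545, and 0.9973, which the Empirical Rule rounds to 68%, 95%, and 99.7%.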

Example: Serum HDL Cholesterol

Data: 54 patients' HDL cholesterol levels
Mean ($\mu$): 57.4
Standard deviation ($\sigma$): 11.7

According to the Empirical Rule:

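About 99.7% of patients are expected to have HDL within 3 standard deviations of the mean, that is, between 22.3 and 92.5.
About 81.5% (13.5% + 34% + 34%) are expected to have HDL between 34.0 and 69.1, the interval from 2 standard deviations below the mean to 1 standard deviation above it.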
Actual percentage: 45 out of 54 patients (83.3%) have HDL between 34.0 and 69.1.
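
The interval endpoints and percentages in this example follow from simple arithmetic on the mean and standard deviation. The sketch below reproduces them; the variable names are hypothetical, and the patient counts are the ones quoted in this guide (the raw data are not available here).

```python
mean_hdl = 57.4
sd_hdl = 11.7

# Interval from 2 SDs below the mean to 1 SD above the mean.
low = mean_hdl - 2 * sd_hdl
high = mean_hdl + 1 * sd_hdl

# Empirical Rule: 13.5% (between -2 and -1 SD) + 34% + 34% (within 1 SD).
rule_pct = 13.5 + 34 + 34

# Observed share, using the count quoted in the guide.
observed_pct = 100 * 45 / 54

# Interval covering about 99.7% of the data (3 SDs either side).
three_sd = (mean_hdl - 3 * sd_hdl, mean_hdl + 3 * sd_hdl)

print(f"interval: ({low:.1f}, {high:.1f})")
print(f"Empirical Rule: {rule_pct}%  observed: {observed_pct:.1f}%")
print(f"3 SD interval: ({three_sd[0]:.1f}, {three_sd[1]:.1f})")
```

The observed 83.3% is close to the 81.5% the rule predicts, which is consistent with the data being roughly bell-shaped.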

Chebyshev’s Inequality

Definition

Chebyshev’s Inequality provides a minimum proportion of data within $k$ standard deviations of the mean for any data set, regardless of distribution shape.

At least $\left(1 - \dfrac{1}{k^2}\right) \cdot 100\%$ of data lie within $\mu - k\sigma$ and $\mu + k\sigma$ for $k > 1$.

Chebyshev’s Table

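At least	Within the interval	k
100(1 - 1/k^2)%	($\mu - k\sigma$, $\mu + k\sigma$)	k
0%	($\mu - \sigma$, $\mu + \sigma$)	k=1
75%	($\mu - 2\sigma$, $\mu + 2\sigma$)	k=2
88.9%	($\mu - 3\sigma$, $\mu + 3\sigma$)	k=3
93.75%	($\mu - 4\sigma$, $\mu + 4\sigma$)	k=4
55.6%	($\mu - 1.5\sigma$, $\mu + 1.5\sigma$)	k=1.5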
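
The table entries all come from the single expression 1 - 1/k^2. A minimal sketch that regenerates them is shown below; the function name `chebyshev_bound` is hypothetical, and clamping the result at zero for k below 1 is a choice made here so the function never reports a negative proportion.

```python
def chebyshev_bound(k):
    """Minimum proportion of data within k SDs of the mean (any distribution)."""
    if k <= 0:
        raise ValueError("k must be positive")
    # For k <= 1 the formula gives 0 or less, i.e. no useful guarantee.
    return max(0.0, 1 - 1 / k ** 2)

for k in (1, 1.5, 2, 3, 4):
    print(f"k = {k}: at least {100 * chebyshev_bound(k):.2f}%")
```

The k = 1 row shows why the inequality is only informative for k greater than 1: the guaranteed proportion there is 0%.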

Examples Using Chebyshev’s Inequality

For $k = 3$, at least 88.9% of data lie within 3 standard deviations of the mean.
Actual percentage from HDL example: 52/54 ≈ 96% within 3 SD.
For a statistics exam with mean 80 and SD 15:
- Between 50 and 110 ($k = 2$): at least 75% of scores.
- To contain at least 60%: solve $1 - \dfrac{1}{k^2} = 0.60$; $k = \sqrt{2.5} \approx 1.58$; interval is (56.3, 103.7).
- To guarantee 84%: $1 - \dfrac{1}{k^2} = 0.84$ gives $k = 2.5$; interval is (42.5, 117.5).
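
Working backwards from a target proportion p means solving 1 - 1/k^2 = p for k, which gives k = 1 / sqrt(1 - p). The sketch below applies this to the exam example (mean 80, SD 15); the function names are hypothetical.

```python
from math import sqrt

def k_for_proportion(p):
    """Smallest k for which Chebyshev guarantees at least proportion p."""
    if not 0 <= p < 1:
        raise ValueError("p must satisfy 0 <= p < 1")
    return 1 / sqrt(1 - p)

def chebyshev_interval(mean, sd, p):
    """Interval around the mean guaranteed to hold at least proportion p."""
    k = k_for_proportion(p)
    return (mean - k * sd, mean + k * sd)

for p in (0.60, 0.84):
    k = k_for_proportion(p)
    lo, hi = chebyshev_interval(80, 15, p)
    print(f"p = {p:.2f}: k = {k:.2f}, interval = ({lo:.1f}, {hi:.1f})")
```

For p = 0.60 this gives k of about 1.58 and the interval (56.3, 103.7); for p = 0.84 it gives k = 2.5 and (42.5, 117.5).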

Application: IQ Scores and Empirical Rule

Example: IQ Distribution

Mean ($\mu$): 100
Standard deviation ($\sigma$): 15
Bell-shaped distribution

Between 70 and 130 ($\mu \pm 2\sigma$): ≈ 95% of people
Less than 70 or greater than 130: ≈ 5% of people
Greater than 130: ≈ 2.5% of people (half of 5%, due to symmetry)
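
The Empirical Rule figures for IQ can be compared with areas under a normal curve. This sketch assumes IQ scores follow an exactly normal distribution with mean 100 and standard deviation 15, which is an idealization of "bell-shaped".

```python
from statistics import NormalDist

iq = NormalDist(mu=100, sigma=15)  # assumed exactly normal for this check

between_70_and_130 = iq.cdf(130) - iq.cdf(70)
above_130 = 1 - iq.cdf(130)

print(f"between 70 and 130: {between_70_and_130:.4f}")
print(f"above 130:          {above_130:.4f}")
```

The normal-curve values are about 0.9545 and 0.0228, so the rule's rounded 95% and 2.5% are slight approximations.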

Summary: Measures of dispersion are crucial for understanding the variability in data. Range, variance, and standard deviation provide quantitative measures, while the Empirical Rule and Chebyshev’s Inequality help describe the spread of data in both normal and arbitrary distributions.