Measures of Dispersion: Range, Variance, Standard Deviation, Empirical Rule, and Chebyshev’s Inequality
Study Guide - Smart Notes
Measures of Dispersion
Overview
Measures of dispersion quantify the spread or variability of a data set. They are essential for understanding how data values differ from the center (mean or median) and from each other. The main measures include range, variance, standard deviation, and rules for describing data spread such as the Empirical Rule and Chebyshev’s Inequality.
Range
Standard deviation and Variance
Empirical Rule for bell-shaped (normal) data
Chebyshev’s Inequality for any data set
Comparing Centers and Variation
Data sets may have the same center (mean or median) but different levels of variation. Understanding dispersion helps distinguish between stable and variable data.
Same center, different variation: Data sets with identical means can have very different spreads.
Different centers, same variation: Data sets may be centered at different values but have similar variability.
Range
Definition and Calculation
The range is the simplest measure of variability, defined as the difference between the largest and smallest observations in a data set.
Formula: $\text{Range} = x_{\max} - x_{\min}$ (largest observation minus smallest observation)
Arrange data in ascending order before calculating the range.
Example: Pulse Rates
Patient A: 72, 74, 76 → Range = 76 - 72 = 4
Patient B: 59, 71, 92 → Range = 92 - 59 = 33
The range is not a robust measure because:
It uses only two data points.
It ignores the distribution of the rest of the data.
It is sensitive to outliers.
Example: Student Scores
| Student | Score |
|---|---|
| Michelle | 82 |
| Ryanne | 77 |
| Bilal | 91 |
| Pam | 71 |
| Jennifer | 62 |
| Dave | 94 |
| Joel | 64 |
| Sam | 84 |
| Justine | 70 |
| Juan | 88 |
Range = 94 - 62 = 32
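As a quick check of the arithmetic, the range calculations above can be reproduced in a few lines of Python. This is a minimal sketch; the helper name `data_range` is an illustrative choice and does not come from the notes.

```python
def data_range(values):
    """Return the range: largest observation minus smallest observation."""
    return max(values) - min(values)

# Data taken from the pulse-rate and student-score examples.
patient_a = [72, 74, 76]
patient_b = [59, 71, 92]
scores = [82, 77, 91, 71, 62, 94, 64, 84, 70, 88]

print(data_range(patient_a))  # 4
print(data_range(patient_b))  # 33
print(data_range(scores))     # 32
```

Because `max` and `min` scan the whole list, the data do not need to be sorted in code; sorting is mainly a convenience when working by hand.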
Variance and Standard Deviation
Population Variance
Population variance measures the average squared deviation from the mean for all data points in a population.
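Symbol: $\sigma^2$
Formula: $\sigma^2 = \dfrac{\sum (x_i - \mu)^2}{N}$, where $\mu$ is the population mean and $N$ is the population size.
Alternate formula: $\sigma^2 = \dfrac{\sum x_i^2 - \frac{\left(\sum x_i\right)^2}{N}}{N}$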
Population Standard Deviation
Standard deviation is the square root of the variance and represents the typical deviation from the mean.
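Symbol: $\sigma$
Formula: $\sigma = \sqrt{\sigma^2} = \sqrt{\dfrac{\sum (x_i - \mu)^2}{N}}$
Computational formula: $\sigma = \sqrt{\dfrac{\sum x_i^2 - \frac{\left(\sum x_i\right)^2}{N}}{N}}$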
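The definitional form of the population variance (the average squared deviation from the mean) and the computational shortcut based on sums of squares should give the same result. The sketch below checks this, assuming purely for illustration that the ten student scores from the range example make up an entire population. The variable names are hypothetical; `statistics.pvariance` from the Python standard library serves as an independent cross-check.

```python
import math
import statistics

# Assumption for illustration: the ten student scores are the whole population.
scores = [82, 77, 91, 71, 62, 94, 64, 84, 70, 88]
N = len(scores)
mu = sum(scores) / N  # population mean, 78.3

# Definitional form: average squared deviation from the mean.
var_definition = sum((x - mu) ** 2 for x in scores) / N

# Computational form: (sum of x^2 - (sum of x)^2 / N) / N.
sum_x = sum(scores)
sum_x2 = sum(x * x for x in scores)
var_shortcut = (sum_x2 - sum_x ** 2 / N) / N

# Standard deviation is the square root of the variance.
sigma = math.sqrt(var_definition)

print(round(var_definition, 2), round(var_shortcut, 2))            # 114.21 114.21
print(round(sigma, 2))                                             # 10.69
print(math.isclose(var_definition, statistics.pvariance(scores)))  # True
```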
Sample Standard Deviation
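For a sample, the sample standard deviation is calculated with $n - 1$ in the denominator, so that the sample variance is an unbiased estimate of the population variance.
Symbol: $s$
Formula: $s = \sqrt{\dfrac{\sum (x_i - \bar{x})^2}{n - 1}}$, where $\bar{x}$ is the sample mean and $n$ is the sample size.
The sample variance is $s^2$.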
Degrees of Freedom
The term degrees of freedom refers to the number of independent values that can vary in the calculation of a statistic. For sample variance, it is $n - 1$: once $n - 1$ of the deviations are known, the last one is determined by the requirement that the sum of deviations from the mean equals zero.
Using $n - 1$ in the denominator corrects for bias in estimating population variance from a sample.
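A small numeric illustration may make the degrees-of-freedom argument concrete. The sketch below treats Patient B's three pulse rates from the range example as a sample (an assumption made only for illustration), shows that the deviations sum to zero, and compares dividing by $n - 1$ with dividing by $n$. Variable names are hypothetical.

```python
import math
import statistics

pulse = [59, 71, 92]       # Patient B's pulse rates, treated here as a sample
n = len(pulse)
x_bar = sum(pulse) / n     # sample mean: 74.0

deviations = [x - x_bar for x in pulse]
print(deviations)          # [-15.0, -3.0, 18.0]
print(sum(deviations))     # 0.0, so the last deviation is fixed by the others

sum_sq = sum(d ** 2 for d in deviations)  # 558.0
sample_var = sum_sq / (n - 1)             # n - 1 in the denominator
divide_by_n_var = sum_sq / n              # n in the denominator, for comparison
sample_sd = math.sqrt(sample_var)

print(sample_var, divide_by_n_var)               # 279.0 186.0
print(round(sample_sd, 2))                       # 16.7
print(statistics.variance(pulse) == sample_var)  # True
```

Dividing by $n$ gives the smaller value (186 versus 279), which is the direction of the bias that the $n - 1$ denominator is meant to correct.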
Why Standard Deviation?
Standard deviation is preferred because it uses all data points and provides a measure of typical deviation from the mean. It is crucial for:
Identifying unusual observations (typically those more than 2 standard deviations from the mean).
Comparing variability between populations: larger standard deviation means greater spread.
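The "more than 2 standard deviations from the mean" guideline for unusual observations can be sketched as a small filter. The function name `unusual_observations` and its parameter `k` are hypothetical, and the sketch assumes the data are a sample, so it uses the sample standard deviation (`statistics.stdev`).

```python
import statistics

def unusual_observations(values, k=2):
    """Return the values lying more than k standard deviations from the mean.

    Assumption: the data are a sample, so the sample standard deviation
    (n - 1 denominator) is used.
    """
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [x for x in values if abs(x - mean) > k * sd]

# Student scores from the range example.
scores = [82, 77, 91, 71, 62, 94, 64, 84, 70, 88]
print(unusual_observations(scores))  # [] (no score is more than 2 SD from the mean of 78.3)
```

For these scores nothing is flagged: the mean is 78.3 and every score lies within two sample standard deviations of it.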
Empirical Rule for Normal Distributions
Definition
The Empirical Rule applies to data sets that are approximately bell-shaped (normal distribution). It states:
About 68% of data lie within 1 standard deviation of the mean.
About 95% lie within 2 standard deviations.
About 99.7% lie within 3 standard deviations.
Graphically, the normal curve is divided into regions by standard deviations from the mean, with specific percentages in each region.
Example: Serum HDL Cholesterol
Data: 54 patients' HDL cholesterol levels
Mean ($\mu$): 57.4
Standard deviation ($\sigma$): 11.7
According to the Empirical Rule:
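About 99.7% of patients have HDL within 3 standard deviations of the mean, that is, between 22.3 and 92.5.
About 81.5% (13.5% + 34% + 34%) have HDL between 34.0 and 69.1, the interval from 2 standard deviations below the mean to 1 standard deviation above it.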
Actual percentage: 45 out of 54 patients (83.3%) have HDL between 34.0 and 69.1.
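The interval endpoints in this example follow directly from the mean and standard deviation. The sketch below recomputes them and compares the Empirical Rule prediction with the observed count; the helper name `interval` is hypothetical, and the rounding to one decimal place mirrors the precision used in the notes.

```python
mu, sigma = 57.4, 11.7  # HDL mean and standard deviation from the example

def interval(lower_k, upper_k):
    """Return (mu + lower_k*sigma, mu + upper_k*sigma), rounded to 1 decimal."""
    return (round(mu + lower_k * sigma, 1), round(mu + upper_k * sigma, 1))

print(interval(-1, 1))  # (45.7, 69.1), about 68% of values
print(interval(-2, 2))  # (34.0, 80.8), about 95% of values
print(interval(-3, 3))  # (22.3, 92.5), about 99.7% of values

# From 2 SD below the mean to 1 SD above it: 13.5% + 34% + 34%.
predicted = 13.5 + 34 + 34
observed = 100 * 45 / 54  # 45 of the 54 patients fall in this interval
print(interval(-2, 1), predicted, round(observed, 1))  # (34.0, 69.1) 81.5 83.3
```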
Chebyshev’s Inequality
Definition
Chebyshev’s Inequality provides a minimum proportion of data within $k$ standard deviations of the mean for any data set, regardless of distribution shape.
At least $\left(1 - \frac{1}{k^2}\right) \cdot 100\%$ of data lie between $\mu - k\sigma$ and $\mu + k\sigma$ for $k > 1$.
Chebyshev’s Table
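| At least | within | k |
|---|---|---|
| 100(1 - 1/k^2)% | ($\mu - k\sigma$, $\mu + k\sigma$) | k |
| 0% | ($\mu - \sigma$, $\mu + \sigma$) | k = 1 |
| 75% | ($\mu - 2\sigma$, $\mu + 2\sigma$) | k = 2 |
| 88.9% | ($\mu - 3\sigma$, $\mu + 3\sigma$) | k = 3 |
| 93.75% | ($\mu - 4\sigma$, $\mu + 4\sigma$) | k = 4 |
| 55.6% | ($\mu - 1.5\sigma$, $\mu + 1.5\sigma$) | k = 1.5 |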
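The percentages in the table can be regenerated from the bound itself. This is a minimal sketch; the function name `chebyshev_min_percent` is a hypothetical choice, and rejecting values of `k` below 1 is an assumption of the sketch, since the bound is negative (and therefore uninformative) there.

```python
def chebyshev_min_percent(k):
    """Minimum percentage of data within k standard deviations of the mean.

    Chebyshev's Inequality gives at least (1 - 1/k^2) * 100%. The bound is
    only informative for k > 1; at k = 1 it is 0%.
    """
    if k < 1:
        raise ValueError("k must be at least 1 for a non-negative bound")
    return 100 * (1 - 1 / k ** 2)

for k in [1, 1.5, 2, 3, 4]:
    print(k, round(chebyshev_min_percent(k), 2))
# 1 0.0
# 1.5 55.56
# 2 75.0
# 3 88.89
# 4 93.75
```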
Examples Using Chebyshev’s Inequality
For $k = 3$, at least 88.9% of data lie within 3 standard deviations of the mean.
Actual percentage from HDL example: 52/54 ≈ 96% within 3 SD.
For a statistics exam with mean 80 and SD 15:
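Between 50 and 110 ($k = 2$): at least 75% of scores.
To contain at least 60%: solve $1 - \frac{1}{k^2} = 0.60$; $k = \sqrt{2.5} \approx 1.58$; interval is (56.3, 103.7).
To guarantee 84%: solve $1 - \frac{1}{k^2} = 0.84$; $k = \sqrt{6.25} = 2.5$; interval is (42.5, 117.5).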
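Working backwards from a target proportion to $k$ amounts to solving $1 - 1/k^2 = p$, which gives $k = 1/\sqrt{1 - p}$. The sketch below applies this to the exam example (mean 80, SD 15). The function names `chebyshev_k` and `chebyshev_interval` are hypothetical, and endpoints are rounded to one decimal place to match the notes.

```python
import math

def chebyshev_k(proportion):
    """Solve 1 - 1/k^2 = proportion for k (requires 0 <= proportion < 1)."""
    if not 0 <= proportion < 1:
        raise ValueError("proportion must be in [0, 1)")
    return 1 / math.sqrt(1 - proportion)

def chebyshev_interval(mean, sd, proportion):
    """Interval mean +/- k*sd that holds at least `proportion` of the data."""
    k = chebyshev_k(proportion)
    return (round(mean - k * sd, 1), round(mean + k * sd, 1))

print(round(chebyshev_k(0.60), 2))       # 1.58
print(chebyshev_interval(80, 15, 0.60))  # (56.3, 103.7)
print(round(chebyshev_k(0.84), 2))       # 2.5
print(chebyshev_interval(80, 15, 0.84))  # (42.5, 117.5)
```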
Application: IQ Scores and Empirical Rule
Example: IQ Distribution
Mean ($\mu$): 100
Standard deviation ($\sigma$): 15
Bell-shaped distribution
Between 70 and 130 ($\mu \pm 2\sigma$): ≈ 95% of people
Less than 70 or greater than 130: ≈ 5% of people
Greater than 130: ≈ 2.5% of people (half of 5%, due to symmetry)
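The Empirical Rule figures are rounded approximations. As a rough comparison, the sketch below computes the corresponding probabilities under the assumption that IQ follows an exactly normal distribution with mean 100 and standard deviation 15, using `statistics.NormalDist` from the Python standard library (Python 3.8 or later).

```python
from statistics import NormalDist

# Assumption: IQ scores are exactly normal with mean 100 and SD 15.
iq = NormalDist(mu=100, sigma=15)

within_2sd = iq.cdf(130) - iq.cdf(70)  # proportion between 70 and 130
above_130 = 1 - iq.cdf(130)            # upper tail; equals the lower tail by symmetry

print(round(100 * within_2sd, 1))  # 95.4 (Empirical Rule: about 95)
print(round(100 * above_130, 1))   # 2.3  (Empirical Rule: about 2.5)
```

The exact values sit close to the rule-of-thumb percentages, which is why the Empirical Rule is stated with "about" rather than as an exact result.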
Summary: Measures of dispersion are crucial for understanding the variability in data. Range, variance, and standard deviation provide quantitative measures, while the Empirical Rule and Chebyshev’s Inequality help describe the spread of data in both normal and arbitrary distributions.