Measures of Variation in Descriptive Statistics

Study Guide - Smart Notes

Measures of Variation

Introduction to Measures of Variation

Measures of variation describe how much data values spread out, or differ from one another. The three primary measures of variation are range, standard deviation, and variance. They help us judge how consistent a data set is: small values indicate data clustered closely together, while large values indicate widely spread data.

Range

The range is the simplest measure of variation, calculated as the difference between the maximum and minimum values in a data set.

Formula: $\text{Range} = (\text{maximum data value}) - (\text{minimum data value})$
Sensitivity: The range is highly sensitive to extreme values (outliers) and does not reflect the variation among all data values.
Example: For the wait times (50, 25, 75, 35, 50, 25, 30, 50, 45, 25, 20), the range is 75 - 20 = 55 minutes.
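A minimal Python sketch of the range calculation for the wait times above. The value 175 used to show outlier sensitivity is hypothetical and not part of the original data.

```python
wait_times = [50, 25, 75, 35, 50, 25, 30, 50, 45, 25, 20]  # minutes

# Range = maximum data value - minimum data value
data_range = max(wait_times) - min(wait_times)
print(data_range)  # 55

# Sensitivity to outliers: replace the 75 with a hypothetical extreme value.
with_outlier = wait_times.copy()
with_outlier[2] = 175
outlier_range = max(with_outlier) - min(with_outlier)
print(outlier_range)  # 155
```

Changing a single value nearly triples the range, even though the other ten values are unchanged.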

Standard Deviation

The standard deviation measures how much data values typically deviate from the mean. It is denoted by s for a sample and σ for a population.

Sample Standard Deviation Formula: $s = \sqrt{\dfrac{\sum (x - \bar{x})^2}{n - 1}}$

Population Standard Deviation Formula: $\sigma = \sqrt{\dfrac{\sum (x - \mu)^2}{N}}$

Properties:
- The standard deviation is always non-negative.
- It is zero only when all data values are identical.
- Larger values indicate greater variation.
- It is sensitive to outliers.
- The units are the same as the original data.
Calculation Steps (Sample):
1. Compute the mean $\bar{x}$.
2. Subtract the mean from each data value.
3. Square each deviation.
4. Sum all squared deviations.
5. Divide by $n - 1$ (sample size minus one).
6. Take the square root of the result.
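Example: For the Space Mountain wait times (the 11 values listed in the range example), $\bar{x} = 430/11 \approx 39.1$ minutes and $s \approx 16.6$ minutes.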
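The six calculation steps can be sketched in Python as follows, assuming the Space Mountain data are the 11 wait times from the range example. The function name `sample_std_dev` is hypothetical; the result is cross-checked against `statistics.stdev` from Python's standard library.

```python
import math
import statistics

wait_times = [50, 25, 75, 35, 50, 25, 30, 50, 45, 25, 20]  # minutes

def sample_std_dev(data):
    """Sample standard deviation, following the six steps in the notes."""
    n = len(data)
    if n < 2:
        raise ValueError("need at least two values to divide by n - 1")
    mean = sum(data) / n                    # step 1: mean
    deviations = [x - mean for x in data]   # step 2: subtract the mean
    squared = [d ** 2 for d in deviations]  # step 3: square each deviation
    total = sum(squared)                    # step 4: sum of squared deviations
    variance = total / (n - 1)              # step 5: divide by n - 1
    return math.sqrt(variance)              # step 6: square root

s = sample_std_dev(wait_times)
print(round(s, 1))  # 16.6
print(math.isclose(s, statistics.stdev(wait_times)))  # True
```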

Variance

The variance is the square of the standard deviation and provides a measure of how data values spread around the mean.

Sample Variance: $s^2 = \dfrac{\sum (x - \bar{x})^2}{n - 1}$
Population Variance: $\sigma^2 = \dfrac{\sum (x - \mu)^2}{N}$
Units: The variance is expressed in squared units of the original data.
Properties: Variance is never negative and is zero only when all values are the same. It is not resistant to outliers.
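A short sketch using Python's standard `statistics` module to contrast the two variance formulas on the same wait times. Treating the 11 values as an entire population for `pvariance` is an assumption made only for comparison.

```python
import math
import statistics

wait_times = [50, 25, 75, 35, 50, 25, 30, 50, 45, 25, 20]  # minutes

s2 = statistics.variance(wait_times)       # sample variance: divides by n - 1
sigma2 = statistics.pvariance(wait_times)  # population variance: divides by N

# Both results are in square minutes, not minutes.
print(round(s2, 1), round(sigma2, 1))  # 274.1 249.2

# The variance is the square of the standard deviation.
print(math.isclose(math.sqrt(s2), statistics.stdev(wait_times)))  # True
```

Dividing by the larger N always gives the smaller value, which is why the choice of formula matters for small samples.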

Notation Summary

Symbol	Meaning
s	Sample standard deviation
s^2	Sample variance
σ	Population standard deviation
σ^2	Population variance

Range Rule of Thumb

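The range rule of thumb is a simple method for estimating the standard deviation. It relies on the fact that, for many data sets, the vast majority of sample values (about 95%) lie within two standard deviations of the mean, so the range spans roughly four standard deviations.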

Estimation Formula: $s \approx \dfrac{\text{range}}{4}$
Significant Values: Values more than two standard deviations from the mean are considered significant (either high or low).
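A sketch applying the range rule of thumb and the significance cutoffs to the wait times from the range example. The rule gives only a rough estimate: here 13.75 minutes versus an actual s of about 16.6 minutes.

```python
import statistics

wait_times = [50, 25, 75, 35, 50, 25, 30, 50, 45, 25, 20]  # minutes

# Range rule of thumb: s is roughly range / 4.
estimated_s = (max(wait_times) - min(wait_times)) / 4
print(estimated_s)  # 13.75

# Actual sample standard deviation, for comparison.
s = statistics.stdev(wait_times)
print(round(s, 1))  # 16.6

# Values more than two standard deviations from the mean are significant.
xbar = statistics.mean(wait_times)
low_cutoff, high_cutoff = xbar - 2 * s, xbar + 2 * s
significant = [x for x in wait_times if x < low_cutoff or x > high_cutoff]
print(significant)  # [75]
```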

Empirical Rule (for Bell-Shaped Distributions)

The empirical rule applies to data sets with approximately bell-shaped (normal) distributions:

About 68% of values fall within 1 standard deviation of the mean.
About 95% of values fall within 2 standard deviations of the mean.
About 99.7% of values fall within 3 standard deviations of the mean.
Example: For IQ scores with mean 100 and standard deviation 15, about 95% of scores are between 70 and 130.
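To see where the rounded percentages come from, this sketch compares them with exact areas under a normal model using `statistics.NormalDist` (Python 3.8+). Modeling IQ scores as exactly normal with mean 100 and standard deviation 15 is an assumption for illustration.

```python
from statistics import NormalDist

mean, sd = 100, 15  # IQ scores, as in the example
iq = NormalDist(mu=mean, sigma=sd)

coverage = {}
for k, rule in [(1, 0.68), (2, 0.95), (3, 0.997)]:
    low, high = mean - k * sd, mean + k * sd
    coverage[k] = iq.cdf(high) - iq.cdf(low)  # share of a normal model in [low, high]
    print(f"{k} SD: {low} to {high}, rule ~{rule:.1%}, normal model {coverage[k]:.2%}")
```

The exact normal-model values (about 68.27%, 95.45%, and 99.73%) round to the percentages quoted by the empirical rule.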

Chebyshev’s Theorem

Chebyshev’s theorem applies to all data sets, regardless of distribution shape. It states that the proportion of values within K standard deviations of the mean is at least $1 - \dfrac{1}{K^2}$ for $K > 1$.

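For $K = 2$: $1 - \dfrac{1}{2^2} = \dfrac{3}{4}$, so at least 75% of values are within 2 standard deviations.
For $K = 3$: $1 - \dfrac{1}{3^2} = \dfrac{8}{9} \approx 0.89$, so at least 89% of values are within 3 standard deviations.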
Example: For IQ scores (mean 100, standard deviation 15), at least 75% are between 70 and 130, and at least 89% are between 55 and 145.
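A minimal sketch of Chebyshev's bound; the helper name `chebyshev_min_proportion` is hypothetical. The last lines check the bound against the wait times from the range example, since the theorem makes no assumption about distribution shape.

```python
import statistics

def chebyshev_min_proportion(k):
    """Chebyshev lower bound on the share of values within k SDs of the mean."""
    if k <= 1:
        raise ValueError("the bound is stated for K > 1")
    return 1 - 1 / k ** 2

mean, sd = 100, 15  # IQ example
for k in (2, 3):
    p = chebyshev_min_proportion(k)
    print(f"K = {k}: at least {p:.0%} between {mean - k * sd} and {mean + k * sd}")

# The bound holds for any data set, such as the wait times.
wait_times = [50, 25, 75, 35, 50, 25, 30, 50, 45, 25, 20]
xbar, s = statistics.mean(wait_times), statistics.stdev(wait_times)
share_within_2 = sum(abs(x - xbar) <= 2 * s for x in wait_times) / len(wait_times)
print(share_within_2 >= chebyshev_min_proportion(2))  # True
```

For these data the actual share within two standard deviations is 10/11, well above the guaranteed 75%.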

Biased and Unbiased Estimators

The sample standard deviation (s) is a biased estimator of the population standard deviation (σ): across repeated samples, its values do not center on σ (on average, s tends to underestimate σ, especially for small samples). The sample variance (s^2) is an unbiased estimator of the population variance (σ^2): its values tend to center on σ^2.
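These claims can be checked exactly, with no random simulation, by listing every possible sample from a tiny population. The population [0, 2, 4], the sample size n = 2, and sampling with replacement (so all 9 ordered samples are equally likely) are assumptions chosen for illustration; averaging over all samples gives each estimator's expected value.

```python
import math
from fractions import Fraction
from itertools import product

# Hypothetical tiny population (not from the notes); samples drawn WITH replacement.
population = [0, 2, 4]
N = len(population)
mu = Fraction(sum(population), N)
sigma2 = sum((x - mu) ** 2 for x in population) / N  # population variance: divide by N

def squared_deviations(sample):
    """Exact sum of squared deviations from the sample mean."""
    xbar = Fraction(sum(sample), len(sample))
    return sum((x - xbar) ** 2 for x in sample)

n = 2
samples = list(product(population, repeat=n))  # all 9 ordered samples

mean_s2 = sum(squared_deviations(s) / (n - 1) for s in samples) / len(samples)
mean_div_n = sum(squared_deviations(s) / n for s in samples) / len(samples)
mean_s = sum(math.sqrt(float(squared_deviations(s) / (n - 1))) for s in samples) / len(samples)

print(sigma2, mean_s2, mean_div_n)  # 8/3 8/3 4/3
print(round(math.sqrt(sigma2), 3), round(mean_s, 3))  # 1.633 1.257
```

On average s^2 equals σ^2 exactly, dividing by n falls short, and s averages about 1.257 against σ ≈ 1.633.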

Why Divide by (n - 1)?

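When calculating the sample variance, we divide by n - 1 instead of n because the deviations from the sample mean always sum to zero: once the mean is fixed, only n - 1 of the values can vary freely (n - 1 is the number of degrees of freedom). Dividing by n would make the sample variance too small on average; dividing by n - 1 (called Bessel's correction) makes the sample variance an unbiased estimator of the population variance.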
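A sketch of the degrees-of-freedom argument using the wait times from the range example. Exact `Fraction` arithmetic avoids rounding noise, so the deviations sum to exactly zero.

```python
from fractions import Fraction

wait_times = [50, 25, 75, 35, 50, 25, 30, 50, 45, 25, 20]  # minutes
n = len(wait_times)
xbar = Fraction(sum(wait_times), n)  # exact mean, 430/11
deviations = [x - xbar for x in wait_times]

# Deviations from the mean always sum to zero...
print(sum(deviations))  # 0

# ...so the last deviation is determined by the first n - 1.
last_from_others = -sum(deviations[:-1])
print(last_from_others == deviations[-1])  # True
print(n - 1)  # degrees of freedom: 10
```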

Using Technology for Descriptive Statistics

Statistical software and calculators can quickly compute descriptive statistics, including measures of center and variation. For example, StatCrunch provides tools for calculating mean, median, standard deviation, and variance efficiently.

Screenshot of StatCrunch interface for descriptive statistics
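As a rough stand-in for a software summary table, the sketch below uses Python's standard `statistics` module; it is not StatCrunch's interface or output format, and the helper name `describe` is hypothetical.

```python
import statistics

def describe(data):
    """Hypothetical helper: common descriptive statistics for a sample."""
    return {
        "n": len(data),
        "mean": statistics.mean(data),
        "median": statistics.median(data),
        "std_dev": statistics.stdev(data),      # sample standard deviation (n - 1)
        "variance": statistics.variance(data),  # sample variance (n - 1)
        "range": max(data) - min(data),
    }

wait_times = [50, 25, 75, 35, 50, 25, 30, 50, 45, 25, 20]  # minutes
summary = describe(wait_times)
for name, value in summary.items():
    print(f"{name}: {value:.2f}" if isinstance(value, float) else f"{name}: {value}")
```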