Chapter 3: Describing, Exploring, and Comparing Data – Measures of Variation

Measures of Variation

Introduction to Measures of Variation

Measures of variation are essential in statistics for understanding how data values spread or differ from each other. The three primary measures of variation are range, standard deviation, and variance. These statistics help us interpret and understand the distribution and consistency of data.

Range: The simplest measure, showing the difference between the largest and smallest values.
Standard Deviation: Measures how much data values typically deviate from the mean.
Variance: The square of the standard deviation, representing average squared deviations from the mean.

Rounding Rule for Measures of Variation

When reporting measures of variation, always round the result to one more decimal place than is present in the original data set.

Example: If the data values are reported to one decimal place, report the standard deviation to two decimal places.
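The rounding rule can be applied mechanically. The Python sketch below uses hypothetical helper names (`decimal_places` and `round_variation` are illustrative, not from any library). It assumes the data values are supplied as strings exactly as recorded, so that trailing zeros such as "4.0" still count, and it uses the largest number of decimal places found in the data.

```python
from decimal import Decimal


def decimal_places(value: str) -> int:
    """Count the decimal places in a data value written as a string, e.g. "4.0" -> 1."""
    exponent = Decimal(value).as_tuple().exponent
    return max(0, -exponent)


def round_variation(statistic: float, data: list[str]) -> float:
    """Round a measure of variation to one more decimal place than the data use."""
    places = max(decimal_places(v) for v in data)
    return round(statistic, places + 1)


# Data recorded to one decimal place -> report to two decimal places.
print(round_variation(3.14159, ["2.5", "3.1", "4.0"]))  # 3.14
# Whole-number data (such as wait times in minutes) -> report to one decimal place.
print(round_variation(16.5557, ["50", "25", "75"]))  # 16.6
```

Python's built-in `round` works on binary floating-point values, so an exact tie such as 2.675 may round down rather than up; for hand-checked homework answers this rarely matters.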

Range

Definition and Calculation

The range of a data set is the difference between the maximum and minimum data values. It provides a quick sense of the spread but is sensitive to outliers.

Formula:

$$\text{Range} = (\text{maximum data value}) - (\text{minimum data value})$$

Important Property: The range uses only the two extreme values and does not consider the distribution of other data points. It is not resistant to outliers.

Example: Calculating Range

Given wait times (minutes) for Space Mountain: 50, 25, 75, 35, 50, 25, 30, 50, 45, 25, 20

Maximum value: 75
Minimum value: 20
Range: 75 - 20 = 55.0 minutes
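The range calculation takes only a few lines of Python. This sketch hard-codes the Space Mountain wait times from the example; `data_range` is an illustrative name chosen to avoid shadowing Python's built-in `range`.

```python
# Space Mountain wait times (minutes) from the example
wait_times = [50, 25, 75, 35, 50, 25, 30, 50, 45, 25, 20]


def data_range(values):
    """Range = (maximum data value) - (minimum data value)."""
    return max(values) - min(values)


print(max(wait_times), min(wait_times))  # 75 20
print(data_range(wait_times))  # 55
```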

Standard Deviation

Definition

The standard deviation (denoted by $s$ for a sample and $\sigma$ for a population) measures how much data values deviate from the mean; roughly, it is a kind of average distance of the values from the mean. It is a key indicator of data variability.

Sample Standard Deviation: Used when data is a sample from a larger population.
Population Standard Deviation: Used when data represents the entire population.

Formula for Sample Standard Deviation

The formula for the sample standard deviation is:

$$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$$

where:
$x$ = individual data value
$\bar{x}$ = sample mean
$n$ = sample size

Properties of Standard Deviation

Non-negative: Standard deviation is always zero or positive.
Zero Standard Deviation: Occurs only when all data values are identical.
Sensitivity: Standard deviation is not resistant to outliers; a single extreme value can increase it substantially.
Units: The units of standard deviation match those of the original data.
Bias: The sample standard deviation is a biased estimator of the population standard deviation.
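Two of these properties are easy to check numerically with Python's standard `statistics` module: a data set of identical values has a standard deviation of exactly zero, and adding one extreme value inflates the standard deviation. The 180-minute outlier below is a hypothetical value chosen only for illustration.

```python
import statistics

wait_times = [50, 25, 75, 35, 50, 25, 30, 50, 45, 25, 20]  # minutes

# Identical values: every deviation from the mean is zero, so s = 0.
print(statistics.stdev([40, 40, 40, 40]))  # 0.0

# Append one hypothetical outlier and compare the sample standard deviations.
s_original = statistics.stdev(wait_times)
s_with_outlier = statistics.stdev(wait_times + [180])
print(round(s_original, 1), round(s_with_outlier, 1))  # 16.6 43.6
```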

Example: Calculating Sample Standard Deviation

Given wait times: 50, 25, 75, 35, 50, 25, 30, 50, 45, 25, 20

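Compute the mean: $\bar{x} = 430/11 \approx 39.1$ minutes (carry the unrounded value through the remaining steps)
Subtract the mean from each value and square the result.
Sum all squared deviations: $\sum (x - \bar{x})^2 \approx 2740.9$
Divide by $n - 1 = 10$: $s^2 \approx 274.1$ square minutes
Take the square root: $s \approx 16.6$ minutes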
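These five steps translate directly into Python. This is a minimal sketch using only the standard library; `sample_std_dev` is an illustrative name, and the result is cross-checked against `statistics.stdev`, which also divides by n - 1.

```python
import math
import statistics

wait_times = [50, 25, 75, 35, 50, 25, 30, 50, 45, 25, 20]  # minutes


def sample_std_dev(values):
    n = len(values)
    mean = sum(values) / n                            # step 1: sample mean
    squared_devs = [(x - mean) ** 2 for x in values]  # step 2: squared deviations
    ss = sum(squared_devs)                            # step 3: sum of squares
    variance = ss / (n - 1)                           # step 4: divide by n - 1
    return math.sqrt(variance)                        # step 5: square root


s = sample_std_dev(wait_times)
print(round(s, 1))  # 16.6
# Cross-check against the standard library's sample standard deviation.
print(math.isclose(s, statistics.stdev(wait_times)))  # True
```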

Shortcut Formula

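For computational efficiency, calculators and software may use an equivalent shortcut formula, which does not require computing the mean first:

$$s = \sqrt{\frac{n\left(\sum x^2\right) - \left(\sum x\right)^2}{n(n - 1)}}$$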
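A common form of the shortcut formula is $s = \sqrt{\dfrac{n(\sum x^2) - (\sum x)^2}{n(n-1)}}$, which needs only $n$, $\sum x$, and $\sum x^2$. The Python sketch below assumes that form (`sample_std_dev_shortcut` is an illustrative name) and checks it against `statistics.stdev` on the wait times.

```python
import math
import statistics

wait_times = [50, 25, 75, 35, 50, 25, 30, 50, 45, 25, 20]  # minutes


def sample_std_dev_shortcut(values):
    """Shortcut formula: uses n, the sum of x, and the sum of x squared."""
    n = len(values)
    sum_x = sum(values)
    sum_x2 = sum(x * x for x in values)
    return math.sqrt((n * sum_x2 - sum_x ** 2) / (n * (n - 1)))


s = sample_std_dev_shortcut(wait_times)
print(round(s, 1))  # 16.6
print(math.isclose(s, statistics.stdev(wait_times)))  # True
```

With whole-number data, as here, the sums are exact integers. With large floating-point values, subtracting two nearly equal quantities can lose precision, which is one reason software often prefers more numerically stable algorithms.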

Range Rule of Thumb

Understanding Standard Deviation

The range rule of thumb provides a simple way to interpret standard deviation. It is based on the principle that, for many data sets, the vast majority (about 95%) of sample values lie within two standard deviations of the mean.

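Significantly low values: $(\text{mean}) - 2 \times (\text{standard deviation})$ or lower
Significantly high values: $(\text{mean}) + 2 \times (\text{standard deviation})$ or higher
Not significant: Between $(\text{mean}) - 2 \times (\text{standard deviation})$ and $(\text{mean}) + 2 \times (\text{standard deviation})$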

To estimate standard deviation from the range:

$$s \approx \frac{\text{range}}{4}$$
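The range rule of thumb can be sketched in Python as two small helpers (hypothetical names `significance_limits` and `estimate_std_dev_from_range`). Applied to the wait-time sample, this sketch uses the sample mean and sample standard deviation in place of unknown population values.

```python
import statistics

wait_times = [50, 25, 75, 35, 50, 25, 30, 50, 45, 25, 20]  # minutes


def significance_limits(mean, std_dev):
    """Return (mean - 2*sd, mean + 2*sd); values outside are significantly low/high."""
    return mean - 2 * std_dev, mean + 2 * std_dev


def estimate_std_dev_from_range(values):
    """Range rule of thumb: s is roughly range / 4."""
    return (max(values) - min(values)) / 4


low, high = significance_limits(statistics.mean(wait_times), statistics.stdev(wait_times))
print(round(low, 1), round(high, 1))  # 6.0 72.2
print(estimate_std_dev_from_range(wait_times))  # 13.75
```

With these data the limits are about 6.0 and 72.2 minutes, so the 75-minute wait would count as significantly high under this rule. The range-based estimate of 13.75 minutes is only a rough approximation of the actual sample standard deviation, which is about 16.6 minutes.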

Population Standard Deviation

Formula

For a population, the standard deviation is calculated as:

$$\sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}}$$

where:
$x$ = individual data value
$\mu$ = population mean
$N$ = population size
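For comparison, here is a minimal Python sketch of the population formula, which divides by $N$ rather than $n - 1$. It treats the wait times as if they were an entire population (an assumption made only for illustration) and checks the result against `statistics.pstdev`.

```python
import math
import statistics

wait_times = [50, 25, 75, 35, 50, 25, 30, 50, 45, 25, 20]  # treated as a population here


def population_std_dev(values):
    """sigma = sqrt( sum((x - mu)^2) / N )."""
    N = len(values)
    mu = sum(values) / N
    return math.sqrt(sum((x - mu) ** 2 for x in values) / N)


sigma = population_std_dev(wait_times)
print(round(sigma, 1))  # 15.8
print(math.isclose(sigma, statistics.pstdev(wait_times)))  # True
# Dividing by N instead of n - 1 gives a smaller value for the same data.
print(sigma < statistics.stdev(wait_times))  # True
```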

Variance

Definition

The variance measures the average squared deviation from the mean. It is the square of the standard deviation.

Sample variance: $s^2 = \dfrac{\sum (x - \bar{x})^2}{n - 1}$
Population variance: $\sigma^2 = \dfrac{\sum (x - \mu)^2}{N}$
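Python's `statistics` module provides both versions directly: `statistics.variance` divides by $n - 1$ and `statistics.pvariance` divides by $N$. The sketch below applies them to the wait times; note that the variance is in square minutes, not minutes.

```python
import math
import statistics

wait_times = [50, 25, 75, 35, 50, 25, 30, 50, 45, 25, 20]  # minutes

s2 = statistics.variance(wait_times)       # sample variance, divides by n - 1
sigma2 = statistics.pvariance(wait_times)  # population variance, divides by N
print(round(s2, 1), "min^2")  # 274.1 min^2

# Variance is the square of the corresponding standard deviation.
print(math.isclose(s2, statistics.stdev(wait_times) ** 2))       # True
print(math.isclose(sigma2, statistics.pstdev(wait_times) ** 2))  # True
```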

Properties of Variance

Units: Variance units are the square of the original data units.
Sensitivity: Variance is not resistant to outliers.
Non-negative: Variance is always zero or positive.
Unbiased Estimator: Sample variance $s^2$ is an unbiased estimator of population variance $\sigma^2$.

Why Divide by (n - 1)?

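When calculating sample variance, dividing by $n - 1$ (instead of $n$) corrects for bias: across all possible samples, the values of $s^2$ average out to the population variance $\sigma^2$, whereas dividing by $n$ would tend to underestimate it. The adjustment is needed because deviations are measured from the sample mean $\bar{x}$. Since the deviations $x - \bar{x}$ always sum to zero, only $n - 1$ of them are free to vary (the sample has $n - 1$ degrees of freedom).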
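The bias correction can be verified exactly for a tiny case. The Python sketch below uses a hypothetical population {1, 2, 3}, lists every possible sample of size 2 drawn with replacement, and averages the sample variances computed both ways. Exact fractions avoid any rounding. The parameter name `ddof` ("delta degrees of freedom", the amount subtracted from n in the divisor) is borrowed from NumPy's naming, but the function here is written by hand.

```python
from fractions import Fraction
from itertools import product

population = [1, 2, 3]  # hypothetical toy population
N = len(population)
mu = Fraction(sum(population), N)
sigma2 = sum((x - mu) ** 2 for x in population) / N  # population variance


def variance(sample, ddof):
    """Sum of squared deviations from the sample mean, divided by (n - ddof)."""
    n = len(sample)
    mean = Fraction(sum(sample), n)
    return sum((x - mean) ** 2 for x in sample) / (n - ddof)


# Every possible sample of size 2, drawn with replacement (9 samples).
samples = list(product(population, repeat=2))
mean_divide_by_n_minus_1 = sum(variance(s, 1) for s in samples) / len(samples)
mean_divide_by_n = sum(variance(s, 0) for s in samples) / len(samples)

print(sigma2, mean_divide_by_n_minus_1, mean_divide_by_n)  # 2/3 2/3 1/3
```

Dividing by $n$ averages to only half of $\sigma^2$ here. In general its expected value is $\frac{n-1}{n}\sigma^2$, so the underestimate shrinks as the sample size grows.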

Empirical Rule for Bell-Shaped Distributions

Definition

The empirical rule applies to data sets with approximately bell-shaped (normal) distributions:

About 68% of values fall within 1 standard deviation of the mean.
About 95% of values fall within 2 standard deviations of the mean.
About 99.7% of values fall within 3 standard deviations of the mean.

Example: Empirical Rule

If IQ scores are normally distributed with a mean of 100 and a standard deviation of 15:

Scores between 70 and 130 ($100 \pm 2 \times 15$) include about 95% of all scores.
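Each empirical-rule interval is simply the mean plus or minus $k$ standard deviations. This short Python sketch (the helper name `empirical_interval` is illustrative) lists all three intervals for the IQ example.

```python
def empirical_interval(mean, sd, k):
    """Interval from mean - k*sd to mean + k*sd."""
    return mean - k * sd, mean + k * sd


# Approximate shares from the empirical rule; valid only for bell-shaped data.
approximate_share = {1: "68%", 2: "95%", 3: "99.7%"}
for k, share in approximate_share.items():
    low, high = empirical_interval(100, 15, k)
    print(f"k = {k}: about {share} of IQ scores between {low} and {high}")
```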

Chebyshev’s Theorem

Definition

Chebyshev’s theorem applies to any data set, regardless of distribution. It states that the proportion of values within $K$ standard deviations of the mean is at least $1 - \dfrac{1}{K^2}$, for any $K > 1$.

For $K = 2$: At least $1 - 1/2^2 = 3/4$, or 75%, of values are within 2 standard deviations.
For $K = 3$: At least $1 - 1/3^2 = 8/9$, or about 89%, of values are within 3 standard deviations.

Example: Chebyshev’s Theorem

For IQ scores with mean 100 and standard deviation 15:

At least 75% of scores are between 70 and 130.
At least 89% of scores are between 55 and 145.
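Chebyshev's bound $1 - 1/K^2$ can be computed for any $K > 1$, including non-integer values. A minimal Python sketch with hypothetical helper names that reproduces the IQ figures:

```python
def chebyshev_min_proportion(k):
    """Minimum proportion of values within k standard deviations (valid for k > 1)."""
    if k <= 1:
        raise ValueError("Chebyshev's theorem requires k > 1")
    return 1 - 1 / k ** 2


def interval(mean, sd, k):
    """Interval from mean - k*sd to mean + k*sd."""
    return mean - k * sd, mean + k * sd


for k in (2, 3):
    low, high = interval(100, 15, k)
    print(f"K = {k}: at least {chebyshev_min_proportion(k):.0%} between {low} and {high}")
```

The same function handles non-integer values: `chebyshev_min_proportion(1.5)` returns about 0.556, so at least about 55.6% of any data set lies within 1.5 standard deviations of the mean.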

Comparing Variation: Coefficient of Variation

Definition

The coefficient of variation (CV) expresses the standard deviation as a percentage of the mean, allowing comparison of variability between different data sets.

Type	Formula
Sample	$CV = \dfrac{s}{\bar{x}} \cdot 100\%$
Population	$CV = \dfrac{\sigma}{\mu} \cdot 100\%$

Round CV to one decimal place (e.g., 25.3%).
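As an illustration, the sample CV of the Space Mountain wait times can be computed with Python's `statistics` module. This sketch (the function name is illustrative) rounds to one decimal place as recommended. Because the CV has no units, it could be compared directly with a CV from data measured in entirely different units.

```python
import statistics

wait_times = [50, 25, 75, 35, 50, 25, 30, 50, 45, 25, 20]  # minutes


def coefficient_of_variation(values):
    """Sample CV = s / x-bar * 100%, rounded to one decimal place."""
    return round(statistics.stdev(values) / statistics.mean(values) * 100, 1)


print(coefficient_of_variation(wait_times))  # 42.4
```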

Biased and Unbiased Estimators

Definitions

Biased Estimator: Sample standard deviation $s$ is a biased estimator of population standard deviation $\sigma$.
Unbiased Estimator: Sample variance $s^2$ is an unbiased estimator of population variance $\sigma^2$.

This distinction is important for inferential statistics and making accurate predictions about populations from samples.
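The distinction can be checked exactly on a tiny case. This Python sketch uses a hypothetical population {1, 2, 3}, takes every sample of size 2 drawn with replacement, and compares the averages of $s^2$ and $s$ over all those samples with $\sigma^2$ and $\sigma$.

```python
import math
import statistics
from itertools import product

population = [1, 2, 3]  # hypothetical toy population
sigma = statistics.pstdev(population)

samples = list(product(population, repeat=2))  # all 9 samples of size 2, with replacement
mean_s2 = statistics.mean(statistics.variance(s) for s in samples)
mean_s = statistics.mean(statistics.stdev(s) for s in samples)

print(math.isclose(mean_s2, sigma ** 2))  # True: s^2 averages out to sigma^2
print(round(mean_s, 4), round(sigma, 4))  # 0.6285 0.8165: s falls short of sigma on average
```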