Measures of Variation in Descriptive Statistics


Chapter 2: Descriptive Statistics

Section 2.4: Measures of Variation

This section explores how to quantify the spread or dispersion of data in a dataset. Understanding variation is essential for interpreting statistical results and comparing different datasets.

Objectives

How to find the range of a data set
How to find the variance and standard deviation of a population and of a sample
How to use the Empirical Rule and Chebychev’s Theorem to interpret standard deviation
How to approximate the sample standard deviation for grouped data
How to use the coefficient of variation to compare variation in different data sets

Range

The range is the simplest measure of variation, representing the difference between the largest and smallest values in a quantitative data set.

Definition: The range is the difference between the maximum and minimum data entries in the set.
Formula: $\text{Range} = (\text{maximum data entry}) - (\text{minimum data entry})$

The data must be quantitative (numerical).

Example: Consider the starting salaries (in thousands of dollars) for Corporation A:

Salary	51	48	49	55	57	51	54	51	47	52

Ordered data: 47, 48, 49, 51, 51, 51, 52, 54, 55, 57

Range = 57 - 47 = 10 (i.e., $10,000)
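As a quick check, the range calculation can be sketched in Python. This is a minimal illustration that assumes the salaries are stored in a plain list; the helper name `data_range` is hypothetical.

```python
# Corporation A starting salaries, in thousands of dollars
salaries = [51, 48, 49, 55, 57, 51, 54, 51, 47, 52]

def data_range(data):
    """Range = maximum data entry - minimum data entry (quantitative data only)."""
    return max(data) - min(data)

print(data_range(salaries))  # 10, i.e., 10 thousand dollars
```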

Variation

Variation describes how much the data entries differ from each other. Even if two datasets have the same mean, median, and mode, their variation can be very different.

Greater variation means data entries are more spread out from the mean.
Smaller variation means data entries are closer to the mean.

Example: Corporation A and Corporation B may have similar averages, but Corporation B's salaries are more spread out, indicating greater variation.

Deviation, Variance, and Standard Deviation

These are more sophisticated measures of variation, quantifying how far data entries are from the mean.

Deviation: The difference between a data entry $x$ and the mean, $\mu$ (for a population) or $\bar{x}$ (for a sample).
Population deviation: $x - \mu$
Sample deviation: $x - \bar{x}$
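A short Python sketch of the deviations for the Corporation A salaries. It also illustrates a general fact: deviations from the mean always sum to zero, because the positive and negative deviations cancel. That is why the variance formulas square the deviations before adding them. The variable names are illustrative only.

```python
# Corporation A starting salaries, in thousands of dollars
salaries = [51, 48, 49, 55, 57, 51, 54, 51, 47, 52]

mu = sum(salaries) / len(salaries)       # population mean: 515 / 10 = 51.5
deviations = [x - mu for x in salaries]  # deviation x - mu for each entry

print(deviations)       # [-0.5, -3.5, -2.5, 3.5, 5.5, -0.5, 2.5, -0.5, -4.5, 0.5]
print(sum(deviations))  # 0.0
```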

Population Variance and Standard Deviation

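Population Variance ($\sigma^2$): $\sigma^2 = \frac{\sum (x - \mu)^2}{N}$, where $N$ is the number of entries in the population.

Population Standard Deviation ($\sigma$): $\sigma = \sqrt{\sigma^2} = \sqrt{\frac{\sum (x - \mu)^2}{N}}$

Standard deviation is always non-negative and has the same units as the data.
If $\sigma = 0$, all data entries are identical.
As data entries get farther from the mean, $\sigma$ increases.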

Sample Variance and Standard Deviation

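Sample Variance ($s^2$): $s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$, where $n$ is the number of entries in the sample.

Sample Standard Deviation ($s$): $s = \sqrt{s^2} = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$

Use $n - 1$ (the degrees of freedom) rather than $n$ in the denominator for sample calculations.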

Steps to Calculate Variance and Standard Deviation

Find the mean of the data set.
Find the deviation of each entry from the mean.
Square each deviation.
Sum the squared deviations.
Divide by $N$ (population) or $n - 1$ (sample) to get the variance.
Take the square root of the variance to get the standard deviation.
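The six steps can be sketched as a single Python function. This is an illustrative helper (the name `variance_and_sd` and its `sample` flag are hypothetical), not a library routine; it divides by N when `sample=False` and by n - 1 when `sample=True`.

```python
import math

def variance_and_sd(data, sample=False):
    """Return (variance, standard deviation) by following the six steps.

    sample=False divides by N (population); sample=True divides by n - 1.
    """
    n = len(data)
    if n == 0 or (sample and n < 2):
        raise ValueError("not enough data entries")
    mean = sum(data) / n                    # Step 1: find the mean
    deviations = [x - mean for x in data]   # Step 2: deviation of each entry
    squared = [d ** 2 for d in deviations]  # Step 3: square each deviation
    ss = sum(squared)                       # Step 4: sum the squared deviations
    var = ss / (n - 1 if sample else n)     # Step 5: divide by n - 1 or N
    return var, math.sqrt(var)              # Step 6: square root gives the SD

salaries = [51, 48, 49, 55, 57, 51, 54, 51, 47, 52]
print(variance_and_sd(salaries))               # population: about (8.85, 2.97)
print(variance_and_sd(salaries, sample=True))  # sample: about (9.83, 3.14)
```

Keeping one function with a flag puts the only difference between the population and sample versions, the denominator, in a single visible place.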

Example: Population Standard Deviation

The population mean is $\mu = \frac{515}{10} = 51.5$ (thousand dollars).

Salary ($1000s)	Deviation ($x - \mu$)	Squared Deviation ($(x - \mu)^2$)
51	-0.5	0.25
48	-3.5	12.25
49	-2.5	6.25
55	3.5	12.25
57	5.5	30.25
51	-0.5	0.25
54	2.5	6.25
51	-0.5	0.25
47	-4.5	20.25
52	0.5	0.25

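Sum of squared deviations: $\sum (x - \mu)^2 = 88.5$

Population variance: $\sigma^2 = \frac{\sum (x - \mu)^2}{N} = \frac{88.5}{10} = 8.85$

Population standard deviation: $\sigma = \sqrt{8.85} \approx 2.97$ (thousand dollars), so a typical salary is about 3.0 thousand dollars from the mean.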
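The hand calculation can be cross-checked with Python's standard `statistics` module, whose `pvariance` and `pstdev` functions divide by N (population), while `variance` and `stdev` divide by n - 1 (sample). A minimal sketch:

```python
import statistics

# Corporation A starting salaries, in thousands of dollars
salaries = [51, 48, 49, 55, 57, 51, 54, 51, 47, 52]

mu = statistics.mean(salaries)             # population mean
ss = sum((x - mu) ** 2 for x in salaries)  # sum of squared deviations
sigma2 = statistics.pvariance(salaries)    # population variance (divides by N)
sigma = statistics.pstdev(salaries)        # population standard deviation

print(mu, ss, sigma2, round(sigma, 2))     # 51.5 88.5 8.85 2.97
```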

Interpreting Standard Deviation

Standard deviation measures the typical amount by which data entries deviate from the mean. Larger standard deviation indicates more spread out data.

If all entries are the same, standard deviation is zero.
Standard deviation increases as data becomes more spread out.
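To see both points in code, here is a sketch with three small made-up data sets (hypothetical values, not from the notes) that share the same mean of 50 but differ in spread. It uses `statistics.pstdev`, treating each list as a population.

```python
import statistics

# Hypothetical data sets with the same mean (50) but different spread
datasets = {
    "identical": [50, 50, 50, 50],
    "close": [48, 49, 51, 52],
    "spread": [35, 45, 55, 65],
}

for name, data in datasets.items():
    mean = statistics.mean(data)
    sd = statistics.pstdev(data)  # population standard deviation
    print(f"{name:>9}: mean = {mean}, SD = {sd:.2f}")
```

The SD is 0 for the identical entries and grows from about 1.58 to about 11.18 as the entries move farther from the common mean.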

Empirical Rule (68-95-99.7 Rule)

For data with a symmetric, bell-shaped (normal) distribution:

About 68% of data lie within one standard deviation of the mean.
About 95% within two standard deviations.
About 99.7% within three standard deviations.

Example: Suppose the heights of women are bell-shaped with a mean of 64.1 inches and a standard deviation of 2.6 inches. Then approximately 68% of women are between 64.1 - 2.6 = 61.5 and 64.1 + 2.6 = 66.7 inches tall.
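A minimal Python sketch that turns a mean and standard deviation into the three Empirical Rule intervals. It assumes the data are roughly bell-shaped; the helper name `empirical_rule_intervals` is hypothetical. For the heights example, the k = 1 interval comes out as 61.5 to 66.7 inches.

```python
def empirical_rule_intervals(mean, sd):
    """Return (k, approx. percent, low, high) for k = 1, 2, 3 standard deviations.

    Only meaningful when the data are roughly bell-shaped (normal).
    """
    approx_percent = {1: 68, 2: 95, 3: 99.7}
    return [(k, p, mean - k * sd, mean + k * sd) for k, p in approx_percent.items()]

# Heights of women: mean 64.1 in, standard deviation 2.6 in
for k, p, low, high in empirical_rule_intervals(64.1, 2.6):
    print(f"about {p}% within {k} SD: {low:.1f} to {high:.1f} inches")
```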

Chebychev’s Theorem

Chebychev’s Theorem applies to any data set, regardless of the shape of its distribution. It states that the proportion of data lying within $k$ standard deviations of the mean ($k > 1$) is at least

$1 - \frac{1}{k^2}$

For $k = 2$: at least $1 - \frac{1}{4} = 75\%$ of the data lie within 2 standard deviations.
For $k = 3$: at least $1 - \frac{1}{9} \approx 88.9\%$ of the data lie within 3 standard deviations.

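Example: If the mean age is 38.2 years and the standard deviation is 22.6 years, then at least 75% of the ages lie within 2 standard deviations of the mean, from 38.2 - 2(22.6) = -7.0 to 38.2 + 2(22.6) = 83.4. Because an age cannot be negative, at least 75% of the ages are between 0 and 83.4 years.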
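A sketch of Chebychev's bound and interval in Python. The helper names are hypothetical, and the `lower_limit` option is an assumption added for variables such as age that cannot fall below a natural floor; it simply raises the low end of the interval to that floor.

```python
def chebychev_min_proportion(k):
    """Chebychev's Theorem: at least 1 - 1/k**2 of the data lie within k SDs (k > 1)."""
    if k <= 1:
        raise ValueError("k must be greater than 1")
    return 1 - 1 / k ** 2


def chebychev_interval(mean, sd, k, lower_limit=None):
    """Return (low, high) = (mean - k*sd, mean + k*sd).

    lower_limit (hypothetical option): raise the low end to a natural floor,
    for example 0 for ages.
    """
    low, high = mean - k * sd, mean + k * sd
    if lower_limit is not None:
        low = max(low, lower_limit)
    return low, high


print(chebychev_min_proportion(2))            # 0.75
print(round(chebychev_min_proportion(3), 3))  # 0.889
low, high = chebychev_interval(38.2, 22.6, 2, lower_limit=0)
print(f"at least 75% of ages between {low} and {high:.1f} years")
```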

Standard Deviation for Grouped Data

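When data are presented in a frequency distribution, use the class midpoints (or the data values themselves, when each class is a single value) together with their frequencies to estimate the mean and standard deviation.

Sample mean for a frequency distribution: $\bar{x} = \frac{\sum x f}{n}$

Sample standard deviation for a frequency distribution: $s = \sqrt{\frac{\sum (x - \bar{x})^2 f}{n - 1}}$

where $x$ = class midpoint, $f$ = frequency, and $n = \sum f$ = total number of entries.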

Example: For number of children per household, calculate mean and standard deviation using midpoints and frequencies.
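The notes do not list the household counts, so the frequencies below are made up purely for illustration. This minimal sketch (hypothetical helper `grouped_mean_sd`) weights each value by its frequency, exactly as in the formulas for $\bar{x}$ and $s$, and divides by n - 1 because it estimates a sample standard deviation.

```python
import math

def grouped_mean_sd(values, freqs):
    """Approximate sample mean and standard deviation from a frequency distribution.

    values: class midpoints x (or the data values themselves for single-value classes)
    freqs:  class frequencies f, in the same order
    """
    n = sum(freqs)                                                # n = sum of f
    mean = sum(x * f for x, f in zip(values, freqs)) / n          # x-bar = sum(x*f) / n
    ss = sum((x - mean) ** 2 * f for x, f in zip(values, freqs))  # sum((x - x-bar)^2 * f)
    sd = math.sqrt(ss / (n - 1))                                  # sample SD uses n - 1
    return mean, sd

# HYPOTHETICAL counts for illustration only (not from the notes):
# number of children per household, and how many households had that many
children = [0, 1, 2, 3]
households = [5, 8, 5, 2]

mean, sd = grouped_mean_sd(children, households)
print(f"mean = {mean:.2f}, s = {sd:.2f}")
```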

Coefficient of Variation (CV)

The coefficient of variation expresses the standard deviation as a percentage of the mean, allowing comparison of variability between datasets with different units or means.

Population CV: $CV = \frac{\sigma}{\mu} \cdot 100\%$
Sample CV: $CV = \frac{s}{\bar{x}} \cdot 100\%$

Example: Heights and weights of a basketball team:

Variable	Mean	Standard Deviation	CV
Height (inches)	72.8	3.3	4.5%
Weight (pounds)	187.8	17.7	9.4%

Relative to their means, weights (CV ≈ 9.4%) vary about twice as much as heights (CV ≈ 4.5%), so weights are more variable than heights on this team.
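The CV column can be reproduced with a one-line helper (hypothetical name `coefficient_of_variation`). This sketch uses the rounded means and standard deviations from the table.

```python
def coefficient_of_variation(sd, mean):
    """CV = (standard deviation / mean) * 100, expressed as a percent."""
    return sd / mean * 100

# Rounded values from the basketball team table
height_cv = coefficient_of_variation(3.3, 72.8)    # inches / inches
weight_cv = coefficient_of_variation(17.7, 187.8)  # pounds / pounds

print(f"Height CV: {height_cv:.1f}%")  # Height CV: 4.5%
print(f"Weight CV: {weight_cv:.1f}%")  # Weight CV: 9.4%
```

Because the units cancel in sd / mean, the two percentages can be compared directly even though one variable is measured in inches and the other in pounds.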

Additional info: The notes are based on textbook slides and include examples, formulas, and applications relevant for college-level statistics students.