Measures of Variation in Descriptive Statistics
Study Guide - Smart Notes
Chapter 2: Descriptive Statistics
Section 2.4: Measures of Variation
This section explores how to quantify the spread or dispersion of data in a dataset. Understanding variation is essential for interpreting statistical results and comparing different datasets.
Objectives
How to find the range of a data set
How to find the variance and standard deviation of a population and of a sample
How to use the Empirical Rule and Chebychev’s Theorem to interpret standard deviation
How to approximate the sample standard deviation for grouped data
How to use the coefficient of variation to compare variation in different data sets
Range
The range is the simplest measure of variation, representing the difference between the largest and smallest values in a quantitative data set.
Definition: The range is the difference between the maximum and minimum data entries in the set.
Formula: \(\text{Range} = (\text{maximum data entry}) - (\text{minimum data entry})\)
The data must be quantitative (numerical).
Example: Consider the starting salaries (in thousands of dollars) for Corporation A:
Salary ($1000s) | 51 | 48 | 49 | 55 | 57 | 51 | 54 | 51 | 47 | 52 |
|---|---|---|---|---|---|---|---|---|---|---|
Ordered data: 47, 48, 49, 51, 51, 51, 52, 54, 55, 57
Range = 57 - 47 = 10 (i.e., $10,000)
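The range calculation can be sketched in a few lines of Python. This is only an illustration; the helper name `data_range` and the variable `salaries_a` are introduced here and are not part of the original notes.

```python
# Starting salaries for Corporation A, in thousands of dollars (from the example above).
salaries_a = [51, 48, 49, 55, 57, 51, 54, 51, 47, 52]


def data_range(values):
    """Return (maximum entry) - (minimum entry) for a non-empty numeric data set."""
    if not values:
        raise ValueError("range is undefined for an empty data set")
    return max(values) - min(values)


print(data_range(salaries_a))  # 57 - 47 = 10, i.e. $10,000
```

Because the range uses only the two extreme entries, it ignores how the remaining entries are distributed, which is why the measures in the following subsections are usually preferred.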
Variation
Variation describes how much the data entries differ from each other. Even if two datasets have the same mean, median, and mode, their variation can be very different.
Greater variation means data entries are more spread out from the mean.
Smaller variation means data entries are closer to the mean.
Example: Corporation A and Corporation B may have similar averages, but Corporation B's salaries are more spread out, indicating greater variation.
Deviation, Variance, and Standard Deviation
These are more sophisticated measures of variation, quantifying how far data entries are from the mean.
Deviation: The difference between a data entry \(x\) and the mean, \(\mu\) (for a population) or \(\bar{x}\) (for a sample).
Population deviation: \(x - \mu\)
Sample deviation: \(x - \bar{x}\)
Population Variance and Standard Deviation
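Population Variance (\(\sigma^2\)): \(\sigma^2 = \dfrac{\sum (x - \mu)^2}{N}\)
Population Standard Deviation (\(\sigma\)): \(\sigma = \sqrt{\sigma^2} = \sqrt{\dfrac{\sum (x - \mu)^2}{N}}\)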
Standard deviation is always non-negative and has the same units as the data.
If \(\sigma = 0\), all data entries are identical.
As data entries get farther from the mean, \(\sigma\) increases.
Sample Variance and Standard Deviation
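Sample Variance (\(s^2\)): \(s^2 = \dfrac{\sum (x - \bar{x})^2}{n - 1}\)
Sample Standard Deviation (\(s\)): \(s = \sqrt{s^2} = \sqrt{\dfrac{\sum (x - \bar{x})^2}{n - 1}}\)
Use \(n - 1\) (the degrees of freedom) in the denominator for sample calculations.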
Steps to Calculate Variance and Standard Deviation
Find the mean of the data set.
Find the deviation of each entry from the mean.
Square each deviation.
Sum the squared deviations.
Divide by \(N\) (population) or \(n - 1\) (sample) for variance.
Take the square root for standard deviation.
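The six steps above can be sketched as a small Python function. This is a minimal illustration, not a reference implementation; the function name `variance_and_std` and its `sample` flag are assumptions made for this sketch.

```python
import math


def variance_and_std(values, sample=False):
    """Return (variance, standard deviation) by following the six steps."""
    n = len(values)
    if n == 0 or (sample and n < 2):
        raise ValueError("not enough data entries")
    mean = sum(values) / n                    # Step 1: mean
    deviations = [x - mean for x in values]   # Step 2: deviation of each entry
    squared = [d ** 2 for d in deviations]    # Step 3: square each deviation
    sum_of_squares = sum(squared)             # Step 4: sum the squared deviations
    divisor = n - 1 if sample else n          # Step 5: N (population) or n - 1 (sample)
    variance = sum_of_squares / divisor
    return variance, math.sqrt(variance)      # Step 6: square root


# Corporation A starting salaries, in thousands of dollars.
salaries_a = [51, 48, 49, 55, 57, 51, 54, 51, 47, 52]

pop_var, pop_sd = variance_and_std(salaries_a)
samp_var, samp_sd = variance_and_std(salaries_a, sample=True)
print(f"population: variance={pop_var:.2f}, sd={pop_sd:.2f}")
print(f"sample:     variance={samp_var:.2f}, sd={samp_sd:.2f}")
```

Keeping each step on its own line mirrors the hand calculation; the only difference between the population and sample versions is the divisor in step 5.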
Example: Population Standard Deviation
Salary ($1000s) | Deviation \(x - \mu\) | Squared Deviation \((x - \mu)^2\) |
|---|---|---|
51 | -0.5 | 0.25 |
48 | -3.5 | 12.25 |
49 | -2.5 | 6.25 |
55 | 3.5 | 12.25 |
57 | 5.5 | 30.25 |
51 | -0.5 | 0.25 |
54 | 2.5 | 6.25 |
51 | -0.5 | 0.25 |
47 | -4.5 | 20.25 |
52 | 0.5 | 0.25 |
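Mean: \(\mu = 515 / 10 = 51.5\)
Sum of squared deviations: \(\sum (x - \mu)^2 = 88.5\)
Population variance: \(\sigma^2 = 88.5 / 10 = 8.85 \approx 8.9\)
Population standard deviation: \(\sigma = \sqrt{8.85} \approx 3.0\), or about 3.0 thousand dollars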
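As a cross-check on the table above, Python's standard `statistics` module provides `pvariance` and `pstdev` for populations and `stdev` for samples. The sketch below treats the ten salaries first as a population, then as a sample, to show how the \(n - 1\) divisor changes the result; the variable names are chosen here for illustration.

```python
import statistics

# Corporation A starting salaries, in thousands of dollars.
salaries_a = [51, 48, 49, 55, 57, 51, 54, 51, 47, 52]

mu = statistics.mean(salaries_a)            # population mean
pop_var = statistics.pvariance(salaries_a)  # divides by N
pop_sd = statistics.pstdev(salaries_a)      # square root of the population variance
samp_sd = statistics.stdev(salaries_a)      # divides by n - 1 before the square root

print(f"mean={mu}, population variance={pop_var:.2f}, population sd={pop_sd:.2f}")
print(f"sample sd={samp_sd:.2f}")
```

The sample standard deviation comes out slightly larger than the population value because the same sum of squared deviations is divided by 9 instead of 10.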
Interpreting Standard Deviation
Standard deviation measures the typical amount by which data entries deviate from the mean. A larger standard deviation indicates that the data are more spread out.
If all entries are the same, standard deviation is zero.
Standard deviation increases as data becomes more spread out.
Empirical Rule (68-95-99.7 Rule)
For data with a symmetric, bell-shaped (normal) distribution:
About 68% of data lie within one standard deviation of the mean.
About 95% within two standard deviations.
About 99.7% within three standard deviations.
Example: If the mean height of women is 64.1 inches and the standard deviation is 2.6 inches, then approximately 68% of women are between 61.5 and 66.7 inches tall.
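The Empirical Rule intervals for the height example can be computed directly. The following is a small sketch that assumes the heights are approximately bell-shaped; the function name `empirical_rule_intervals` is hypothetical.

```python
def empirical_rule_intervals(mean, std_dev):
    """Map each approximate share of the data to its (low, high) interval.

    Only meaningful for roughly symmetric, bell-shaped distributions.
    """
    shares = {1: 0.68, 2: 0.95, 3: 0.997}  # k standard deviations -> approximate share
    return {
        share: (mean - k * std_dev, mean + k * std_dev)
        for k, share in shares.items()
    }


# Women's heights from the example: mean 64.1 inches, standard deviation 2.6 inches.
intervals = empirical_rule_intervals(64.1, 2.6)
for share, (low, high) in intervals.items():
    print(f"about {share:.1%} of heights between {low:.1f} and {high:.1f} inches")
```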
Chebychev’s Theorem
Chebychev’s Theorem applies to any data set, regardless of distribution shape. It states that the proportion of data within \(k\) standard deviations (\(k > 1\)) of the mean is at least: \(1 - \dfrac{1}{k^2}\)
For \(k = 2\): At least \(1 - \frac{1}{4} = 75\%\) of data lie within 2 standard deviations.
For \(k = 3\): At least \(1 - \frac{1}{9} \approx 88.9\%\) of data lie within 3 standard deviations.
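Example: If the mean age is 38.2 years and the standard deviation is 22.6 years, then two standard deviations span from 38.2 - 2(22.6) = -7.0 to 38.2 + 2(22.6) = 83.4. Because age cannot be negative, at least 75% of ages are between 0 and 83.4 years.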
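The Chebychev bound and the age interval can be sketched as follows. Both helper names and the optional `minimum` argument (used to clip the interval at 0, since ages cannot be negative) are assumptions made for this illustration.

```python
def chebychev_lower_bound(k):
    """Minimum proportion of data within k standard deviations of the mean (k > 1)."""
    if k <= 1:
        raise ValueError("k must be greater than 1")
    return 1 - 1 / k ** 2


def chebychev_interval(mean, std_dev, k, minimum=None):
    """Return (low, high) = mean -/+ k * std_dev, optionally clipped below at `minimum`."""
    low = mean - k * std_dev
    high = mean + k * std_dev
    if minimum is not None:
        low = max(low, minimum)  # e.g. ages cannot fall below 0
    return low, high


bound = chebychev_lower_bound(2)
low, high = chebychev_interval(38.2, 22.6, k=2, minimum=0)
print(f"at least {bound:.0%} of ages lie between {low:.1f} and {high:.1f} years")
```

Unlike the Empirical Rule, this bound holds for any distribution shape, which is why it is stated as "at least" rather than "about".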
Standard Deviation for Grouped Data
When data is presented in frequency distributions, use class midpoints to estimate mean and standard deviation.
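Sample standard deviation for a frequency distribution: \(s = \sqrt{\dfrac{\sum (x - \bar{x})^2 f}{n - 1}}\), where the mean is estimated by \(\bar{x} = \dfrac{\sum x f}{n}\)
\(x\) = class midpoint, \(f\) = frequency, \(n = \sum f\) = total number of entries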
Example: For number of children per household, calculate mean and standard deviation using midpoints and frequencies.
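The notes do not include the household data, so the sketch below uses made-up frequencies purely to show the mechanics. The function name `grouped_mean_and_std` and the values in `children` and `households` are hypothetical.

```python
import math


def grouped_mean_and_std(midpoints, frequencies):
    """Estimate the sample mean and standard deviation from a frequency distribution."""
    n = sum(frequencies)  # total number of entries
    if n < 2:
        raise ValueError("need at least two entries for a sample standard deviation")
    mean = sum(x * f for x, f in zip(midpoints, frequencies)) / n
    sum_of_squares = sum((x - mean) ** 2 * f for x, f in zip(midpoints, frequencies))
    return mean, math.sqrt(sum_of_squares / (n - 1))


# Hypothetical data (NOT from the notes): number of children and count of households.
children = [0, 1, 2, 3, 4]
households = [4, 8, 5, 2, 1]

mean, s = grouped_mean_and_std(children, households)
print(f"mean={mean:.2f}, s={s:.2f}")
```

When the classes are intervals rather than single values, the midpoints stand in for the unknown individual entries, so the result is an approximation of the true sample standard deviation.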
Coefficient of Variation (CV)
The coefficient of variation expresses the standard deviation as a percentage of the mean, allowing comparison of variability between datasets with different units or means.
Population CV: \(CV = \dfrac{\sigma}{\mu} \cdot 100\%\)
Sample CV: \(CV = \dfrac{s}{\bar{x}} \cdot 100\%\)
Example: Heights and weights of a basketball team:
Statistic | Mean | Standard Deviation | CV |
|---|---|---|---|
Height (inches) | 72.8 | 3.3 | 4.5% |
Weight (pounds) | 187.8 | 17.7 | 9.4% |
Weights are more variable than heights in this example.
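The CV column in the table can be reproduced with a short sketch; the function name `coefficient_of_variation` is chosen here for illustration.

```python
def coefficient_of_variation(std_dev, mean):
    """Return the standard deviation as a percentage of the mean."""
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return std_dev / mean * 100


# Basketball team statistics from the table above.
cv_height = coefficient_of_variation(3.3, 72.8)    # inches
cv_weight = coefficient_of_variation(17.7, 187.8)  # pounds

print(f"height CV = {cv_height:.1f}%, weight CV = {cv_weight:.1f}%")
```

Because the units cancel in the ratio, the two percentages can be compared directly even though one variable is measured in inches and the other in pounds.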
Additional info: The notes are based on textbook slides and include examples, formulas, and applications relevant for college-level statistics students.