Chapter 3: Describing, Exploring, and Comparing Data – Measures of Variation
Study Guide - Smart Notes
Measures of Variation
Introduction to Measures of Variation
Measures of variation are essential in statistics for understanding how data values spread or differ from each other. The three primary measures of variation are range, standard deviation, and variance. These statistics help us interpret and understand the distribution and consistency of data.
Range: The simplest measure, showing the difference between the largest and smallest values.
Standard Deviation: Indicates how much data values deviate from the mean.
Variance: The square of the standard deviation, representing average squared deviations from the mean.
Rounding Rule for Measures of Variation
When reporting measures of variation, always round the result to one more decimal place than is present in the original data set.
Example: If the data values are reported to one decimal place, report the standard deviation to two decimal places.
Range
Definition and Calculation
The range of a data set is the difference between the maximum and minimum data values. It provides a quick sense of the spread but is sensitive to outliers.
Formula: $\text{range} = (\text{maximum data value}) - (\text{minimum data value})$
Important Property: The range uses only the two extreme values and does not consider the distribution of other data points. It is not resistant to outliers.
Example: Calculating Range
Given wait times (minutes) for Space Mountain: 50, 25, 75, 35, 50, 25, 30, 50, 45, 25, 20
Maximum value: 75
Minimum value: 20
Range: $75 - 20 = 55.0$ minutes
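The range calculation above can be sketched in a few lines of Python. The function name `data_range` and the extra value of 300 minutes are illustrative assumptions, not part of the original example; the extra value is there only to show how a single outlier changes the range.

```python
# Wait times (minutes) from the example above.
wait_times = [50, 25, 75, 35, 50, 25, 30, 50, 45, 25, 20]


def data_range(values):
    """Return the range: maximum value minus minimum value."""
    return max(values) - min(values)


range_minutes = data_range(wait_times)
# Report one more decimal place than the original data.
print(f"Range: {range_minutes:.1f} minutes")

# Hypothetical outlier (not in the source data) to show that the range
# is not resistant: one extreme value changes it completely.
with_outlier = wait_times + [300]
print(f"Range with one outlier: {data_range(with_outlier):.1f} minutes")
```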
Standard Deviation
Definition
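The standard deviation (denoted by $s$ for a sample) measures how much data values deviate from the mean. It can be read as a typical distance from the mean, although it is computed from squared deviations rather than as a plain average of distances. It is a key indicator of data variability.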
Sample Standard Deviation: Used when data is a sample from a larger population.
Population Standard Deviation: Used when data represents the entire population.
Formula for Sample Standard Deviation
The formula for the sample standard deviation is:
$$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$$
$x$ = individual data value
$\bar{x}$ = sample mean
$n$ = sample size
Properties of Standard Deviation
Non-negative: Standard deviation is always zero or positive.
Zero Standard Deviation: Occurs only when all data values are identical.
Sensitivity: Standard deviation is not resistant to outliers; one or more extreme values can increase it substantially.
Units: The units of standard deviation match those of the original data.
Bias: The sample standard deviation is a biased estimator of the population standard deviation.
Example: Calculating Sample Standard Deviation
Given wait times: 50, 25, 75, 35, 50, 25, 30, 50, 45, 25, 20
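Compute the mean: $\bar{x} = 430 / 11 \approx 39.1$ minutes
Subtract the mean from each value and square the result.
Sum all squared deviations: $\sum (x - \bar{x})^2 \approx 2740.9$
Divide by $n - 1 = 10$: $2740.9 / 10 \approx 274.1$
Take the square root: $s = \sqrt{274.1} \approx 16.6$ minutes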
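The five steps above can be followed directly in Python. This is a minimal sketch; the variable names are illustrative, and the final line cross-checks the hand calculation against the standard library's `statistics.stdev`, which also divides by n - 1.

```python
import math
import statistics

wait_times = [50, 25, 75, 35, 50, 25, 30, 50, 45, 25, 20]

n = len(wait_times)
mean = sum(wait_times) / n                             # step 1: mean
squared_devs = [(x - mean) ** 2 for x in wait_times]   # step 2: squared deviations
sum_sq = sum(squared_devs)                             # step 3: sum them
variance = sum_sq / (n - 1)                            # step 4: divide by n - 1
s = math.sqrt(variance)                                # step 5: square root

print(f"mean = {mean:.1f}, sum of squares = {sum_sq:.1f}")
print(f"variance = {variance:.1f}, s = {s:.1f} minutes")

# Cross-check against the standard library's sample standard deviation.
matches_library = math.isclose(s, statistics.stdev(wait_times))
print("matches statistics.stdev:", matches_library)
```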
Shortcut Formula
For computational efficiency, calculators and software may use a shortcut formula for standard deviation.
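The notes do not state the shortcut formula itself. One commonly used, algebraically equivalent form is $s = \sqrt{\frac{n \sum x^2 - (\sum x)^2}{n(n - 1)}}$; treating that as the intended shortcut is an assumption. The sketch below compares it with the definition-based calculation on the wait-time data; both function names are illustrative.

```python
import math

wait_times = [50, 25, 75, 35, 50, 25, 30, 50, 45, 25, 20]


def stdev_definition(values):
    """Sample standard deviation from squared deviations about the mean."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))


def stdev_shortcut(values):
    """Assumed shortcut form: needs only n, sum(x), and sum(x^2)."""
    n = len(values)
    sum_x = sum(values)
    sum_x2 = sum(x * x for x in values)
    return math.sqrt((n * sum_x2 - sum_x ** 2) / (n * (n - 1)))


print(f"definition: {stdev_definition(wait_times):.4f}")
print(f"shortcut:   {stdev_shortcut(wait_times):.4f}")
```

The shortcut avoids computing the mean first, which is convenient for running totals, though it can lose precision in floating point when the values are large relative to their spread.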
Range Rule of Thumb
Understanding Standard Deviation
The range rule of thumb provides a simple way to interpret standard deviation. Most (about 95%) sample values lie within two standard deviations of the mean.
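Significantly low values: $\bar{x} - 2s$ or lower
Significantly high values: $\bar{x} + 2s$ or higher
Not significant: Between $\bar{x} - 2s$ and $\bar{x} + 2s$
To estimate standard deviation from the range: $s \approx \frac{\text{range}}{4}$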
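As a rough illustration, the range rule of thumb can be applied to the Space Mountain wait times used earlier in these notes. The sketch assumes the sample mean and sample standard deviation are the appropriate inputs; the variable names are illustrative.

```python
import statistics

wait_times = [50, 25, 75, 35, 50, 25, 30, 50, 45, 25, 20]

mean = statistics.mean(wait_times)
s = statistics.stdev(wait_times)  # sample standard deviation (divides by n - 1)

# Limits separating significant values from values that are not significant.
low_limit = mean - 2 * s
high_limit = mean + 2 * s

significantly_low = [x for x in wait_times if x <= low_limit]
significantly_high = [x for x in wait_times if x >= high_limit]

# Rough estimate of the standard deviation from the range alone.
s_estimate = (max(wait_times) - min(wait_times)) / 4

print(f"limits: {low_limit:.1f} to {high_limit:.1f} minutes")
print("significantly low:", significantly_low)
print("significantly high:", significantly_high)
print(f"range / 4 = {s_estimate:.2f} versus s = {s:.2f}")
```

The estimate from the range is crude: it uses only the two extreme values, so it should not be expected to match the computed standard deviation closely.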
Population Standard Deviation
Formula
For a population, the standard deviation is calculated as:
$$\sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}}$$
$\mu$ = population mean
$N$ = population size
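To see the effect of the different divisor, the sketch below treats the 11 wait times as if they were an entire population. That treatment is an assumption made purely for illustration, and `population_stdev` is an illustrative name. The result is checked against the standard library's `statistics.pstdev`.

```python
import math
import statistics

wait_times = [50, 25, 75, 35, 50, 25, 30, 50, 45, 25, 20]


def population_stdev(values):
    """Population standard deviation: divide by N, not N - 1."""
    size = len(values)
    mu = sum(values) / size
    return math.sqrt(sum((x - mu) ** 2 for x in values) / size)


sigma = population_stdev(wait_times)   # treats the data as a full population
s = statistics.stdev(wait_times)       # treats the data as a sample

print(f"population formula (divide by N):   {sigma:.1f}")
print(f"sample formula (divide by n - 1):   {s:.1f}")
```

For the same data the population formula always gives the smaller value, and the gap shrinks as the number of values grows.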
Variance
Definition
The variance measures the average squared deviation from the mean. It is the square of the standard deviation.
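Sample variance: $s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$, the square of the sample standard deviation $s$
Population variance: $\sigma^2 = \frac{\sum (x - \mu)^2}{N}$, the square of the population standard deviation $\sigma$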
Properties of Variance
Units: Variance units are the square of the original data units.
Sensitivity: Variance is not resistant to outliers.
Non-negative: Variance is always zero or positive.
Unbiased Estimator: Sample variance $s^2$ is an unbiased estimator of population variance $\sigma^2$.
Why Divide by (n - 1)?
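When calculating sample variance, dividing by $n - 1$ (instead of $n$) corrects for bias, so that sample variances center on the population variance $\sigma^2$ rather than tending to underestimate it. The adjustment reflects the constraint imposed by the sample mean: the deviations $x - \bar{x}$ must sum to zero, so only $n - 1$ of them are free to vary.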
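A small enumeration can make this concrete. The sketch below uses a hypothetical three-value population (4, 5, 9), chosen only for illustration, lists every equally likely sample of size 2 drawn with replacement, and averages the sample variances computed with each divisor.

```python
from itertools import product

population = [4, 5, 9]  # hypothetical population, for illustration only
N = len(population)
mu = sum(population) / N
pop_variance = sum((x - mu) ** 2 for x in population) / N


def variance_with_divisor(sample, divisor):
    """Sum of squared deviations about the sample mean, over a chosen divisor."""
    m = sum(sample) / len(sample)
    return sum((x - m) ** 2 for x in sample) / divisor


n = 2
# All 9 equally likely samples of size 2, drawn with replacement.
samples = list(product(population, repeat=n))

mean_var_n_minus_1 = sum(variance_with_divisor(smp, n - 1) for smp in samples) / len(samples)
mean_var_n = sum(variance_with_divisor(smp, n) for smp in samples) / len(samples)

print(f"population variance:         {pop_variance:.3f}")
print(f"average using n - 1 divisor: {mean_var_n_minus_1:.3f}")
print(f"average using n divisor:     {mean_var_n:.3f}")
```

In this enumeration the n - 1 version averages out to the population variance, while the n version comes out too small.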
Empirical Rule for Bell-Shaped Distributions
Definition
The empirical rule applies to data sets with approximately bell-shaped (normal) distributions:
About 68% of values fall within 1 standard deviation of the mean.
About 95% of values fall within 2 standard deviations of the mean.
About 99.7% of values fall within 3 standard deviations of the mean.
Example: Empirical Rule
If IQ scores are normally distributed with a mean of 100 and a standard deviation of 15:
Scores between $100 - 2(15) = 70$ and $100 + 2(15) = 130$ include about 95% of all scores.
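The three percentages in the empirical rule can be checked against an exact normal model. This sketch assumes Python 3.8 or later, where `statistics.NormalDist` is available, and uses the IQ parameters from the example; the `shares` dictionary is an illustrative name.

```python
from statistics import NormalDist

# Normal model for IQ scores: mean 100, standard deviation 15.
iq = NormalDist(mu=100, sigma=15)

shares = {}
for k in (1, 2, 3):
    low = iq.mean - k * iq.stdev
    high = iq.mean + k * iq.stdev
    # Proportion of the distribution between low and high.
    shares[k] = iq.cdf(high) - iq.cdf(low)
    print(f"within {k} SD: {low:.0f} to {high:.0f} -> {shares[k]:.1%}")
```

The rule's 68%, 95%, and 99.7% are rounded versions of these proportions, which is why the notes say "about".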
Chebyshev’s Theorem
Definition
Chebyshev’s theorem applies to any data set, regardless of distribution. It states that the proportion of values within $K$ standard deviations of the mean is at least $1 - 1/K^2$, for $K > 1$.
For $K = 2$: At least 75% of values are within 2 standard deviations.
For $K = 3$: At least 89% of values are within 3 standard deviations.
Example: Chebyshev’s Theorem
For IQ scores with mean 100 and standard deviation 15:
At least 75% of scores are between 70 and 130.
At least 89% of scores are between 55 and 145.
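The bounds in this example follow from evaluating 1 - 1/K² and the interval mean ± K standard deviations. A minimal sketch, with illustrative function names:

```python
def chebyshev_min_proportion(k):
    """Minimum proportion of values within k standard deviations (k > 1)."""
    if k <= 1:
        raise ValueError("Chebyshev's theorem requires k > 1")
    return 1 - 1 / k ** 2


def interval(mean, sd, k):
    """Interval from mean - k*sd to mean + k*sd."""
    return mean - k * sd, mean + k * sd


for k in (2, 3):
    low, high = interval(100, 15, k)
    share = chebyshev_min_proportion(k)
    print(f"K = {k}: at least {share:.0%} of scores between {low} and {high}")
```

These are guaranteed minimums for any distribution, so they are weaker than the empirical rule's figures for bell-shaped data.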
Comparing Variation: Coefficient of Variation
Definition
The coefficient of variation (CV) expresses the standard deviation as a percentage of the mean, allowing comparison of variability between different data sets.
| Type | Formula |
|---|---|
| Sample | $CV = \frac{s}{\bar{x}} \cdot 100\%$ |
| Population | $CV = \frac{\sigma}{\mu} \cdot 100\%$ |
Round CV to one decimal place (e.g., 25.3%).
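As an illustration, the sample CV can be computed for the wait-time data used earlier in these notes. Converting the same data from minutes to seconds (a conversion added here only for illustration) changes the standard deviation but leaves the CV unchanged, which is what makes it usable for comparing data sets measured on different scales.

```python
import math
import statistics

wait_minutes = [50, 25, 75, 35, 50, 25, 30, 50, 45, 25, 20]
wait_seconds = [x * 60 for x in wait_minutes]  # same data, different unit


def coefficient_of_variation(sample):
    """Sample CV: standard deviation as a percentage of the mean."""
    return statistics.stdev(sample) / statistics.mean(sample) * 100


cv_minutes = coefficient_of_variation(wait_minutes)
cv_seconds = coefficient_of_variation(wait_seconds)

print(f"s in minutes: {statistics.stdev(wait_minutes):.1f}")
print(f"s in seconds: {statistics.stdev(wait_seconds):.1f}")
print(f"CV (minutes): {cv_minutes:.1f}%")
print(f"CV (seconds): {cv_seconds:.1f}%")
```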
Biased and Unbiased Estimators
Definitions
Biased Estimator: The sample standard deviation $s$ is a biased estimator of the population standard deviation $\sigma$.
Unbiased Estimator: The sample variance $s^2$ is an unbiased estimator of the population variance $\sigma^2$.
This distinction is important for inferential statistics and making accurate predictions about populations from samples.
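The contrast between the two estimators can be seen by enumeration. The sketch below uses a hypothetical population of three values (4, 5, 9), chosen only for illustration, and averages $s$ and $s^2$ over every equally likely sample of size 2 drawn with replacement, using the standard library's `statistics` functions.

```python
import statistics
from itertools import product

population = [4, 5, 9]  # hypothetical population, for illustration only
sigma = statistics.pstdev(population)        # population standard deviation
sigma_sq = statistics.pvariance(population)  # population variance

# All 9 equally likely samples of size 2, drawn with replacement.
samples = list(product(population, repeat=2))

mean_of_s = sum(statistics.stdev(smp) for smp in samples) / len(samples)
mean_of_s_sq = sum(statistics.variance(smp) for smp in samples) / len(samples)

print(f"sigma   = {sigma:.3f}, average of s   = {mean_of_s:.3f}")
print(f"sigma^2 = {sigma_sq:.3f}, average of s^2 = {mean_of_s_sq:.3f}")
```

Here the sample variances average out to the population variance, while the sample standard deviations average below the population standard deviation.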