Measures of Variation in Descriptive Statistics: Range, Standard Deviation, and Variance
Describing, Exploring, and Comparing Data
Introduction
In statistics, understanding how data values vary is essential for interpreting and comparing datasets. This section focuses on three fundamental measures of variation: range, standard deviation, and variance. These measures help quantify the spread or dispersion of data values around a central value, such as the mean.
Measures of Variation
Range
The range is the simplest measure of variation, representing the difference between the largest and smallest values in a dataset.
Definition: The range of a set of data values is the difference between the maximum and minimum data values.
Formula: $\text{range} = \text{maximum data value} - \text{minimum data value}$
Properties:
The range uses only the maximum and minimum values, making it highly sensitive to extreme values (outliers).
The range is not resistant to outliers and does not reflect the variation among all data values.
Example: For the wait times (in minutes) for Space Mountain: 50, 25, 75, 35, 50, 25, 30, 50, 45, 25, 20, the range is $75 - 20 = 55.0$ minutes.
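The range calculation can be sketched in a few lines of Python. This is only an illustration: the function name `data_range` and the extra value of 300 minutes are hypothetical, introduced to show how a single outlier changes the result.

```python
# Space Mountain wait times (minutes), copied from the example in these notes.
wait_times = [50, 25, 75, 35, 50, 25, 30, 50, 45, 25, 20]


def data_range(values):
    """Return maximum minus minimum; only the two extreme values matter."""
    return max(values) - min(values)


print(data_range(wait_times))  # 55

# Hypothetical outlier (not in the original data): one 300-minute wait.
with_outlier = wait_times + [300]
print(data_range(with_outlier))  # 280
```

Adding one extreme value changes the range from 55 to 280 minutes even though the other eleven values are untouched, which is what "not resistant" means in practice.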
Standard Deviation
The standard deviation measures how much data values deviate from the mean. It is a more comprehensive measure of variation than the range, as it considers all data values.
Definition: The standard deviation of a set of sample values, denoted by s, is a measure of how much the data values deviate from the mean; informally, it describes a typical distance of a value from the mean.
Types:
Sample standard deviation (s)
Population standard deviation (σ)
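Formula for Sample Standard Deviation: $s = \sqrt{\dfrac{\sum (x - \bar{x})^2}{n - 1}}$
Shortcut Formula for Sample Standard Deviation: $s = \sqrt{\dfrac{n \sum x^2 - \left(\sum x\right)^2}{n(n - 1)}}$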
Properties:
The standard deviation is always non-negative and is zero only when all data values are identical.
Larger values of s indicate greater variation.
The standard deviation is sensitive to outliers.
The units of standard deviation are the same as the original data values.
The sample standard deviation s is a biased estimator of the population standard deviation σ.
Example: For the Space Mountain wait times:
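Compute the mean: $\bar{x} = 430 / 11 \approx 39.1$ minutes
Subtract the mean from each data value and square the result.
Sum all squared deviations: $\sum (x - \bar{x})^2 \approx 2740.9$
Divide by $n - 1 = 10$: $2740.9 / 10 \approx 274.09$
Take the square root: $s = \sqrt{274.09} \approx 16.6$ minutes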
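The five steps of this example can be traced in Python as a rough check. The sketch below follows the steps literally, then repeats the calculation with the shortcut formula (which uses only $n$, $\sum x$, and $\sum x^2$) and with the standard library's `statistics.stdev`. Variable names are chosen here for illustration.

```python
import math
import statistics

# Space Mountain wait times (minutes), copied from the example in these notes.
wait_times = [50, 25, 75, 35, 50, 25, 30, 50, 45, 25, 20]
n = len(wait_times)

# Step 1: compute the mean.
mean = sum(wait_times) / n

# Steps 2 and 3: square each deviation from the mean, then add them up.
sum_sq_dev = sum((x - mean) ** 2 for x in wait_times)

# Step 4: divide by n - 1 (this quotient is the sample variance).
sample_var = sum_sq_dev / (n - 1)

# Step 5: take the square root to get the sample standard deviation.
s = math.sqrt(sample_var)

# Shortcut formula: sqrt((n * sum(x^2) - (sum x)^2) / (n * (n - 1))).
sum_x = sum(wait_times)
sum_x2 = sum(x * x for x in wait_times)
s_shortcut = math.sqrt((n * sum_x2 - sum_x ** 2) / (n * (n - 1)))

print(round(mean, 1), round(sum_sq_dev, 1), round(s, 1))  # 39.1 2740.9 16.6
print(math.isclose(s, s_shortcut))                     # True
print(math.isclose(s, statistics.stdev(wait_times)))   # True
```

All three routes agree, so the shortcut formula is a rearrangement of the definition rather than an approximation.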
Variance
The variance is a measure of variation equal to the square of the standard deviation. It is less commonly used for direct interpretation due to its squared units but is fundamental in statistical theory.
Definition: The variance quantifies the average squared deviation from the mean.
Formulas:
Sample variance: $s^2 = \dfrac{\sum (x - \bar{x})^2}{n - 1}$
Population variance: $\sigma^2 = \dfrac{\sum (x - \mu)^2}{N}$
Properties:
The units of variance are the squares of the units of the original data values.
Variance is sensitive to outliers and is not resistant.
Variance is always non-negative and is zero only when all data values are identical.
The sample variance $s^2$ is an unbiased estimator of the population variance $\sigma^2$.
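As a small illustration of the two variance formulas, the sketch below uses Python's standard `statistics` module: `variance` divides by $n - 1$ (sample variance) and `pvariance` divides by $N$ (population variance). Treating the eleven Space Mountain wait times as a complete population in the second call is an assumption made only to show the difference between the divisors.

```python
import math
import statistics

# Space Mountain wait times (minutes), copied from the example in these notes.
wait_times = [50, 25, 75, 35, 50, 25, 30, 50, 45, 25, 20]

s2 = statistics.variance(wait_times)       # sample variance, divides by n - 1
sigma2 = statistics.pvariance(wait_times)  # population variance, divides by N
s = statistics.stdev(wait_times)           # sample standard deviation

print(round(s2, 2))      # 274.09 (units: minutes squared)
print(round(sigma2, 2))  # 249.17 (units: minutes squared)

# The variance is the square of the standard deviation.
print(math.isclose(s ** 2, s2))  # True
```

The same data give a smaller value when divided by $N$ than by $n - 1$, so it matters which formula a calculator or library function applies.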
Summary Table: Measures of Variation
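| Measure | Symbol | Formula | Units |
|---|---|---|---|
| Sample Standard Deviation | $s$ | $\sqrt{\dfrac{\sum (x - \bar{x})^2}{n - 1}}$ | Same as data |
| Sample Variance | $s^2$ | $\dfrac{\sum (x - \bar{x})^2}{n - 1}$ | Squared units |
| Population Standard Deviation | $\sigma$ | $\sqrt{\dfrac{\sum (x - \mu)^2}{N}}$ | Same as data |
| Population Variance | $\sigma^2$ | $\dfrac{\sum (x - \mu)^2}{N}$ | Squared units |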
Additional Rules and Concepts
Round-off Rule for Measures of Variation
When rounding the value of a measure of variation, carry one more decimal place than is present in the original set of data.
Range Rule of Thumb
Most (about 95%) sample values lie within 2 standard deviations of the mean.
Significantly low values are less than $\mu - 2\sigma$; significantly high values are greater than $\mu + 2\sigma$.
To estimate the standard deviation from the range: $s \approx \dfrac{\text{range}}{4}$
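A minimal sketch of the range rule of thumb, applied to the Space Mountain wait times from earlier in these notes. It assumes the sample mean and sample standard deviation stand in for the population values when computing the cutoffs; the variable names are illustrative.

```python
import statistics

# Space Mountain wait times (minutes), copied from the example in these notes.
wait_times = [50, 25, 75, 35, 50, 25, 30, 50, 45, 25, 20]

mean = statistics.mean(wait_times)
s = statistics.stdev(wait_times)

# Rough estimate of the standard deviation: range divided by 4.
s_estimate = (max(wait_times) - min(wait_times)) / 4
print(s_estimate)  # 13.75, compared with the computed s of about 16.6

# Cutoffs for significantly low and significantly high values
# (assumption: sample mean and s are used in place of mu and sigma).
low_limit = mean - 2 * s
high_limit = mean + 2 * s

significant = [x for x in wait_times if x < low_limit or x > high_limit]
print(significant)  # [75]
```

The estimate of 13.75 is in the right neighborhood of the computed value but not equal to it, which is why the rule is described as a rule of thumb.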
Empirical Rule (for Bell-Shaped Distributions)
About 68% of all values fall within 1 standard deviation of the mean.
About 95% of all values fall within 2 standard deviations of the mean.
About 99.7% of all values fall within 3 standard deviations of the mean.
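The three percentages can be checked against an exact normal distribution, which is the idealized bell shape behind the rule. The sketch below uses `statistics.NormalDist` from the Python standard library; the helper name `proportion_within` is hypothetical.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0, sigma=1)


def proportion_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {proportion_within(k):.4f}")
```

The computed proportions are close to 0.68, 0.95, and 0.997. Real data that are only roughly bell-shaped will match these figures approximately, not exactly.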
Chebyshev's Theorem
For any dataset (regardless of distribution), the proportion of values within $K$ standard deviations of the mean is at least $1 - 1/K^2$, where $K > 1$.
For $K = 2$: At least 75% of values are within 2 standard deviations.
For $K = 3$: At least 89% of values are within 3 standard deviations.
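The bound $1 - 1/K^2$ and a check against actual data can be sketched as follows. The function names are hypothetical, and the Space Mountain wait times from earlier in these notes serve as the test data. The sketch uses the sample standard deviation; the guarantee still holds because dividing by $n - 1$ never gives a smaller value than dividing by $n$, so the interval is at least as wide.

```python
import statistics


def chebyshev_lower_bound(k):
    """Minimum proportion of values within k standard deviations (k > 1)."""
    if k <= 1:
        raise ValueError("Chebyshev's theorem requires k > 1")
    return 1 - 1 / k ** 2


def observed_proportion(values, k):
    """Proportion of values actually lying within k standard deviations of the mean."""
    mean = statistics.mean(values)
    s = statistics.stdev(values)
    inside = [x for x in values if abs(x - mean) <= k * s]
    return len(inside) / len(values)


# Space Mountain wait times (minutes), copied from the example in these notes.
wait_times = [50, 25, 75, 35, 50, 25, 30, 50, 45, 25, 20]

for k in (2, 3):
    bound = chebyshev_lower_bound(k)
    actual = observed_proportion(wait_times, k)
    print(f"k={k}: guaranteed at least {bound:.3f}, observed {actual:.3f}")
```

For these data the observed proportions (10 of 11 values for $K = 2$, all 11 for $K = 3$) exceed the guaranteed minimums, which is typical: the theorem gives a conservative lower bound, not an estimate.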
Coefficient of Variation (CV)
The coefficient of variation expresses the standard deviation as a percentage of the mean, allowing comparison of variation between datasets with different units or means.
Sample CV: $CV = \dfrac{s}{\bar{x}} \cdot 100\%$
Population CV: $CV = \dfrac{\sigma}{\mu} \cdot 100\%$
Round the coefficient of variation to one decimal place (e.g., 25.3%).
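A short sketch of the sample CV, using the Space Mountain wait times from earlier in these notes. The function name is hypothetical. Converting the same times to hours is an illustrative assumption, added to show that the CV does not depend on the unit of measurement.

```python
import math
import statistics


def coefficient_of_variation(values):
    """Sample CV as a percentage: (s / x-bar) * 100."""
    return statistics.stdev(values) / statistics.mean(values) * 100


# Space Mountain wait times (minutes), copied from the example in these notes.
wait_minutes = [50, 25, 75, 35, 50, 25, 30, 50, 45, 25, 20]

# The same wait times expressed in hours (illustrative unit change).
wait_hours = [x / 60 for x in wait_minutes]

cv_minutes = coefficient_of_variation(wait_minutes)
cv_hours = coefficient_of_variation(wait_hours)

print(f"{cv_minutes:.1f}%")                # 42.4%
print(math.isclose(cv_minutes, cv_hours))  # True
```

The standard deviation changes by a factor of 60 when the unit changes, but so does the mean, so their ratio is unchanged. This is what makes the CV suitable for comparing datasets measured on different scales.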
Biased and Unbiased Estimators
The sample standard deviation s is a biased estimator of the population standard deviation σ.
The sample variance $s^2$ is an unbiased estimator of the population variance $\sigma^2$.
Why Divide by (n-1) in Sample Variance?
With a given mean, only $n - 1$ values can be freely assigned; the last value is determined by the mean.
Dividing by $n - 1$ ensures that sample variances tend to center around the population variance (unbiased estimation).
Dividing by $n$ would systematically underestimate the population variance.
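The effect of the divisor can be seen exactly on a tiny example. The sketch below assumes a hypothetical population of three values (4, 5, 9), which is not part of these notes, and lists every possible sample of size 2 drawn with replacement. It then averages the sample variances computed with each divisor, and the sample standard deviations, over all nine samples.

```python
import math
from itertools import product

# Hypothetical small population, chosen only for illustration.
population = [4, 5, 9]
N = len(population)
mu = sum(population) / N
pop_var = sum((x - mu) ** 2 for x in population) / N  # sigma squared

n = 2
# All 9 equally likely samples of size 2, drawn with replacement.
samples = list(product(population, repeat=n))


def variance_with_divisor(sample, divisor):
    """Sum of squared deviations from the sample mean, divided by `divisor`."""
    m = sum(sample) / len(sample)
    return sum((x - m) ** 2 for x in sample) / divisor


var_n_minus_1 = [variance_with_divisor(s, n - 1) for s in samples]
var_n = [variance_with_divisor(s, n) for s in samples]
sd_n_minus_1 = [math.sqrt(v) for v in var_n_minus_1]

avg_var_n_minus_1 = sum(var_n_minus_1) / len(samples)
avg_var_n = sum(var_n) / len(samples)
avg_sd = sum(sd_n_minus_1) / len(samples)

print(f"population variance:         {pop_var:.3f}")
print(f"mean of s^2 (divide by n-1): {avg_var_n_minus_1:.3f}")
print(f"mean of variance (divide n): {avg_var_n:.3f}")
print(f"population std dev:          {math.sqrt(pop_var):.3f}")
print(f"mean of s:                   {avg_sd:.3f}")
```

With the $n - 1$ divisor the sample variances average out to the population variance, while the $n$ divisor averages to a smaller value. The sample standard deviations average to less than the population standard deviation, which illustrates why $s$ is called a biased estimator even though $s^2$ is unbiased.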
*Additional info: The notes above expand on the brief points in the slides, providing full definitions, formulas, and academic context for each concept. The summary table and explanations are inferred from standard statistics curriculum and the provided textbook slides.*