Measures of Variation in Descriptive Statistics
Study Guide - Smart Notes
Measures of Variation
Introduction to Measures of Variation
Measures of variation are essential in statistics for describing how data values are spread or dispersed around the center of a data set. While measures of central tendency (like mean and median) summarize the center, measures of variation provide insight into the consistency or diversity of the data.
Range: The simplest measure of variation, representing the difference between the maximum and minimum values.
Variance and Standard Deviation: Variance is the average squared deviation from the mean; standard deviation is the square root of the variance, so it is expressed in the same units as the data.
Coefficient of Variation: Expresses standard deviation as a percentage of the mean, allowing comparison between different data sets.
Range
The range is the difference between the largest and smallest values in a quantitative data set.
Formula: $\text{Range} = (\text{maximum entry}) - (\text{minimum entry})$
Example: For the starting salaries at Corporation A, the range is the maximum salary minus the minimum salary.
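The range calculation can be sketched in a few lines of Python. The salary values below are hypothetical (in thousands of dollars) and are used only for illustration; `data_range` is a helper name introduced here, not something defined in these notes.

```python
def data_range(values):
    """Return the range: maximum entry minus minimum entry."""
    if not values:
        raise ValueError("range is undefined for an empty data set")
    return max(values) - min(values)

# Hypothetical starting salaries in thousands of dollars (illustrative only).
salaries = [41, 38, 39, 45, 47, 41, 44, 41, 37, 42]

print(data_range(salaries))  # 47 - 37 = 10
```

Because the range uses only the two extreme entries, a single unusually high or low value changes it completely.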
Variance and Standard Deviation
Variance and standard deviation are the most commonly used measures of variation. They indicate how much the data values deviate from the mean.
Deviation: The deviation of an entry $x$ is $x - \mu$ (for a population) or $x - \bar{x}$ (for a sample).
Population Variance and Standard Deviation
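Population Variance ($\sigma^2$): $\sigma^2 = \dfrac{\sum (x - \mu)^2}{N}$
Population Standard Deviation ($\sigma$): $\sigma = \sqrt{\sigma^2} = \sqrt{\dfrac{\sum (x - \mu)^2}{N}}$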
Sample Variance and Standard Deviation
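Sample Variance ($s^2$): $s^2 = \dfrac{\sum (x - \bar{x})^2}{n - 1}$
Sample Standard Deviation ($s$): $s = \sqrt{s^2} = \sqrt{\dfrac{\sum (x - \bar{x})^2}{n - 1}}$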
Key Properties:
Standard deviation is always non-negative.
Standard deviation has the same units as the original data.
Greater spread in data results in a larger standard deviation.
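As a minimal sketch, the population and sample formulas can be written out directly and cross-checked against Python's standard `statistics` module. The data set is hypothetical, and `population_variance` and `sample_variance` are helper names introduced here for illustration.

```python
import math
import statistics

def population_variance(values):
    """Sum of squared deviations from the mean, divided by N."""
    mu = sum(values) / len(values)
    return sum((x - mu) ** 2 for x in values) / len(values)

def sample_variance(values):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    if len(values) < 2:
        raise ValueError("sample variance needs at least two entries")
    x_bar = sum(values) / len(values)
    return sum((x - x_bar) ** 2 for x in values) / (len(values) - 1)

# Hypothetical data set (illustrative only); its mean is 5.
data = [2, 4, 4, 4, 5, 5, 7, 9]

pop_var = population_variance(data)   # 32 / 8 = 4.0
pop_std = math.sqrt(pop_var)          # 2.0
samp_var = sample_variance(data)      # 32 / 7
samp_std = math.sqrt(samp_var)

print(f"population: variance={pop_var:.3f}, std dev={pop_std:.3f}")
print(f"sample:     variance={samp_var:.3f}, std dev={samp_std:.3f}")

# The standard library provides the same four quantities.
print(statistics.pvariance(data), statistics.pstdev(data))
print(statistics.variance(data), statistics.stdev(data))
```

The sample versions divide by $n - 1$ rather than $n$, so for the same data they are always slightly larger than the population versions.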
Interpreting Standard Deviation
Standard deviation measures the typical distance of data values from the mean. It is useful for comparing the spread of different data sets, even if their means are similar.
Example: Two corporations may have the same mean salary, but the one with a higher standard deviation has more variability in salaries.
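To illustrate the comparison, the sketch below builds two hypothetical salary lists (in thousands of dollars) with the same mean and different spreads. The numbers are invented for illustration and are not taken from the notes.

```python
import statistics

# Hypothetical salaries in thousands of dollars; both lists have mean 42.
corp_a = [40, 41, 42, 43, 44]
corp_b = [22, 32, 42, 52, 62]

for name, salaries in (("A", corp_a), ("B", corp_b)):
    mean = statistics.mean(salaries)
    std = statistics.stdev(salaries)  # sample standard deviation
    print(f"Corporation {name}: mean={mean}, sample std dev={std:.2f}")
```

The means match, but Corporation B's standard deviation is ten times larger, which reflects how much farther its salaries sit from the mean.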
Empirical Rule (68–95–99.7 Rule)
The Empirical Rule applies to data sets with a bell-shaped (normal) distribution. It describes the percentage of data within certain standard deviations from the mean:
About 68% of data lie within 1 standard deviation of the mean.
About 95% of data lie within 2 standard deviations of the mean.
About 99.7% of data lie within 3 standard deviations of the mean.
Example: If the mean height of females is 64.1 inches with a standard deviation of 2.6 inches, then about 68% of heights are between 61.5 and 66.7 inches.
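The intervals in the Empirical Rule follow from adding and subtracting multiples of the standard deviation. A small sketch, using the height figures from the example above (`empirical_rule_intervals` is a helper name introduced here):

```python
def empirical_rule_intervals(mean, std_dev):
    """Return {k: (low, high, approx_percent)} for k = 1, 2, 3.

    The percentages are only meaningful for roughly bell-shaped data.
    """
    percents = {1: 68.0, 2: 95.0, 3: 99.7}
    return {
        k: (mean - k * std_dev, mean + k * std_dev, pct)
        for k, pct in percents.items()
    }

# Mean and standard deviation of female heights from the example (inches).
for k, (low, high, pct) in empirical_rule_intervals(64.1, 2.6).items():
    print(f"about {pct}% within {k} SD: {low:.1f} to {high:.1f} inches")
```

For these values the three intervals are roughly 61.5 to 66.7, 58.9 to 69.3, and 56.3 to 71.9 inches.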
Chebyshev’s Theorem
Chebyshev’s Theorem applies to any data set, regardless of distribution shape. It states that at least $1 - \dfrac{1}{k^2}$ of the data values lie within $k$ standard deviations of the mean (for $k > 1$).
For $k = 2$: At least $1 - \frac{1}{4} = 75\%$ of data lie within 2 standard deviations.
For $k = 3$: At least $1 - \frac{1}{9} \approx 88.9\%$ of data lie within 3 standard deviations.
Example: If the mean age is 41.7 years and the standard deviation is 20.85 years, then at least 75% of ages are between $41.7 - 2(20.85) = 0$ and $41.7 + 2(20.85) = 83.4$ years (using $k = 2$).
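The bound and the interval in Chebyshev's Theorem can be computed as follows. This is a minimal sketch using the age figures from the example; `chebyshev_bound` and `chebyshev_interval` are helper names introduced here.

```python
def chebyshev_bound(k):
    """Minimum fraction of data within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("Chebyshev's Theorem requires k > 1")
    return 1 - 1 / k**2

def chebyshev_interval(mean, std_dev, k):
    """Interval covering at least chebyshev_bound(k) of the data."""
    return mean - k * std_dev, mean + k * std_dev

for k in (2, 3):
    print(f"k={k}: at least {chebyshev_bound(k):.1%} of the data")

# Ages from the example: mean 41.7 years, standard deviation 20.85 years.
low, high = chebyshev_interval(41.7, 20.85, 2)
print(f"at least 75% of ages lie between {low:.1f} and {high:.1f} years")
```

Because the theorem makes no assumption about shape, its guarantees are weaker than the Empirical Rule's: at least 75% within 2 standard deviations, compared with about 95% for bell-shaped data.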
Standard Deviation for Grouped Data
When data are grouped into classes, the sample mean and standard deviation can be estimated using class midpoints and frequencies.
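Sample Mean: $\bar{x} = \dfrac{\sum x f}{n}$, where $f$ is the class frequency, $x$ is the class midpoint, and $n = \sum f$.
Sample Standard Deviation: $s = \sqrt{\dfrac{\sum (x - \bar{x})^2 f}{n - 1}}$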
Example: For a frequency distribution of the number of children in households, treat each class midpoint as the value of every entry in that class, then apply the formulas above to estimate the mean and standard deviation.
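The grouped-data formulas can be sketched as below. The frequency distribution is hypothetical (20 households, invented for illustration), and `grouped_mean_and_std` is a helper name introduced here. Since each class in this toy example is a single value, the grouped result matches the ungrouped one exactly; with wider class intervals it would only be an approximation.

```python
import math
import statistics

def grouped_mean_and_std(midpoints, frequencies):
    """Estimate the sample mean and sample standard deviation from
    class midpoints x and class frequencies f."""
    n = sum(frequencies)
    if n < 2:
        raise ValueError("need at least two entries in total")
    x_bar = sum(x * f for x, f in zip(midpoints, frequencies)) / n
    squared_dev_sum = sum(
        (x - x_bar) ** 2 * f for x, f in zip(midpoints, frequencies)
    )
    return x_bar, math.sqrt(squared_dev_sum / (n - 1))

# Hypothetical distribution: number of children per household.
children = [0, 1, 2, 3, 4]      # class midpoints x
households = [4, 6, 6, 3, 1]    # frequencies f, n = 20

mean, std = grouped_mean_and_std(children, households)
print(f"mean={mean:.2f}, sample std dev={std:.2f}")

# Cross-check by expanding the table back into individual entries.
expanded = [x for x, f in zip(children, households) for _ in range(f)]
print(f"check: {statistics.mean(expanded):.2f}, {statistics.stdev(expanded):.2f}")
```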
Coefficient of Variation (CV)
The coefficient of variation expresses the standard deviation as a percentage of the mean, allowing for comparison between data sets with different units or means.
Population: $CV = \dfrac{\sigma}{\mu} \cdot 100\%$
Sample: $CV = \dfrac{s}{\bar{x}} \cdot 100\%$
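Example: A basketball team has a mean height of 75 inches and a mean weight of 210 pounds. Dividing each standard deviation by its mean gives one CV for the heights and one for the weights. Because the CV for the weights is larger, the weights are more variable than the heights.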
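A short sketch of the CV comparison follows. The means (75 inches and 210 pounds) come from the example, but the standard deviations used here (3.0 inches and 21.0 pounds) are assumed values chosen only for illustration, and `coefficient_of_variation` is a helper name introduced here.

```python
def coefficient_of_variation(std_dev, mean):
    """Standard deviation expressed as a percentage of the mean."""
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return std_dev / mean * 100

# Means are from the example; the standard deviations are ASSUMED values.
height_cv = coefficient_of_variation(3.0, 75)     # inches
weight_cv = coefficient_of_variation(21.0, 210)   # pounds

print(f"heights: CV = {height_cv:.1f}%")
print(f"weights: CV = {weight_cv:.1f}%")
print("weights more variable" if weight_cv > height_cv else "heights more variable")
```

Because the CV is unit-free, inches and pounds can be compared directly, which the raw standard deviations do not allow.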
Summary Table: Measures of Variation
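| Measure | Formula | Interpretation |
|---|---|---|
| Range | $(\text{maximum entry}) - (\text{minimum entry})$ | Spread between highest and lowest values |
| Population Variance ($\sigma^2$) | $\sigma^2 = \dfrac{\sum (x - \mu)^2}{N}$ | Average squared deviation from the mean (population) |
| Sample Variance ($s^2$) | $s^2 = \dfrac{\sum (x - \bar{x})^2}{n - 1}$ | Average squared deviation from the mean (sample), with $n - 1$ in the denominator |
| Population Std. Dev. ($\sigma$) | $\sigma = \sqrt{\sigma^2}$ | Typical deviation from the mean (population), in the units of the data |
| Sample Std. Dev. ($s$) | $s = \sqrt{s^2}$ | Typical deviation from the mean (sample), in the units of the data |
| Coefficient of Variation (CV) | $\dfrac{\sigma}{\mu} \cdot 100\%$ or $\dfrac{s}{\bar{x}} \cdot 100\%$ | Relative variation as a percentage of the mean |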
Applications and Importance
Understanding variation is crucial for comparing data sets, identifying outliers, and making informed decisions based on data.
Standard deviation and variance are foundational for inferential statistics, including hypothesis testing and confidence intervals.