Measures of Central Tendency and Dispersion in Statistics: Study Notes
3.1 Measures of Central Tendency
Objectives
Determine the arithmetic mean of a variable from raw data
Determine the median of a variable from raw data
Explain what it means for a statistic to be resistant
Determine the mode of a variable from raw data
Arithmetic Mean of a Variable from Raw Data
The arithmetic mean is a measure of central tendency calculated by adding all the values of a variable in the data set and dividing by the number of observations. It is commonly referred to as the mean.
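Population arithmetic mean ($\mu$): Computed using all individuals in a population. It is a parameter.
Sample arithmetic mean ($\bar{x}$): Computed using sample data. It is a statistic.
Formulas:
Population mean: $\mu = \dfrac{x_1 + x_2 + \cdots + x_N}{N} = \dfrac{\sum x_i}{N}$, where $N$ is the population size.
Sample mean: $\bar{x} = \dfrac{x_1 + x_2 + \cdots + x_n}{n} = \dfrac{\sum x_i}{n}$, where $n$ is the sample size.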
Example: The travel times (in minutes) of 7 employees are 23, 36, 23, 18, 5, 26, 43. Treating these 7 employees as the population, the mean is the sum of the values divided by 7: $\mu = \dfrac{174}{7} \approx 24.9$ minutes.
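The calculation in the example above can be sketched in a few lines of Python. The function name `arithmetic_mean` is only an illustrative choice, not something from the source notes.

```python
# Travel times (minutes) of the 7 employees from the example.
travel_times = [23, 36, 23, 18, 5, 26, 43]

def arithmetic_mean(values):
    """Add up all observations and divide by the number of observations."""
    return sum(values) / len(values)

mean_time = arithmetic_mean(travel_times)
print(round(mean_time, 2))  # 24.86
```

The same function serves for a population or a sample; only the symbol ($\mu$ or $\bar{x}$) and the interpretation change.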
Median of a Variable from Raw Data
The median is the value that lies in the middle of the data when arranged in ascending order. It divides the data set into two equal halves.
Steps to find the median:
Arrange the data in ascending order.
Determine the number of observations, $n$.
If $n$ is odd, the median is the value at the $\frac{n+1}{2}$ position.
If $n$ is even, the median is the mean of the values at the $\frac{n}{2}$ and $\frac{n}{2} + 1$ positions.
Example: The data set 23, 36, 23, 18, 5, 26, 43 has an odd number of observations ($n = 7$). In ascending order it is 5, 18, 23, 23, 26, 36, 43, so the median is the 4th value, 23. If a new value is added (making $n$ even), the median is the average of the two middle values.
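A minimal sketch of the median steps, assuming the data fit in a Python list. The function name `median_from_raw` is hypothetical, and the extra value 70 used for the even case is borrowed from the resistance example in these notes.

```python
def median_from_raw(values):
    """Median by the steps above: sort, count, then pick the middle."""
    ordered = sorted(values)           # ascending order
    n = len(ordered)                   # number of observations
    mid = n // 2
    if n % 2 == 1:
        # Odd n: 1-based position (n + 1) / 2 is 0-based index n // 2.
        return ordered[mid]
    # Even n: average the values at 1-based positions n/2 and n/2 + 1.
    return (ordered[mid - 1] + ordered[mid]) / 2

travel_times = [23, 36, 23, 18, 5, 26, 43]
print(median_from_raw(travel_times))         # 23
print(median_from_raw(travel_times + [70]))  # 24.5
```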
Resistant Statistics
A resistant statistic is not affected substantially by extreme observations (outliers). The median is resistant, while the mean is not.
Example: Adding a new employee with a travel time of 70 minutes to the previous data set raises the mean from about 24.9 to 30.5 minutes, but raises the median only from 23 to 24.5 minutes.
Definition: A numerical summary of data is said to be resistant if extreme values do not affect its value substantially.
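To see resistance numerically, the sketch below compares how far the mean and the median move when the 70-minute travel time is added. It uses Python's standard `statistics` module; the variable names are illustrative.

```python
import statistics

travel_times = [23, 36, 23, 18, 5, 26, 43]
with_outlier = travel_times + [70]   # add the extreme observation

# How much each summary changes once the outlier is included.
mean_shift = statistics.mean(with_outlier) - statistics.mean(travel_times)
median_shift = statistics.median(with_outlier) - statistics.median(travel_times)

print(round(mean_shift, 2))  # 5.64
print(median_shift)          # 1.5
```

The mean moves by more than five minutes, while the median moves by only a minute and a half, which is what "resistant" describes.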
Mode of a Variable from Raw Data
The mode is the most frequent observation in the data set. A data set may have no mode, one mode, or more than one mode.
Example: For the travel times 23, 36, 23, 18, 5, 26, 43, the mode is 23 (since it appears twice).
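One way to tally the mode is with `collections.Counter`. This sketch assumes the convention that a data set in which no value repeats has no mode; the function name `modes` is hypothetical.

```python
from collections import Counter

def modes(values):
    """Return every most-frequent value, or [] if no value repeats."""
    if not values:
        return []
    counts = Counter(values)              # tally each observation
    top = max(counts.values())            # highest frequency
    if top == 1:
        return []                         # no value repeats: no mode
    return [v for v, c in counts.items() if c == top]

print(modes([23, 36, 23, 18, 5, 26, 43]))  # [23]
print(modes([1, 1, 2, 2, 3]))              # [1, 2]
print(modes([1, 2, 3]))                    # []
```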
Summary Table: Measures of Central Tendency
| Measure of Central Tendency | Computation | Interpretation | When to Use |
|---|---|---|---|
| Mean | Population: $\mu = \frac{\sum x_i}{N}$; sample: $\bar{x} = \frac{\sum x_i}{n}$ | Center of gravity | When data are quantitative and frequency distribution is roughly symmetric |
| Median | Arrange data in ascending order, identify middle value | Divides the bottom 50% from the top 50% | When data are quantitative and frequency distribution is skewed left or right |
| Mode | Tally to determine most frequent observation | Most frequent characteristic | When data are qualitative or quantitative; especially useful for categorical data |
3.2 Measures of Dispersion
Objectives
Determine the range of a variable from raw data
Determine the standard deviation of a variable from raw data
Determine the variance of a variable from raw data
Use the Empirical Rule to describe data that are bell shaped
Use Chebyshev's Inequality to describe any data set
Dispersion
Dispersion is the degree to which the data are spread out. It helps to understand the variability in the data.
Range of a Variable from Raw Data
The range, $R$, is the difference between the largest and smallest data values.
Formula: $R = \text{largest data value} - \text{smallest data value}$
Example: For travel times 23, 36, 23, 18, 5, 26, 43, $R = 43 - 5 = 38$ minutes.
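The range takes one line to compute. The sketch below also adds the 70-minute travel time from the resistance example to show how a single extreme value changes it; `data_range` is an illustrative name.

```python
def data_range(values):
    """Largest data value minus smallest data value."""
    return max(values) - min(values)

travel_times = [23, 36, 23, 18, 5, 26, 43]
print(data_range(travel_times))          # 38
print(data_range(travel_times + [70]))   # 65
```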
Standard Deviation of a Variable from Raw Data
The standard deviation measures how far the data values typically lie from the mean. It is the square root of the average squared deviation from the mean, so it is expressed in the same units as the data.
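Population standard deviation ($\sigma$): $\sigma = \sqrt{\dfrac{\sum (x_i - \mu)^2}{N}}$
Sample standard deviation ($s$): $s = \sqrt{\dfrac{\sum (x_i - \bar{x})^2}{n - 1}}$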
Example: For the travel times, the population standard deviation is calculated by finding the squared differences from the mean, summing them, dividing by $N = 7$, and taking the square root, which gives $\sigma \approx 11.4$ minutes.
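The steps in the example can be written out directly. This is a sketch with hypothetical function names; the only difference between the two functions is the divisor ($N$ for a population, $n - 1$ for a sample).

```python
import math

def population_std(values):
    """Square root of the mean squared deviation (divide by N)."""
    n = len(values)
    mu = sum(values) / n
    squared_diffs = [(x - mu) ** 2 for x in values]
    return math.sqrt(sum(squared_diffs) / n)

def sample_std(values):
    """Same steps, but divide by n - 1 instead of n."""
    n = len(values)
    x_bar = sum(values) / n
    squared_diffs = [(x - x_bar) ** 2 for x in values]
    return math.sqrt(sum(squared_diffs) / (n - 1))

travel_times = [23, 36, 23, 18, 5, 26, 43]
print(round(population_std(travel_times), 2))  # 11.36
print(round(sample_std(travel_times), 2))      # 12.27
```

The standard library offers the same calculations as `statistics.pstdev` and `statistics.stdev`.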
Variance of a Variable from Raw Data
The variance is the square of the standard deviation. It represents the average squared deviation from the mean.
Population variance: $\sigma^2 = \dfrac{\sum (x_i - \mu)^2}{N}$
Sample variance: $s^2 = \dfrac{\sum (x_i - \bar{x})^2}{n - 1}$
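As a quick check of the relationship between variance and standard deviation, the sketch below uses the standard library's `statistics.pvariance` and `statistics.variance` on the travel-time data from these notes.

```python
import math
import statistics

travel_times = [23, 36, 23, 18, 5, 26, 43]

pop_var = statistics.pvariance(travel_times)   # divides by N
samp_var = statistics.variance(travel_times)   # divides by n - 1

print(round(pop_var, 2))   # 128.98
print(round(samp_var, 2))  # 150.48

# Squaring each standard deviation recovers the matching variance.
pop_matches = math.isclose(statistics.pstdev(travel_times) ** 2, pop_var)
samp_matches = math.isclose(statistics.stdev(travel_times) ** 2, samp_var)
print(pop_matches, samp_matches)  # True True
```

Note that the variance is in squared units (here, minutes squared), which is one reason the standard deviation is usually the value reported.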
Empirical Rule and Chebyshev's Inequality
Empirical Rule: For bell-shaped (normal) distributions:
About 68% of data falls within 1 standard deviation of the mean
About 95% within 2 standard deviations
About 99.7% within 3 standard deviations
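The Empirical Rule turns a mean and standard deviation into concrete intervals. In the sketch below, the mean of 100 and standard deviation of 15 are hypothetical values chosen only for illustration, and the function name is likewise an assumption.

```python
def empirical_rule_intervals(mean, std):
    """Map k = 1, 2, 3 to (lower, upper, approximate % of data)."""
    coverage = {1: 68, 2: 95, 3: 99.7}
    return {k: (mean - k * std, mean + k * std, pct)
            for k, pct in coverage.items()}

# Hypothetical bell-shaped data: mean 100, standard deviation 15.
intervals = empirical_rule_intervals(100, 15)
for k, (low, high, pct) in intervals.items():
    print(f"within {k} SD: about {pct}% of values between {low} and {high}")
```

These percentages are approximations that apply only when the distribution is roughly bell shaped.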
Chebyshev's Inequality: For any data set (not necessarily normal), at least $\left(1 - \dfrac{1}{k^2}\right) \cdot 100\%$ of the data values lie within $k$ standard deviations of the mean, for $k > 1$.
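Chebyshev's bound can be computed for any $k > 1$ and compared with what a data set actually does. The sketch below uses the travel times from these notes, treated as a population; the function name `chebyshev_lower_bound` is hypothetical.

```python
import statistics

def chebyshev_lower_bound(k):
    """Minimum % of data within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("Chebyshev's Inequality requires k > 1")
    return (1 - 1 / k ** 2) * 100

print(chebyshev_lower_bound(2))            # 75.0
print(round(chebyshev_lower_bound(3), 1))  # 88.9

# Compare the guarantee for k = 2 with the travel-time data.
travel_times = [23, 36, 23, 18, 5, 26, 43]
mu = statistics.mean(travel_times)
sigma = statistics.pstdev(travel_times)
within = [x for x in travel_times if abs(x - mu) <= 2 * sigma]
actual_share = 100 * len(within) / len(travel_times)
print(actual_share)  # 100.0
```

The bound is a guaranteed minimum, so the actual share (here 100%) can be well above it.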
Relationship Between Mean, Median, and Distribution Shape
| Distribution Shape | Mean vs Median | Skewness |
|---|---|---|
| Symmetric | Mean ≈ Median | No skew |
| Skewed Left | Mean < Median | Tail on left |
| Skewed Right | Mean > Median | Tail on right |
Additional info: The guidelines for mean, median, and skewness hold well for continuous data, but may not always apply for discrete data.
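The pattern in the table can be checked on small data sets. Both lists below are hypothetical values made up for illustration: one has a long right tail, the other a long left tail.

```python
import statistics

# Hypothetical data: one large value creates a tail on the right.
right_skewed = [1, 2, 2, 3, 3, 3, 4, 20]
# Hypothetical data: one small value creates a tail on the left.
left_skewed = [1, 17, 18, 18, 18, 19, 19, 20]

for name, data in [("right", right_skewed), ("left", left_skewed)]:
    mean = statistics.mean(data)
    median = statistics.median(data)
    print(name, mean, median)
# right 4.75 3.0
# left 16.25 18.0
```

In each case the mean is pulled toward the tail while the median stays near the bulk of the data.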
Summary
Mean: Sensitive to outliers, best for symmetric distributions.
Median: Resistant to outliers, best for skewed distributions.
Mode: Useful for categorical data or to identify the most frequent value.
Range: Simple measure of spread, but sensitive to outliers.
Standard deviation and variance: Quantify spread, used in further statistical analysis.