Measures of Central Tendency: Mean, Median, and Mode
Study Guide - Smart Notes
Measures of Central Tendency
Etymology of 'Mean'
The term mean has two distinct etymological origins. In statistics, 'mean' refers to the average value and comes from the Latin medianus ("in the middle"), through Old French meien and Middle English mene. This sense is related to the idea of a "middle value" or central point among a set of numbers. The other sense of 'mean' (unkind/rude) comes from Old English gemæne, meaning "common, shared, public," and later evolved to mean "stingy, ignoble, nasty." These two senses are homonyms with no direct relation.
"Average" sense: Romance/Latin in origin
"Rude" sense: Germanic in origin
Numerically Summarizing Data
Key Characteristics
When summarizing data numerically, three main characteristics are considered:
Shape of the distribution
Center (average)
Spread (variability)
The center and spread are numerical summaries. The center is commonly called the average, and in statistics, three versions are frequently used:
Mean
Median
Mode
Notation
Symbols and Definitions
N: Population size
n: Sample size (total number of data values)
$x_i$: The i-th observation in the data set
Example: For the data set (4, 1, 5, 3, 6, 1, 3), $n = 7$; $x_1 = 4$, $x_2 = 1$, ..., $x_7 = 3$.
Summation Notation
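The uppercase Greek letter Σ (sigma) is used as a summation sign: $\sum_{i=1}^{n} x_i = x_1 + x_2 + \cdots + x_n$
Example: For $x_1 = 4$, $x_2 = 1$, ..., $x_7 = 3$: $\sum_{i=1}^{7} x_i = 4 + 1 + 5 + 3 + 6 + 1 + 3 = 23$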
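The summation for the data set (4, 1, 5, 3, 6, 1, 3) can be checked in a few lines of Python. This is only a minimal sketch; the names `xs`, `n`, and `total` are illustrative and do not come from the notes.

```python
# Data set from the notation example: x1 = 4, x2 = 1, ..., x7 = 3
xs = [4, 1, 5, 3, 6, 1, 3]

n = len(xs)       # number of observations
total = sum(xs)   # sigma notation: x1 + x2 + ... + xn

print(n, total)
```

Here `sum(xs)` plays the role of Σ: it adds every observation from the first to the n-th.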
Mean
Definition and Types
Arithmetic mean: Computed by adding all the values of the variable in the data set and dividing by the number of observations.
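Population mean (parameter): Denoted by $\mu$ (pronounced "mew"), calculated using all individuals in a population.
Sample mean (statistic): Denoted by $\bar{x}$ (pronounced "x-bar"), calculated using sample data.
Formulas:
Population mean: $\mu = \frac{\sum x_i}{N} = \frac{x_1 + x_2 + \cdots + x_N}{N}$
Sample mean: $\bar{x} = \frac{\sum x_i}{n} = \frac{x_1 + x_2 + \cdots + x_n}{n}$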
Remark: Parameters (population) are denoted by Greek letters; statistics (sample) by Latin letters.
Example: Computing Mean
Given travel times (in minutes) for 9 employees: 23, 36, 23, 18, 5, 26, 43, 32, 25
Compute the population mean using all data.
Compute the sample mean for random samples of n = 5 and n = 4 employees.
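One possible way to carry out this exercise is sketched below, treating the nine travel times listed above as the whole population. The random seed and the variable names are assumptions made for repeatability and illustration; the notes do not say which employees end up in each sample.

```python
import random
from statistics import mean

# The nine travel times (minutes) listed in the example, treated as the population
travel_times = [23, 36, 23, 18, 5, 26, 43, 32, 25]

# Population mean: sum of all values divided by N
population_mean = mean(travel_times)

# Simple random samples drawn without replacement.
# The seed is an arbitrary choice (not from the notes) so the run is repeatable.
rng = random.Random(0)
sample_5 = rng.sample(travel_times, 5)
sample_4 = rng.sample(travel_times, 4)

print(f"population mean = {population_mean:.2f}")
print(f"sample (n = 5) {sample_5}: x-bar = {mean(sample_5):.2f}")
print(f"sample (n = 4) {sample_4}: x-bar = {mean(sample_4):.2f}")
```

The population mean is fixed at 231/9, about 25.67 minutes, while the sample means change from one random sample to the next. That is the practical difference between a parameter and a statistic.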
Median
Definition
The median of a variable is the value that lies in the middle of the data when arranged in ascending order. The sample median is denoted by M. The median splits the ordered data so that about half of the observations lie below it and about half lie above it.
Steps to Find the Median
Arrange the data in ascending order.
Determine the number of observations, n.
Find the middle observation:
If n is odd: Median is the $\frac{n+1}{2}$-th observation.
If n is even: Median is the arithmetic mean of the $\frac{n}{2}$-th and $\left(\frac{n}{2}+1\right)$-th observations.
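The three steps can be sketched as a small Python function. The function name `median_by_steps` is hypothetical. The odd case picks the observation in position (n + 1)/2 of the ordered data, and the even case averages the observations in positions n/2 and n/2 + 1.

```python
def median_by_steps(values):
    """Return the median by sorting, counting, and picking the middle."""
    ordered = sorted(values)   # Step 1: arrange in ascending order
    n = len(ordered)           # Step 2: number of observations
    if n == 0:
        raise ValueError("the median of an empty data set is undefined")
    if n % 2 == 1:
        # Step 3, n odd: the (n + 1)/2-th observation
        # (subtract 1 because Python lists are indexed from 0)
        return ordered[(n + 1) // 2 - 1]
    # Step 3, n even: mean of the n/2-th and (n/2 + 1)-th observations
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2


odd_case = median_by_steps([4, 1, 5, 3, 6, 1, 3])   # n = 7
even_case = median_by_steps([4, 1, 5, 3, 6, 1])     # n = 6
print(odd_case, even_case)
```

For the seven values the ordered data are 1, 1, 3, 3, 4, 5, 6, so the 4th observation, 3, is the median. For the six values the median is the mean of 3 and 4, which is 3.5.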
Example: Computing Median
Given travel times (in minutes) for 9 employees: 23, 36, 23, 18, 5, 26, 43, 32, 25
Compute the population median using all data.
Compute the sample median for random samples of n = 5 and n = 4 employees.
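A minimal sketch of this exercise, using Python's standard `statistics.median` on the nine travel times listed above. The seed and the samples it produces are assumptions for illustration only.

```python
import random
from statistics import median

travel_times = [23, 36, 23, 18, 5, 26, 43, 32, 25]

# Show the ordered data so the middle value is easy to see
print(sorted(travel_times))

population_median = median(travel_times)
print("population median:", population_median)

# Hypothetical random samples; seed 1 is an arbitrary choice for repeatability
rng = random.Random(1)
for size in (5, 4):
    sample = rng.sample(travel_times, size)
    print(f"n = {size}: sample {sorted(sample)}, M = {median(sample)}")
```

With nine values the median is the 5th ordered value, 25 minutes. For a sample of n = 4 the median is the mean of the 2nd and 3rd ordered values, so it need not be one of the observed travel times.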
Mean vs Median
Comparing Measures of Central Tendency
When data are skewed or contain outliers, the mean may not represent the typical value. The median is more resistant to extreme values.
Example: Cell Phone Call Lengths
| Call | Length (min) |
|---|---|
| 1 | 7 |
| 2 | 4 |
| 3 | 48 |
| 4 | 3 |
| 5 | 6 |
For the five calls listed, Mean = 68/5 = 13.6 and Median = 6. Because a single 48-minute call pulls the mean well above the other four calls, the median better describes the typical call length.
Resistant Numerical Summary
A numerical summary is resistant if extreme values (very large or small) do not affect its value substantially. The median is resistant; the mean is not.
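Resistance can be seen directly by computing both summaries with and without the extreme value. The sketch below uses the five call lengths from the table; dropping the 48-minute call is purely a thought experiment to show the effect and is not a recommended way to treat data.

```python
from statistics import mean, median

calls = [7, 4, 48, 3, 6]   # call lengths from the table, in minutes
without_long_call = [x for x in calls if x != 48]

mean_all, median_all = mean(calls), median(calls)
mean_trim, median_trim = mean(without_long_call), median(without_long_call)

print(f"with the 48-minute call:    mean = {mean_all}, median = {median_all}")
print(f"without the 48-minute call: mean = {mean_trim}, median = {median_trim}")
```

Removing one call moves the mean from 13.6 to 5, but the median only from 6 to 5. The mean reacts strongly to the extreme value; the median barely moves.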
Mode
Definition
The mode of a variable is the most frequent observation in the data set.
A data set can have no mode, one mode, or more than one mode.
Bimodal: Two modes
Multimodal: Three or more modes
Mode is usually not reported for multimodal data as it is not representative of a typical value.
Example: State of Birth of U.S. Vice Presidents
| State | Tally | Frequency |
|---|---|---|
| Massachusetts | 2 | 2 |
| Virginia | 2 | 2 |
| New Jersey | 2 | 2 |
| New York | 5 | 5 |
| Other States | 1 | 1 |
The mode is New York (highest frequency).
Example: Exam Scores
Data: 82, 77, 90, 71, 62, 68, 74, 84, 94, 88. Each value occurs only once; there is no mode.
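Counting frequencies is all that is needed to find the mode. The sketch below is one way to do it in Python, with a hypothetical helper `modes` that returns every mode and an empty list when each value occurs only once (the "no mode" convention used in these notes).

```python
from collections import Counter


def modes(values):
    """Return all modes; an empty list means no value repeats (no mode)."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    if top == 1:
        return []   # every value occurs once: no mode
    return [value for value, count in counts.items() if count == top]


exam_scores = [82, 77, 90, 71, 62, 68, 74, 84, 94, 88]
exam_modes = modes(exam_scores)

# Data set from the notation example, which happens to be bimodal
bimodal_modes = modes([4, 1, 5, 3, 6, 1, 3])

# For data that are already tallied, the mode is the category with the
# highest frequency (named states from the table above)
vp_frequency = {"Massachusetts": 2, "Virginia": 2, "New Jersey": 2, "New York": 5}
vp_mode = max(vp_frequency, key=vp_frequency.get)

print(exam_modes, bimodal_modes, vp_mode)
```

The exam scores have no mode, the data set (4, 1, 5, 3, 6, 1, 3) has two modes (1 and 3), and the tallied birth states have the single mode New York. Unlike the mean and median, the mode also works for qualitative data such as state names.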
Distribution Type vs Measure of Central Tendency
Relationship Table
| Distribution Shape | Mean vs Median vs Mode |
|---|---|
| Perfectly symmetric data set | Mean = Median = Mode |
| Rightward skewness (extremely high values) | Mean > Median > Mode |
| Leftward skewness (extremely low values) | Mean < Median < Mode |
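The orderings in the table can be illustrated with a small numerical check. Both data sets below are made up for this sketch and do not come from the notes; the left-skewed set is simply the mirror image of the right-skewed one.

```python
from statistics import mean, median, mode

# Made-up right-skewed data: a few large values stretch the upper tail
right_skewed = [1, 2, 2, 2, 3, 3, 4, 5, 9, 15]

# Mirror image of the same data, which is left-skewed
left_skewed = [16 - x for x in right_skewed]

for label, data in (("right-skewed", right_skewed), ("left-skewed", left_skewed)):
    print(f"{label}: mean = {mean(data)}, median = {median(data)}, mode = {mode(data)}")
```

For the right-skewed data the mean (4.6) exceeds the median (3), which exceeds the mode (2); the mirrored data reverse the order (11.4, 13, 14). These orderings describe a common pattern rather than a guarantee, so it is still worth looking at the shape of the data.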
Graphical Representations
Normal Distribution
In a symmetric, bell-shaped distribution, mean and median coincide at the center.
Right Skewed Distribution
Mean is greater than median; both are greater than mode.
Left Skewed Distribution
Mean is less than median; both are less than mode.
Best Measure of Central Tendency
If the data set is symmetric and has no outliers, use the mean.
If the data set is skewed or contains outliers, use the median.
If the data set is exactly symmetric, all measures coincide: Mean = Median = Mode.
Examples: Measures of Center
Birth Weights Example
Given birth weights of 50 babies, compute the mean ($\bar{x}$) and median ($M$). The distribution is bell-shaped and symmetric, so the mean is the best measure of central tendency.
Comparing Data Sets
Data set (A): 20, 130, 140, 150, 270. Mean = 142, Median = 140. Distribution is approximately symmetric; use mean.
Data set (B): 2, 20, 130, 140, 150, 270, 1003. Mean = 245, Median = 140. Distribution is skewed right due to outliers; use median.
Remark: The median is less sensitive than the mean to extremely large or small measurements (outliers).
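The figures quoted for data sets (A) and (B) can be reproduced with the standard `statistics` module. This is a minimal check; the variable names are illustrative.

```python
from statistics import mean, median

data_a = [20, 130, 140, 150, 270]
data_b = [2, 20, 130, 140, 150, 270, 1003]

mean_a, median_a = mean(data_a), median(data_a)
mean_b, median_b = mean(data_b), median(data_b)

print(f"A: mean = {mean_a}, median = {median_a}")
print(f"B: mean = {mean_b}, median = {median_b}")
```

Adding the values 2 and 1003 raises the mean from 142 to 245 while the median stays at 140, which is why the median is preferred for data set (B).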