Understanding the Median: Measures of Center in Statistics
Study Guide - Smart Notes
Topic: Median
Finding the Median
The median is a measure of center that summarizes a data set by identifying its middle value. Compared with the mean, the median is less affected by extreme values (outliers). It splits the ordered data so that at least half of the values lie at or below it and at least half lie at or above it.
Definition: The median of a data set is the value that lies in the middle when the data are arranged in order from smallest to largest.
Steps to Find the Median:
Sort the data in ascending order.
If the number of data points (n) is odd, the median is the middle value.
If n is even, the median is the average of the two middle values.
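Formula (where $x_{(i)}$ denotes the $i$-th value in the sorted data):
For odd n: $\text{Median} = x_{\left(\frac{n+1}{2}\right)}$
For even n: $\text{Median} = \dfrac{x_{\left(\frac{n}{2}\right)} + x_{\left(\frac{n}{2}+1\right)}}{2}$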
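The steps above can be sketched in Python. This is a minimal illustration, not part of the original notes; the function name `median` is a hypothetical choice, and the input is assumed to be a non-empty list of numbers.

```python
def median(values):
    """Return the median of a non-empty list of numbers (illustrative helper)."""
    if not values:
        raise ValueError("median requires at least one value")
    ordered = sorted(values)      # Step 1: sort in ascending order
    n = len(ordered)
    mid = n // 2                  # 0-based index of the upper middle position
    if n % 2 == 1:
        # Odd n: the single middle value
        return ordered[mid]
    # Even n: average of the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([5, 10, 14, 12, 12]))     # 12
print(median([5, 10, 14, 12, 12, 7]))  # 11.0
```

Because Python lists are 0-indexed, the 1-based position (n + 1)/2 from the formula corresponds to index `n // 2` when n is odd.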
Example: Find the median of the following data sets:
(a) 5, 10, 14, 12, 12
Sorted: 5, 10, 12, 12, 14
n = 5 (odd), median is the 3rd value: 12
(b) 5, 10, 14, 12, 12, 7
Sorted: 5, 7, 10, 12, 12, 14
n = 6 (even), median is average of 3rd and 4th values: $\frac{10 + 12}{2} = 11$
Practice: Find the median of the sample data below:
Ages of students in a college class: 19, 20, 20, 21, 21, 22, 22, 23, 23, 24, 25
Sorted: 19, 20, 20, 21, 21, 22, 22, 23, 23, 24, 25
n = 11 (odd), median is the 6th value: 22
Median from Graphical Data
The median can also be determined from graphical representations such as histograms. Read the frequency of each bar, list each data value as many times as it occurs, and then find the middle value of that ordered list.
Example: College credits per semester (data from histogram): 11, 12, 12, 12, 13, 13, 14, 14, 14, 15, 15, 15
Sorted: 11, 12, 12, 12, 13, 13, 14, 14, 14, 15, 15, 15
n = 12 (even), median is average of 6th and 7th values: $\frac{13 + 14}{2} = 13.5$
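Reading a median off a histogram amounts to expanding bar heights back into a list of values. The sketch below assumes the bar heights implied by the listed data (for example, three students taking 12 credits); the variable name `credit_counts` is hypothetical, and `statistics.median` is Python's standard-library function.

```python
from statistics import median

# Bar heights inferred from the listed data: credits -> number of students
credit_counts = {11: 1, 12: 3, 13: 2, 14: 3, 15: 3}

# Repeat each value as many times as its bar is tall to recover the raw data
values = [credit for credit, count in credit_counts.items() for _ in range(count)]

print(len(values))     # 12
print(median(values))  # 13.5
```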
Mean vs. Median
Comparing Measures of Center
Both mean and median are measures of center, but they have distinct properties and are affected differently by the distribution of data and the presence of outliers.
Mean (Average): The sum of all data values divided by the number of values.
Formula: $\bar{x} = \dfrac{\sum x_i}{n} = \dfrac{x_1 + x_2 + \cdots + x_n}{n}$
Best used when data are symmetric and have no outliers.
Pros: Uses all data values; Cons: Sensitive to outliers.
Median: The middle value when data are ordered.
Best used when data are skewed or contain outliers.
Pros: Resistant to outliers; Cons: Does not use all data values.
Comparison Table:
| Measure | Best for | Pros | Cons |
|---|---|---|---|
| Mean | Symmetric data, no outliers | Uses all data values | Sensitive to outliers |
| Median | Skewed data or data with outliers | Resistant to outliers | Does not use all data values |
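To see the difference in outlier sensitivity concretely, the sketch below adds one extreme value to a small data set and recomputes both measures. The numbers are made up for illustration and do not come from the notes.

```python
from statistics import mean, median

# Made-up values for illustration only (not from the notes)
data = [40, 42, 45, 47, 50]
with_outlier = data + [400]   # same data plus one extreme value

print(f"{mean(data):.1f}, {median(data):.1f}")                  # 44.8, 45.0
print(f"{mean(with_outlier):.1f}, {median(with_outlier):.1f}")  # 104.0, 46.0
```

In this example a single extreme value more than doubles the mean, while the median moves by only one unit.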
Example: Salaries of Graduates
Given a histogram of graduate salaries, the mean may be higher than the median if there are a few very high salaries (outliers). In such cases, the median better represents the typical salary.
If the data are skewed right (long tail to the right), the mean is typically greater than the median.
If the data are symmetric, mean and median are approximately equal.
Example: Home Prices
Given the prices (in thousands of US dollars) of 8 homes: 275, 228, 800, 280, 308, 287, 501, 342
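Mean: $\bar{x} = \frac{275 + 228 + 800 + 280 + 308 + 287 + 501 + 342}{8} = \frac{3021}{8} \approx 377.6$ (about 377.6 thousand dollars)
Median: Arrange data: 228, 275, 280, 287, 308, 342, 501, 800. Median is average of 4th and 5th: $\frac{287 + 308}{2} = 297.5$ (297.5 thousand dollars)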
The median is more representative of the sample because the mean is pulled upward by the highest prices, especially 800: six of the eight homes cost less than the mean.
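The home-price figures can be checked with Python's standard `statistics` module. This is a quick sketch; the variable names are hypothetical, and dropping the highest price is an extra step added here only to show how much each measure moves.

```python
from statistics import mean, median

prices = [275, 228, 800, 280, 308, 287, 501, 342]  # thousands of US dollars

print(mean(prices))    # 377.625
print(median(prices))  # 297.5

# Drop the highest price (800) to see how much each measure shifts
without_max = sorted(prices)[:-1]
print(round(mean(without_max), 1))  # 317.3
print(median(without_max))          # 287
```

Removing the single highest price lowers the mean by about 60 thousand dollars but lowers the median by only 10.5 thousand dollars.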
Summary: Choose the measure of center based on the shape of the distribution and the presence of outliers. The median is especially useful for skewed distributions or data with outliers, while the mean is preferred for symmetric distributions without outliers.