Measures of Central Tendency and Dispersion (Sections 3.1 & 3.2)
Study Guide - Smart Notes
Measures of Central Tendency
Arithmetic Mean
The arithmetic mean is commonly referred to as the average and is a fundamental measure of central tendency for quantitative data.
Definition: The mean is calculated by summing all values in a data set and dividing by the number of observations.
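Population Mean ($\mu$): Uses all individuals in a population.
Sample Mean ($\bar{x}$): Uses data from a sample.
Formula:
Population mean: $\mu = \frac{\sum x_i}{N}$, where $N$ is the number of individuals in the population.
Sample mean: $\bar{x} = \frac{\sum x_i}{n}$, where $n$ is the number of observations in the sample.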
Example: For the data set of call lengths [7, 4, 1, 4, 3, 48, 5, 3, 6], the mean is the sum of all values divided by the total number of calls: $\frac{81}{9} = 9$.
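The arithmetic in this example can be checked with a few lines of Python. This is a minimal sketch; the variable names `call_lengths` and `sample_mean` are illustrative and do not come from the original notes.

```python
from statistics import mean

# Call lengths from the example data set
call_lengths = [7, 4, 1, 4, 3, 48, 5, 3, 6]

# Mean = sum of the observations divided by the number of observations
total = sum(call_lengths)   # 81
count = len(call_lengths)   # 9
sample_mean = total / count

print(sample_mean)  # 9.0

# The standard library's statistics.mean gives the same value
assert mean(call_lengths) == sample_mean
```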
Median
The median is the middle value in an ordered data set and is another important measure of central tendency for quantitative variables.
Definition: The median splits the data into two equal halves.
Data must be ordered from smallest to largest before finding the median.
Odd number of data points: The median is the value at position $\frac{n+1}{2}$, where $n$ is the number of data points.
Even number of data points: The median is the average of the two middle values.
Example (Odd): For [1, 3, 3, 4, 6, 6, 6, 7, 8], the median is the 5th value, which is 6.
Example (Even): For [3, 5, 6, 8, 10, 11, 11, 12], the median is at the $\frac{8+1}{2} = 4.5$th position, so it is the average of the 4th and 5th values, 8 and 10: $\frac{8 + 10}{2} = 9$.
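The odd and even cases can be sketched as one short Python function. This is an illustrative implementation of the position rule described above, not code from the original notes; the names `median`, `odd_data`, and `even_data` are assumptions made for the example.

```python
def median(values):
    """Return the median of a non-empty list of numbers."""
    ordered = sorted(values)  # the data must be ordered first
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:
        # Odd n: 1-based position (n + 1) / 2 is 0-based index n // 2
        return ordered[mid]
    # Even n: average the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2


odd_data = [1, 3, 3, 4, 6, 6, 6, 7, 8]
even_data = [3, 5, 6, 8, 10, 11, 11, 12]

print(median(odd_data))   # 6
print(median(even_data))  # 9.0
```

Sorting inside the function means the caller does not have to order the data beforehand.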
Mode
The mode is the value that appears most frequently in a data set and can be used for both quantitative and qualitative variables.
Definition: The mode is the most frequent observation.
There may be no mode, one mode, or multiple modes in a data set.
Example: For [0, 0, 1, 2, 1, 1, 2, 3, 4, 4, 0, 0], the mode is 0.
Example (Qualitative): For [head, head, shoulder, neck, head], the mode is 'head'.
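Because a data set can have zero, one, or several modes, a small helper that returns a list of modes may be easier to reason about than a single value. The sketch below is illustrative only: the function name `modes` is hypothetical, and it assumes the convention that a data set in which no value repeats has no mode.

```python
from collections import Counter


def modes(values):
    """Return all values tied for the highest frequency ([] if none repeats)."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    if top == 1:
        # Assumed convention: if every value occurs once, there is no mode
        return []
    return [value for value, count in counts.items() if count == top]


print(modes([0, 0, 1, 2, 1, 1, 2, 3, 4, 4, 0, 0]))             # [0]
print(modes(["head", "head", "shoulder", "neck", "head"]))     # ['head']
print(modes([1, 2, 2, 3, 3]))                                  # [2, 3]
print(modes([1, 2, 3]))                                        # []
```

The same function works for quantitative and qualitative data because it only counts occurrences and never does arithmetic on the values.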
Comparing Mean, Median, and Mode
Mean: Center of gravity; best for symmetric quantitative data.
Median: Splits data into halves; best for highly skewed quantitative data.
Mode: Most frequent value; useful for qualitative data.
Resistant Measures
Definition of Resistance
A measure is resistant if it is not substantially affected by extreme values (outliers).
The mean is not resistant; it is pulled in the direction of outliers.
The median is resistant; it is less affected by extreme values.
Visual Comparison: In left-skewed or right-skewed distributions, the mean is pulled toward the tail, while the median remains closer to the center.
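Resistance can be seen numerically by reusing the call-length data from the mean example and replacing its extreme value. In this sketch, swapping 48 for 5 is a hypothetical modification chosen for illustration, and the variable names are assumptions.

```python
from statistics import mean, median

with_outlier = [7, 4, 1, 4, 3, 48, 5, 3, 6]
# Hypothetical modification: the extreme value 48 is replaced by 5
without_outlier = [7, 4, 1, 4, 3, 5, 5, 3, 6]

for label, data in [("with 48", with_outlier), ("with 5", without_outlier)]:
    print(f"{label}: mean = {mean(data):.2f}, median = {median(data)}")

# with 48: mean = 9.00, median = 4
# with 5: mean = 4.22, median = 4
```

The mean drops by more than half when the outlier is removed, while the median stays at 4 in both cases.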
Measures of Dispersion
Range
The range measures the spread of the data by subtracting the smallest value from the largest value.
Formula: $\text{Range} = \text{largest value} - \text{smallest value}$
Only uses two values; not resistant to outliers.
Example: For [6, 1, 2, 6, 11, 7, 3, 3], the range is $11 - 1 = 10$.
Non-resistance Example: If a 6 is mistakenly recorded as 6000, the range becomes $6000 - 1 = 5999$.
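Both range calculations can be reproduced with a short Python sketch. The function name `data_range` and the choice of which 6 is mistyped are assumptions made for illustration; either choice gives the same range.

```python
def data_range(values):
    """Range = largest value minus smallest value."""
    return max(values) - min(values)


recorded = [6, 1, 2, 6, 11, 7, 3, 3]
# Assumed data-entry error: the first 6 is typed as 6000
mistyped = [6000, 1, 2, 6, 11, 7, 3, 3]

print(data_range(recorded))  # 10
print(data_range(mistyped))  # 5999
```

A single bad entry changes the range from 10 to 5999, which is what it means for the range not to be resistant.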
Standard Deviation
The standard deviation measures how far the data points typically lie from the mean, providing a measure of data spread.
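Formula (Sample): $s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}$, where $\bar{x}$ is the sample mean and $n$ is the sample size.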
Based on the mean; not resistant to outliers.
Calculation is time-consuming by hand; calculators or software are recommended.
Example: Calculate the standard deviation of Yolanda's phone call lengths. If the outlier is changed (e.g., 48 is replaced by 5 or by 148), the standard deviation changes substantially.
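Since the hand calculation is tedious, the sample standard deviation can be sketched in Python and compared across the three versions of the data. The function name `sample_std` is hypothetical; it follows the sample formula with $n - 1$ in the denominator and is checked against the standard library's `statistics.stdev`.

```python
from math import sqrt
from statistics import stdev


def sample_std(values):
    """Sample standard deviation: sqrt(sum((x - mean)^2) / (n - 1))."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two observations are required")
    x_bar = sum(values) / n
    squared_deviations = [(x - x_bar) ** 2 for x in values]
    return sqrt(sum(squared_deviations) / (n - 1))


# Yolanda's call lengths; 48 is the value treated as the outlier
base = [7, 4, 1, 4, 3, 48, 5, 3, 6]

for outlier in (48, 5, 148):
    data = [outlier if x == 48 else x for x in base]
    s = sample_std(data)
    # The hand-rolled formula should match statistics.stdev
    assert abs(s - stdev(data)) < 1e-9
    print(f"largest value {outlier}: s = {s:.2f}")
```

With these inputs the standard deviation is roughly 14.73 for the original data, about 1.79 when 48 is replaced by 5, and about 47.99 when it is replaced by 148, so a single value drives most of the measured spread.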
Summary Table
| Measure | Definition | Resistant? | Best For |
|---|---|---|---|
| Mean | Average value | No | Symmetric quantitative data |
| Median | Middle value | Yes | Skewed quantitative data |
| Mode | Most frequent value | Yes | Qualitative data |
| Range | Max minus Min | No | Quick spread estimate |
| Standard Deviation | Average deviation from the mean | No | Quantitative data spread |
Applications and Examples
Yolanda's Cell Phone Call Lengths
Given a sample of call lengths, students are asked to:
Calculate the mean, median, and mode.
Assess which measure best describes the typical call length, especially when outliers are present.
Calculate the range and standard deviation, and observe how these measures change when an outlier is modified.
| Row | Call Lengths |
|---|---|
| 1 | 7, 4, 1 |
| 2 | 4, 3, 48 |
| 3 | 5, 3, 6 |
Additional info: Students should use calculators or statistical software to compute standard deviation for larger data sets.