Measures of Central Tendency in Statistics: Mean, Median, and Mode
Measures of Central Tendency
Introduction
Measures of central tendency are statistical values that describe the center or typical value of a data set. The three most common measures are mean, median, and mode. Understanding these concepts is fundamental for summarizing and interpreting data in statistics.
Mean
The mean, often called the average, is calculated by summing all data values and dividing by the number of values. It is sensitive to outliers and may not always represent the center of skewed data sets.
Definition: The mean of a data set is the sum of all data entries divided by the number of entries.
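Formula: $\bar{x} = \frac{\sum x}{n}$, where $\sum x$ is the sum of all data entries and $n$ is the number of entries.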
Example: For the data set {6, 7, 5, 6, 5, 5, 6}, the mean is $\bar{x} = \frac{6 + 7 + 5 + 6 + 5 + 5 + 6}{7} = \frac{40}{7} \approx 5.71$.
Application: Used to find the average score, age, cholesterol level, or other quantitative measures.
Properties:
All data sets have a mean if the data are quantitative.
The mean is affected by outliers (extremely high or low values).
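The definition above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original notes: the function name `arithmetic_mean` and the extra value 60 used to show outlier sensitivity are assumptions made for the example.

```python
def arithmetic_mean(values):
    """Return the sum of the entries divided by the number of entries."""
    if not values:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / len(values)


data = [6, 7, 5, 6, 5, 5, 6]               # data set from the example above
print(round(arithmetic_mean(data), 2))     # 5.71

# Hypothetical outlier (60) appended to show how strongly it pulls the mean.
print(arithmetic_mean(data + [60]))        # 12.5
```

A single extreme value more than doubles the mean here, which is why the mean is described as sensitive to outliers.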
Median
The median is the middle value when the data are arranged in order. If the number of data points is even, the median is the average of the two middle values. The median is less sensitive to outliers than the mean.
Definition: The median is the value that lies in the middle of the data set when ordered from least to greatest.
Calculation:
If n is odd: Median is the middle value.
If n is even: Median is the average of the two middle values.
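Formula (even n): $\text{Median} = \frac{x_{(n/2)} + x_{(n/2 + 1)}}{2}$, where $x_{(k)}$ is the $k$-th value in the ordered data.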
Example: For the data set {5, 5, 5, 6, 6, 6, 7}, the median is 6.
Application: Used to find the central value in distributions, especially when data are skewed.
Properties:
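Every quantitative data set has a median; nominal (unordered categorical) data do not, because the values cannot be ordered.
The median is resistant to outliers: changing an extreme value usually leaves it unchanged.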
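The odd and even cases described above can be sketched as follows. The function name `median_of` is an assumption made for this illustration; Python's standard library offers `statistics.median` for the same purpose.

```python
def median_of(values):
    """Return the middle value of the ordered data (mean of the two middle values if n is even)."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:
        # Odd n: a single middle value exists.
        return ordered[mid]
    # Even n: average the two middle values.
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median_of([6, 7, 5, 6, 5, 5, 6]))   # 6   (n = 7, odd)
print(median_of([5, 5, 6, 6, 6, 7]))      # 6.0 (n = 6, even)
```

Sorting first matters: the median is defined on the ordered data, not on the order in which the values were recorded.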
Mode
The mode is the value that occurs most frequently in a data set. A data set may have one mode (unimodal), more than one mode (multimodal), or no mode if all values occur with the same frequency.
Definition: The mode is the data value that occurs with the greatest frequency.
Example: For the data set {5, 5, 6, 6, 6, 7}, the mode is 6.
Application: Useful for categorical data and for identifying the most common value.
Properties:
A data set may have no mode, one mode, or multiple modes.
The mode is not always a good measure of center, especially if it is the smallest or largest value.
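One way to find the mode or modes is to count occurrences and keep the values with the highest count. The sketch below is an illustration under stated assumptions: the function name `modes` is hypothetical, and it follows the convention used in these notes that a data set has no mode when all of its (two or more) distinct values occur equally often.

```python
from collections import Counter


def modes(values):
    """Return a list of the most frequent values, or [] if no value stands out."""
    counts = Counter(values)
    if not counts:
        return []
    highest = max(counts.values())
    # Convention assumed here: if several distinct values all tie, there is no mode.
    if len(counts) > 1 and all(c == highest for c in counts.values()):
        return []
    return [value for value, c in counts.items() if c == highest]


print(modes([5, 5, 6, 6, 6, 7]))        # [6]     unimodal
print(modes([1, 1, 2, 2, 3]))           # [1, 2]  multimodal
print(modes([1, 2, 3]))                 # []      no mode
print(modes(["red", "blue", "red"]))    # ['red'] works for categorical data
```

Because it only counts occurrences, the same function applies to categorical values, for which the mean and median are not defined.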
Comparing Mean, Median, and Mode
Each measure of central tendency has strengths and weaknesses. The choice depends on the data's distribution and the presence of outliers.
Mean: Best for symmetric distributions without outliers.
Median: Best for skewed distributions or when outliers are present.
Mode: Best for categorical data or to identify the most frequent value.
Table: Comparison of Mean, Median, and Mode
| Measure | Definition | Sensitivity to Outliers | Applicability |
|---|---|---|---|
| Mean | Sum of values divided by number of values | High | Quantitative data |
| Median | Middle value in ordered data | Low | Quantitative data |
| Mode | Most frequent value | Low | Quantitative or qualitative data |
Weighted Mean
The weighted mean is used when data values contribute unequally to the average. Each value is multiplied by its weight, summed, and divided by the total weight.
Formula: $\bar{x} = \frac{\sum (x \cdot w)}{\sum w}$, where $w$ is the weight of each entry $x$.
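Example: A student's grade categories are weighted Homework (10%), Quiz (10%), Project (15%), and Final Exam (35%), with scores of 88, 91, 86, and 89, respectively. The listed weights total 70%, so dividing by the total weight gives $\bar{x} = \frac{88(0.10) + 91(0.10) + 86(0.15) + 89(0.35)}{0.70} = \frac{61.95}{0.70} = 88.5$.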
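The weighted mean can be sketched in Python using the scores and weights from the example above. The function name `weighted_mean` is an assumption for illustration. The weights are entered as whole percentages, and because the four listed weights sum to 70 rather than 100, the division by the total weight is what keeps the result on the original score scale.

```python
def weighted_mean(values, weights):
    """Return sum(value * weight) divided by the total weight."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("total weight must be nonzero")
    return sum(v * w for v, w in zip(values, weights)) / total_weight


scores = [88, 91, 86, 89]     # Homework, Quiz, Project, Final Exam
weights = [10, 10, 15, 35]    # percentages as listed; they sum to 70
print(weighted_mean(scores, weights))   # 88.5
```

Dividing by the sum of the weights, rather than assuming they total 100%, makes the function safe to use when only some of the weighted categories are known.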
Frequency Distribution Mean
When data are grouped into frequency distributions, the mean can be approximated by multiplying each class midpoint by its frequency, summing these products, and dividing by the total frequency.
Formula: $\bar{x} \approx \frac{\sum f_i x_i}{\sum f_i}$, where $f_i$ is the frequency and $x_i$ is the midpoint of class $i$.
Example: For gas mileage classes and frequencies:
| Gas Mileage | Frequency |
|---|---|
| 20-29 | 3 |
| 30-39 | 1 |
| 40-49 | 1 |
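The mean is calculated using the midpoints and frequencies. With class midpoints 24.5, 34.5, and 44.5, the approximation is $\bar{x} \approx \frac{3(24.5) + 1(34.5) + 1(44.5)}{5} = \frac{152.5}{5} = 30.5$.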
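The grouped-data approximation can be sketched as below, using the gas mileage classes and frequencies from the example. The function name `grouped_mean` and the `(lower, upper, frequency)` tuple layout are assumptions chosen for this illustration.

```python
def grouped_mean(classes):
    """Approximate the mean from (lower_limit, upper_limit, frequency) tuples."""
    total_frequency = sum(freq for _, _, freq in classes)
    if total_frequency == 0:
        raise ValueError("total frequency must be nonzero")
    # Each class is represented by its midpoint, weighted by its frequency.
    weighted_sum = sum(((low + high) / 2) * freq for low, high, freq in classes)
    return weighted_sum / total_frequency


mileage = [(20, 29, 3), (30, 39, 1), (40, 49, 1)]
print(grouped_mean(mileage))   # 30.5
```

The result is an approximation because every value in a class is treated as if it sat at the class midpoint.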
Effect of Outliers on Measures of Central Tendency
Outliers can significantly affect the mean, but have less impact on the median and mode.
Mean: Most likely to be affected by outliers.
Median: Less affected, as it depends only on the middle value(s).
Mode: Least affected, unless the outlier is the most frequent value.
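Example: If a data entry error changes a value from 1531 to 1544, the mean of n values always rises by 13/n, whereas the median changes only if the altered value is one of the middle values or moves past the middle of the ordered data.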
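The different sensitivities can be seen in a short Python sketch using the standard `statistics` module. The data here are hypothetical and chosen only for illustration: the seven-value data set from the median example, with the largest value mistyped as 70 instead of 7.

```python
from statistics import mean, median

original = [5, 5, 5, 6, 6, 6, 7]
with_outlier = [5, 5, 5, 6, 6, 6, 70]   # hypothetical entry error: 7 typed as 70

print(round(mean(original), 2), median(original))          # 5.71 6
print(round(mean(with_outlier), 2), median(with_outlier))  # 14.71 6
```

The mean more than doubles, while the median stays at 6 because the middle value of the ordered data is unchanged.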
Special Cases and Limitations
Some data sets may not have a mode (if all values occur equally).
Mean, median, and mode may not always represent the center, especially in skewed or multimodal distributions.
For nominal data (categories without order), only the mode is meaningful.
Summary Table: When Measures Represent the Center
| Measure | Represents Center? | When? |
|---|---|---|
| Mean | Yes | Symmetric, no outliers |
| Median | Yes | Skewed, outliers present |
| Mode | Sometimes | If mode is not an extreme value |
Practice Problems and Applications
Calculate mean, median, and mode for various data sets.
Determine which measure best represents the center for a given distribution.
Compute weighted means for grades or financial balances.
Approximate the mean from frequency distributions.
Analyze the effect of outliers on mean and median.
Additional info: These notes expand on the brief quiz and assignment content by providing definitions, formulas, examples, and tables for comparison. The context of weighted mean and frequency distribution mean is inferred from the problems shown. The effect of outliers is illustrated with a data entry error example.