Measures of Central Tendency in Statistics: Mean, Median, and Mode

Study Guide - Smart Notes

Measures of Central Tendency

Introduction

Measures of central tendency are statistical values that describe the center or typical value of a data set. The three most common measures are mean, median, and mode. Understanding these concepts is fundamental for summarizing and interpreting data in statistics.

Mean

The mean, often called the average, is calculated by summing all data values and dividing by the number of values. It is sensitive to outliers and may not always represent the center of skewed data sets.

Definition: The mean of a data set is the sum of all data entries divided by the number of entries.
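Formula: $\bar{x} = \dfrac{\sum x_i}{n}$, where $x_i$ are the data entries and $n$ is the number of entries.
Example: For the data set {6, 7, 5, 6, 5, 5, 6}, the mean is $\bar{x} = \dfrac{6 + 7 + 5 + 6 + 5 + 5 + 6}{7} = \dfrac{40}{7} \approx 5.71$.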
Application: Used to find the average score, age, cholesterol level, or other quantitative measures.
Properties:
- Every quantitative data set has a mean.
- The mean is affected by outliers (extremely high or low values).
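
As a quick check of the example above, the definition can be sketched in Python. This is a minimal illustration rather than part of the original notes, and the variable names are chosen here for clarity.

```python
from statistics import mean

# Data set from the example above.
data = [6, 7, 5, 6, 5, 5, 6]

# Definition: sum of all entries divided by the number of entries.
manual_mean = sum(data) / len(data)

# The standard library's statistics.mean should agree.
library_mean = mean(data)

print(round(manual_mean, 2))   # 5.71
print(round(library_mean, 2))  # 5.71
```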

Median

The median is the middle value when the data are arranged in order. If the number of data points is even, the median is the average of the two middle values. The median is less sensitive to outliers than the mean.

Definition: The median is the value that lies in the middle of the data set when ordered from least to greatest.
Calculation:
- If n is odd: Median is the middle value.
- If n is even: Median is the average of the two middle values.
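Formula (even n): $\text{Median} = \dfrac{x_{(n/2)} + x_{(n/2 + 1)}}{2}$, where $x_{(k)}$ is the $k$-th value in the ordered data.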
Example: For the data set {5, 5, 5, 6, 6, 6, 7}, the median is 6.
Application: Used to find the central value in distributions, especially when data are skewed.
Properties:
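- Every quantitative data set has a median; data with no natural order (nominal categories) do not.
- The median is resistant to outliers: an extreme value shifts it far less than it shifts the mean, and often leaves it unchanged.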
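
The odd and even cases can be sketched as follows. The helper name `median_manual` and the even-length data set are hypothetical and used only for illustration; the odd-length data set is the one from the example above.

```python
from statistics import median

def median_manual(values):
    """Return the median by sorting and picking the middle value(s)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd n: the single middle value.
        return ordered[mid]
    # Even n: the average of the two middle values.
    return (ordered[mid - 1] + ordered[mid]) / 2

odd_data = [5, 5, 5, 6, 6, 6, 7]   # from the example above
even_data = [5, 5, 6, 7]           # hypothetical even-length data set

print(median_manual(odd_data))   # 6
print(median_manual(even_data))  # 5.5
```

The standard library's `statistics.median` follows the same rule and can be used instead of the hand-written helper.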

Mode

The mode is the value that occurs most frequently in a data set. A data set may have one mode (unimodal), more than one mode (multimodal), or no mode if all values occur with the same frequency.

Definition: The mode is the data value that occurs with the greatest frequency.
Example: For the data set {5, 5, 6, 6, 6, 7}, the mode is 6.
Application: Useful for categorical data and for identifying the most common value.
Properties:
- A data set may have no mode, one mode, or multiple modes.
- The mode is not always a good measure of center, especially if it is the smallest or largest value.
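
A minimal sketch of finding the mode(s) by counting frequencies is shown below. The helper `modes` is hypothetical, and it follows the convention used in these notes that a data set has no mode when every value occurs equally often. The second and third data sets are made up for illustration.

```python
from collections import Counter

def modes(values):
    """Return a sorted list of modes, or [] if all values are equally frequent."""
    counts = Counter(values)
    highest = max(counts.values())
    # Convention from these notes: equal frequencies everywhere means no mode.
    if len(counts) > 1 and all(c == highest for c in counts.values()):
        return []
    return sorted(v for v, c in counts.items() if c == highest)

print(modes([5, 5, 6, 6, 6, 7]))  # [6]     (unimodal, from the example above)
print(modes([1, 1, 2, 2, 3]))     # [1, 2]  (hypothetical bimodal data)
print(modes([1, 2, 3]))           # []      (hypothetical data with no mode)
```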

Comparing Mean, Median, and Mode

Each measure of central tendency has strengths and weaknesses. The choice depends on the data's distribution and the presence of outliers.

Mean: Best for symmetric distributions without outliers.
Median: Best for skewed distributions or when outliers are present.
Mode: Best for categorical data or to identify the most frequent value.

Table: Comparison of Mean, Median, and Mode

Measure	Definition	Sensitivity to Outliers	Applicability
Mean	Sum of values divided by number of values	High	Quantitative data
Median	Middle value in ordered data	Low	Quantitative data
Mode	Most frequent value	Low	Quantitative or qualitative data
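
To illustrate why the median is often preferred for skewed data, the sketch below compares the two measures on a hypothetical right-skewed data set (the values are invented for this illustration and do not come from the source material).

```python
from statistics import mean, median

# Hypothetical right-skewed data: one value is far above the rest.
salaries = [30, 32, 35, 38, 40, 250]

salary_mean = mean(salaries)      # pulled upward by the value 250
salary_median = median(salaries)  # average of the two middle values, 35 and 38

print(round(salary_mean, 2))  # 70.83
print(salary_median)          # 36.5
```

Here the mean lies above five of the six values, while the median stays close to the bulk of the data.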

Weighted Mean

The weighted mean is used when data values contribute unequally to the average. Each value is multiplied by its weight, summed, and divided by the total weight.

Formula: $\bar{x}_w = \dfrac{\sum w_i x_i}{\sum w_i}$, where $w_i$ is the weight of value $x_i$.
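Example: If a student's grades are weighted as follows: Homework (10%), Quiz (10%), Project (15%), Final Exam (35%), and the corresponding scores are 88, 91, 86, 89, the weighted mean is: $\bar{x}_w = \dfrac{0.10(88) + 0.10(91) + 0.15(86) + 0.35(89)}{0.10 + 0.10 + 0.15 + 0.35} = \dfrac{61.95}{0.70} = 88.5$. The four listed weights total 70%, so the sum is divided by 0.70 rather than by 1.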
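
The weighted-mean calculation can be sketched in Python as below. It assumes the scores pair with the categories in the order listed, and it uses only the four weights given, which total 70%; dividing by the total weight handles that case.

```python
# Scores and weights in the order listed: Homework, Quiz, Project, Final Exam.
# Assumption: each score belongs to the category in the same position.
scores = [88, 91, 86, 89]
weights = [0.10, 0.10, 0.15, 0.35]

# Multiply each value by its weight, sum the products, divide by the total weight.
weighted_sum = sum(s * w for s, w in zip(scores, weights))
total_weight = sum(weights)
weighted_mean = weighted_sum / total_weight

print(round(weighted_mean, 2))  # 88.5
```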

Frequency Distribution Mean

When data are grouped into frequency distributions, the mean can be approximated by multiplying each class midpoint by its frequency, summing these products, and dividing by the total frequency.

Formula: $\bar{x} \approx \dfrac{\sum f_i x_i}{\sum f_i}$, where $f_i$ is the frequency and $x_i$ is the midpoint of class $i$.
Example: For gas mileage classes and frequencies:

Gas Mileage	Frequency
20-29	3
30-39	1
40-49	1

The mean is calculated using the midpoints (24.5, 34.5, 44.5) and frequencies: $\bar{x} \approx \dfrac{24.5(3) + 34.5(1) + 44.5(1)}{3 + 1 + 1} = \dfrac{152.5}{5} = 30.5$.
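
The same approximation can be sketched in Python. The class limits and frequencies come from the table above; the variable names are chosen for this illustration, and each midpoint is taken as the average of the lower and upper class limits.

```python
# (lower limit, upper limit, frequency) for each gas mileage class.
classes = [(20, 29, 3), (30, 39, 1), (40, 49, 1)]

# Midpoint of each class: average of its lower and upper limits.
midpoints = [(low + high) / 2 for low, high, _ in classes]
frequencies = [freq for _, _, freq in classes]

# Sum of midpoint * frequency, divided by the total frequency.
total_frequency = sum(frequencies)
approx_mean = sum(x * f for x, f in zip(midpoints, frequencies)) / total_frequency

print(midpoints)    # [24.5, 34.5, 44.5]
print(approx_mean)  # 30.5
```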

Effect of Outliers on Measures of Central Tendency

Outliers can significantly affect the mean, but have less impact on the median and mode.

Mean: Most likely to be affected by outliers.
Median: Less affected, as it depends only on the middle value(s).
Mode: Least affected, unless the outlier is the most frequent value.
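Example: If a data entry error changes a value from 1531 to 1544, the mean increases by 13/n (for n values), while the median changes only if the altered value lies at, or moves across, the middle of the ordered data.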
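
The data entry error can be simulated as follows. The source does not list the underlying data, so the data set below is hypothetical; only the change from 1531 to 1544 is taken from the example.

```python
from statistics import mean, median

# Hypothetical data set containing the value 1531.
original = [1510, 1520, 1525, 1531, 1600]

# Same data after the entry error changes 1531 to 1544.
with_error = [1544 if x == 1531 else x for x in original]

mean_change = mean(with_error) - mean(original)
median_change = median(with_error) - median(original)

print(round(mean_change, 2))  # 2.6  (13 spread over 5 values)
print(median_change)          # 0    (the middle value, 1525, is unchanged)
```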

Special Cases and Limitations

Some data sets may not have a mode (if all values occur equally).
Mean, median, and mode may not always represent the center, especially in skewed or multimodal distributions.
For nominal data (categories without order), only the mode is meaningful.
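
For nominal data, the mode can be found by counting categories, as in this small sketch. The eye-color data are hypothetical and serve only to show that no arithmetic or ordering is needed.

```python
from collections import Counter

# Hypothetical nominal data: categories with no numeric value or order.
eye_colors = ["brown", "blue", "brown", "green", "brown", "blue"]

counts = Counter(eye_colors)

# most_common(1) returns a one-element list holding the (value, count) pair
# with the highest count.
mode_value, mode_count = counts.most_common(1)[0]

print(mode_value, mode_count)  # brown 3
```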

Summary Table: When Measures Represent the Center

Measure	Represents Center?	When?
Mean	Yes	Symmetric, no outliers
Median	Yes	Skewed, outliers present
Mode	Sometimes	If mode is not an extreme value

Practice Problems and Applications

Calculate mean, median, and mode for various data sets.
Determine which measure best represents the center for a given distribution.
Compute weighted means for grades or financial balances.
Approximate the mean from frequency distributions.
Analyze the effect of outliers on mean and median.

Additional info: These notes expand on the brief quiz and assignment content by providing definitions, formulas, examples, and tables for comparison. The context of weighted mean and frequency distribution mean is inferred from the problems shown. The effect of outliers is illustrated with a data entry error example.