2.3 - Measures of Central Tendency and the Shape of Distributions
Study Guide - Smart Notes
Measures of Central Tendency
Introduction
Measures of central tendency are statistical tools used to summarize a set of data by identifying a central point within the data. The three most common measures are the mean, median, and mode. These measures help describe the "middle" or "typical" value in a dataset and are fundamental in data analysis.
Mean
The mean (or average) is the sum of all data values divided by the number of values. It is widely used to represent the central value of a dataset.
Population Mean: The mean of all values in a population is denoted by $\mu$ and calculated as $\mu = \frac{\sum x}{N}$, where $N$ is the number of values in the population.
Sample Mean: The mean of a sample is denoted by $\bar{x}$ and calculated as $\bar{x} = \frac{\sum x}{n}$, where $n$ is the number of values in the sample.
Properties:
The mean is unique for a given dataset.
It is affected by every value, including extreme values (outliers).
The mean may not always be the most appropriate measure of center, especially for skewed data.
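The definition and the sensitivity to extreme values can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed method: the helper name `mean` is chosen here, the data are the six values used in the median and mode examples in these notes, and the value 8720 is a hypothetical outlier introduced only to show the effect.

```python
def mean(values):
    """Return the arithmetic mean: the sum of the values divided by their count."""
    if not values:
        raise ValueError("mean requires at least one value")
    return sum(values) / len(values)


# Six values borrowed from the median and mode examples in these notes.
data = [872, 432, 397, 482, 782, 397]
data_mean = mean(data)

# Same data, but the largest value is replaced by a hypothetical extreme value.
with_outlier = [8720, 432, 397, 482, 782, 397]
outlier_mean = mean(with_outlier)

print(round(data_mean, 2))     # 560.33
print(round(outlier_mean, 2))  # 1868.33
```

Changing a single value moves the mean from about 560 to about 1868, which is why the mean can misrepresent the center when outliers are present.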
Example: The mean price of six cars:
Median
The median is the middle value in a data set when the values are arranged in order. It divides the data into two equal halves.
Finding the Median:
Arrange the data in ascending order.
If the number of values is odd, the median is the middle value.
If the number of values is even, the median is the average of the two middle values.
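The three steps above can be sketched as a short Python function. The name `median` and the small odd-length list are illustrative assumptions; the six-value list is the data set used in the example in this section.

```python
def median(values):
    """Return the median by sorting and picking the middle value(s)."""
    if not values:
        raise ValueError("median requires at least one value")
    ordered = sorted(values)          # step 1: arrange in ascending order
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                    # step 2: odd count, take the middle value
        return ordered[mid]
    # step 3: even count, average the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2


odd_result = median([5, 1, 3])                          # sorted: 1, 3, 5
even_result = median([872, 432, 397, 482, 782, 397])    # sorted: 397, 397, 432, 482, 782, 872

print(odd_result)   # 3
print(even_result)  # 457.0
```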
Properties:
The median is less affected by extremely high or low values (outliers).
It is useful for skewed distributions.
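Example: For the data set 872, 432, 397, 482, 782, 397, the ordered values are 397, 397, 432, 482, 782, 872. The number of values is even, so the median is the average of the two middle values: $(432 + 482)/2 = 457$.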
Mode
The mode is the value that occurs most frequently in a data set. It is useful for categorical data and can be used to identify the most common value.
Properties:
The mode may not be unique; a data set can have more than one mode (bimodal, multimodal) or no mode at all.
The mode is most useful for qualitative (categorical) data.
Example: For the data set 872, 432, 397, 482, 782, 397, the mode is 397, because it occurs twice and every other value occurs once.
Additional info: If no value repeats, the data set has "no mode" (this does not mean the mode is zero).
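One way to handle all three cases (a single mode, several modes, no mode) is to count occurrences and keep every value that ties for the highest count. The sketch below assumes the convention stated above, that a data set with no repeated value has no mode, and returns an empty list in that case; the function name `modes` is an illustrative choice.

```python
from collections import Counter


def modes(values):
    """Return all values tied for the highest frequency, or [] if nothing repeats."""
    counts = Counter(values)
    if not counts:
        return []
    highest = max(counts.values())
    if highest == 1:
        # No value repeats, so the data set has no mode (not a mode of zero).
        return []
    return [value for value, count in counts.items() if count == highest]


print(modes([872, 432, 397, 482, 782, 397]))  # [397]
print(modes([1, 1, 2, 2, 3]))                 # [1, 2]  (bimodal)
print(modes(["red", "blue", "green"]))        # []      (no mode)
```

Because the function only counts occurrences, it works for categorical labels as well as numbers.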
Choosing the Most Appropriate Measure of Central Tendency
Considerations
While the mean is commonly used, it may not always be the best measure of center, especially for data sets with outliers or skewed distributions. The median or mode may better represent the typical value in such cases.
Example: In a company with salaries , , and for 18 employees, the mean salary is , which does not represent the typical salary. The median or mode may be more appropriate.
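Example: For test scores 30, 35, 40, 43, 92, 93, 98, 99, the median is $(43 + 92)/2 = 67.5$ and the mean is $530/8 = 66.25$. The scores form two clusters (30 to 43 and 92 to 99) rather than one main group with a few outliers, so both values fall in the gap between the clusters and neither describes a typical score well.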
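A quick numerical check can help when deciding how well a measure of center represents a data set. The sketch below uses Python's standard `statistics` module on the test scores from the example above; the helper `count_near` and its 10-point width are assumptions made here for illustration, not a standard criterion.

```python
import statistics

scores = [30, 35, 40, 43, 92, 93, 98, 99]

mean_score = statistics.mean(scores)      # 530 / 8
median_score = statistics.median(scores)  # average of 43 and 92


def count_near(values, center, width=10):
    """Count the values lying within `width` points of `center`."""
    return sum(1 for v in values if abs(v - center) <= width)


near_mean = count_near(scores, mean_score)
near_median = count_near(scores, median_score)

print(mean_score, near_mean)      # 66.25 0
print(median_score, near_median)  # 67.5 0
```

For these scores the two measures differ by only 1.25 points, and no score lies within 10 points of either one, which is worth keeping in mind when judging how typical either value is.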
The Shape of Distributions
Symmetric Distribution (Normal or Bell-Shaped)
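A symmetric distribution has data values evenly distributed around the center, so the left and right halves of the graph are approximately mirror images. When the distribution is also bell-shaped (a single peak), the mean, median, and mode are equal, or nearly equal in sample data, and are located at the center of the distribution.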
Example: IQ test scores typically follow a normal distribution.
Uniform Distribution
In a uniform distribution, all values or classes have equal or approximately equal frequencies. There is no distinct peak.
Example: Rolling a fair die produces a uniform distribution.
Skewed Distributions
Skewed Left (Negatively Skewed):
The "tail" of the graph extends to the left.
The mean is less than the median.
Example: Test scores where most students score high, but a few score much lower.
Skewed Right (Positively Skewed):
The "tail" of the graph extends to the right.
The mean is greater than the median.
The median is usually preferred as the measure of center for skewed data, because the mean is pulled toward the tail by extreme values.
Example: Distribution of household incomes, where a few households have very high incomes.
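The relationship between the mean and the median in skewed data can be checked directly. In the sketch below, all three data sets are hypothetical values invented for illustration, and `compare_center` is a name chosen here. Comparing the mean with the median is a rule of thumb for the direction of skew, not a formal measure of skewness.

```python
import statistics


def compare_center(values):
    """Describe how the mean compares with the median for a data set."""
    mean_value = statistics.mean(values)
    median_value = statistics.median(values)
    if mean_value > median_value:
        return "mean > median"
    if mean_value < median_value:
        return "mean < median"
    return "mean = median"


# Hypothetical data: one very large value creates a tail to the right.
right_skewed = [20, 25, 30, 35, 40, 45, 250]
# Hypothetical data: one very small value creates a tail to the left.
left_skewed = [35, 88, 90, 92, 94, 96, 98]
# Hypothetical data: values evenly spaced around the center.
symmetric = [1, 2, 3, 4, 5]

print(compare_center(right_skewed))  # mean > median
print(compare_center(left_skewed))   # mean < median
print(compare_center(symmetric))     # mean = median
```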
Summary Table: Measures of Central Tendency
| Measure | Definition | Formula | Best Use |
|---|---|---|---|
| Mean | Sum of all values divided by number of values | $\mu = \frac{\sum x}{N}$ (population), $\bar{x} = \frac{\sum x}{n}$ (sample) | Symmetric distributions, quantitative data |
| Median | Middle value when data is ordered | N/A | Skewed distributions, data with outliers |
| Mode | Most frequently occurring value | N/A | Categorical data, multimodal distributions |
Summary Table: Shapes of Distributions
| Shape | Description | Mean vs. Median | Example |
|---|---|---|---|
| Symmetric | Values evenly distributed around center | Mean = Median = Mode | IQ scores |
| Uniform | All values have equal frequency | Mean = Median | Rolling a die |
| Skewed Left | Tail extends to the left | Mean < Median | Test scores with low outliers |
| Skewed Right | Tail extends to the right | Mean > Median | Household incomes |