Measures of Central Tendency and Dispersion in Statistics: Study Notes

3.1 Measures of Central Tendency

Objectives

  • Determine the arithmetic mean of a variable from raw data

  • Determine the median of a variable from raw data

  • Explain what it means for a statistic to be resistant

  • Determine the mode of a variable from raw data

Arithmetic Mean of a Variable from Raw Data

The arithmetic mean is a measure of central tendency calculated by adding all the values of a variable in the data set and dividing by the number of observations. It is commonly referred to as the mean.

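  • Population arithmetic mean ($\mu$): Computed using all individuals in a population. It is a parameter.

  • Sample arithmetic mean ($\bar{x}$): Computed using sample data. It is a statistic.

Formulas:

  • Population mean: $\mu = \dfrac{\sum x_i}{N}$, where $N$ is the number of individuals in the population

  • Sample mean: $\bar{x} = \dfrac{\sum x_i}{n}$, where $n$ is the number of observations in the sample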

Example: For travel times (in minutes) of 7 employees: 23, 36, 23, 18, 5, 26, 43, treat the 7 employees as the entire population. The values sum to 174, so the population mean is $\mu = \dfrac{174}{7} \approx 24.9$ minutes.
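The arithmetic in this example can be checked with a few lines of Python. This is a minimal sketch; the variable names (`travel_times`, `total`, `mu`) are illustrative and not part of the original notes.

```python
from statistics import mean

# Travel times (minutes) for the 7 employees in the example.
travel_times = [23, 36, 23, 18, 5, 26, 43]

# Mean = sum of the observations divided by the number of observations.
total = sum(travel_times)
n = len(travel_times)
mu = total / n

print(f"sum = {total}, N = {n}, mean = {mu:.2f}")

# Cross-check against the standard library.
assert abs(mu - mean(travel_times)) < 1e-9
```

Whether the result is called $\mu$ or $\bar{x}$ depends only on whether the 7 employees are the whole population or a sample from it; the computation is the same.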

Median of a Variable from Raw Data

The median is the value that lies in the middle of the data when arranged in ascending order. It divides the data set into two equal halves.

  • Steps to find the median:

    1. Arrange the data in ascending order.

    2. Determine the number of observations, $n$.

    3. If $n$ is odd, the median is the value at the $\frac{n+1}{2}$ position.

    4. If $n$ is even, the median is the mean of the values at the $\frac{n}{2}$ and $\frac{n}{2} + 1$ positions.

Example: For the data set 23, 36, 23, 18, 5, 26, 43 ($n = 7$, odd), the ordered values are 5, 18, 23, 23, 26, 36, 43, so the median is the 4th value, 23. If a new value is added (making $n$ even), the median is the average of the two middle values.
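The median steps translate directly into code. The sketch below is one possible implementation; the function name `median_from_raw` is hypothetical, and the extra value 70 is borrowed from the resistance example in these notes to show the even case.

```python
def median_from_raw(values):
    """Return the median by sorting and picking the middle value(s)."""
    ordered = sorted(values)          # Step 1: ascending order
    n = len(ordered)                  # Step 2: number of observations
    if n == 0:
        raise ValueError("median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:
        # Step 3: odd n, position (n + 1) / 2 is index mid (0-based).
        return ordered[mid]
    # Step 4: even n, average positions n / 2 and n / 2 + 1.
    return (ordered[mid - 1] + ordered[mid]) / 2


travel_times = [23, 36, 23, 18, 5, 26, 43]
print(median_from_raw(travel_times))           # 23
print(median_from_raw(travel_times + [70]))    # 24.5
```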

Resistant Statistics

A resistant statistic is not affected substantially by extreme observations (outliers). The median is resistant, while the mean is not.

  • Example: Adding a new employee with a travel time of 70 minutes to the previous data set raises the mean from about 24.9 to 30.5 minutes, while the median moves only from 23 to 24.5 minutes.

Definition: A numerical summary of data is said to be resistant if extreme values do not affect its value substantially.
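To see resistance numerically, the sketch below compares how far the mean and the median move when the 70-minute observation is added. It uses Python's standard `statistics` module; the variable names are illustrative.

```python
from statistics import mean, median

travel_times = [23, 36, 23, 18, 5, 26, 43]
with_outlier = travel_times + [70]   # the extreme observation from the example

# How much does each summary change when the outlier is added?
mean_shift = mean(with_outlier) - mean(travel_times)
median_shift = median(with_outlier) - median(travel_times)

print(f"mean:   {mean(travel_times):.2f} -> {mean(with_outlier):.2f}")
print(f"median: {median(travel_times)} -> {median(with_outlier)}")
print(f"shift in mean = {mean_shift:.2f}, shift in median = {median_shift:.2f}")
```

The mean shifts by more than 5 minutes while the median shifts by 1.5 minutes, which is the behavior the definition describes.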

Mode of a Variable from Raw Data

The mode is the most frequent observation in the data set. A data set may have no mode, one mode, or more than one mode.

  • Example: For the travel times 23, 36, 23, 18, 5, 26, 43, the mode is 23 (since it appears twice).
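A tally-based sketch of the mode is shown below. The function name `modes` is hypothetical, and it assumes the convention that a data set in which no value repeats has no mode (returned here as an empty list).

```python
from collections import Counter


def modes(values):
    """Return a sorted list of the most frequent values (empty if none repeats)."""
    counts = Counter(values)          # tally each observation
    if not counts:
        return []
    top = max(counts.values())
    if top == 1:
        return []                     # assumed convention: no repeats means no mode
    return sorted(v for v, c in counts.items() if c == top)


print(modes([23, 36, 23, 18, 5, 26, 43]))   # [23]
print(modes([1, 2, 3]))                     # []
print(modes([1, 1, 2, 2, 3]))               # [1, 2]
```

Note that Python's built-in `statistics.multimode` follows a different convention: when no value repeats, it returns every value rather than an empty list.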

Summary Table: Measures of Central Tendency

| Measure of Central Tendency | Computation | Interpretation | When to Use |
| --- | --- | --- | --- |
| Mean | $\mu = \frac{\sum x_i}{N}$ (population) or $\bar{x} = \frac{\sum x_i}{n}$ (sample) | Center of gravity | When data are quantitative and frequency distribution is roughly symmetric |
| Median | Arrange data in ascending order, identify middle value | Divides the bottom 50% from the top 50% | When data are quantitative and frequency distribution is skewed left or right |
| Mode | Tally to determine most frequent observation | Most frequent characteristic | When data are qualitative or quantitative; especially useful for categorical data |

3.2 Measures of Dispersion

Objectives

  • Determine the range of a variable from raw data

  • Determine the standard deviation of a variable from raw data

  • Determine the variance of a variable from raw data

  • Use the Empirical Rule to describe data that are bell shaped

  • Use Chebyshev's Inequality to describe any data set

Dispersion

Dispersion is the degree to which the data are spread out. It helps to understand the variability in the data.

Range of a Variable from Raw Data

The range, $R$, is the difference between the largest and smallest data values.

  • Formula: $R = \text{largest data value} - \text{smallest data value}$

  • Example: For travel times 23, 36, 23, 18, 5, 26, 43, $R = 43 - 5 = 38$ minutes.
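The range needs only the two extreme values, which is also why a single extreme observation changes it so much. A short sketch, with illustrative variable names and the 70-minute value reused from the resistance example:

```python
travel_times = [23, 36, 23, 18, 5, 26, 43]

# Range = largest value minus smallest value.
data_range = max(travel_times) - min(travel_times)
print(data_range)   # 38

# One extreme observation is enough to change the range.
with_outlier = travel_times + [70]
range_with_outlier = max(with_outlier) - min(with_outlier)
print(range_with_outlier)   # 65
```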

Standard Deviation of a Variable from Raw Data

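The standard deviation measures the typical distance of the data values from the mean. It is the square root of the average squared deviation from the mean, so larger values indicate more spread.

  • Population standard deviation ($\sigma$): $\sigma = \sqrt{\dfrac{\sum (x_i - \mu)^2}{N}}$

  • Sample standard deviation ($s$): $s = \sqrt{\dfrac{\sum (x_i - \bar{x})^2}{n - 1}}$

Example: For the travel times, the population standard deviation is calculated by finding the squared differences from the mean, summing them, dividing by $N$, and taking the square root. With $\mu = 174/7 \approx 24.9$, the squared deviations sum to about 902.9, so $\sigma = \sqrt{902.9 / 7} \approx 11.4$ minutes.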
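The step-by-step computation can be sketched in Python as follows. The variable names are illustrative; the final assertions cross-check the hand-rolled values against `statistics.pstdev` (population) and `statistics.stdev` (sample).

```python
from math import sqrt
from statistics import pstdev, stdev

travel_times = [23, 36, 23, 18, 5, 26, 43]
n = len(travel_times)
mean_value = sum(travel_times) / n

# Squared difference of each observation from the mean, then their sum.
squared_deviations = [(x - mean_value) ** 2 for x in travel_times]
sum_sq = sum(squared_deviations)

sigma = sqrt(sum_sq / n)          # population: divide by N
s = sqrt(sum_sq / (n - 1))        # sample: divide by n - 1

print(f"sum of squared deviations = {sum_sq:.2f}")
print(f"population sd = {sigma:.2f}, sample sd = {s:.2f}")

assert abs(sigma - pstdev(travel_times)) < 1e-9
assert abs(s - stdev(travel_times)) < 1e-9
```

The sample version is always slightly larger than the population version for the same data, because it divides by $n - 1$ instead of $N$.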

Variance of a Variable from Raw Data

The variance is the square of the standard deviation. It represents the average squared deviation from the mean.

  • Population variance: $\sigma^2 = \dfrac{\sum (x_i - \mu)^2}{N}$

  • Sample variance: $s^2 = \dfrac{\sum (x_i - \bar{x})^2}{n - 1}$
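As a quick check that the variance is the square of the standard deviation, the sketch below uses the standard library's `pvariance`/`variance` and `pstdev`/`stdev` on the travel-time data. The variable names are illustrative.

```python
from statistics import pstdev, pvariance, stdev, variance

travel_times = [23, 36, 23, 18, 5, 26, 43]

pop_var = pvariance(travel_times)     # divides by N
sample_var = variance(travel_times)   # divides by n - 1

print(f"population variance = {pop_var:.2f}")
print(f"sample variance     = {sample_var:.2f}")

# Variance equals the squared standard deviation (up to rounding error).
assert abs(pop_var - pstdev(travel_times) ** 2) < 1e-9
assert abs(sample_var - stdev(travel_times) ** 2) < 1e-9
```

Because the variance is in squared units (here, minutes squared), the standard deviation is usually the easier of the two to interpret.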

Empirical Rule and Chebyshev's Inequality

Empirical Rule: For distributions that are approximately bell shaped (normal):

  • About 68% of the data fall within 1 standard deviation of the mean

  • About 95% fall within 2 standard deviations of the mean

  • About 99.7% fall within 3 standard deviations of the mean
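The three percentages come from the normal distribution itself. As a sketch, the share of a normal distribution within $k$ standard deviations of the mean equals $\operatorname{erf}(k/\sqrt{2})$, which Python's `math.erf` can evaluate; the helper name `normal_share_within` is hypothetical.

```python
from math import erf, sqrt


def normal_share_within(k):
    """Share of a normal distribution within k standard deviations of the mean."""
    return erf(k / sqrt(2))


for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {normal_share_within(k):.1%}")
```

The printed values are close to 68%, 95%, and 99.7%. Real data that are only roughly bell shaped will match these figures approximately, not exactly.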

Chebyshev's Inequality: For any data set (not necessarily normal), at least $\left(1 - \dfrac{1}{k^2}\right) \cdot 100\%$ of the data values lie within $k$ standard deviations of the mean, for $k > 1$.
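Chebyshev's bound can be compared with the actual share of observations in a data set. The sketch below does this for the travel-time data, using the population standard deviation of the data; the function names `chebyshev_lower_bound` and `share_within` are hypothetical.

```python
from statistics import mean, pstdev


def chebyshev_lower_bound(k):
    """Minimum share of data within k standard deviations of the mean (k > 1)."""
    if k <= 1:
        raise ValueError("k must be greater than 1")
    return 1 - 1 / k**2


def share_within(values, k):
    """Observed share of values within k population standard deviations of the mean."""
    center = mean(values)
    spread = pstdev(values)
    inside = [x for x in values if abs(x - center) <= k * spread]
    return len(inside) / len(values)


travel_times = [23, 36, 23, 18, 5, 26, 43]
for k in (1.5, 2, 3):
    bound = chebyshev_lower_bound(k)
    observed = share_within(travel_times, k)
    print(f"k = {k}: at least {bound:.1%} guaranteed, {observed:.1%} observed")
```

The observed share is never below the bound; the bound is often conservative because it must hold for every possible data set.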

Relationship Between Mean, Median, and Distribution Shape

| Distribution Shape | Mean vs Median | Skewness |
| --- | --- | --- |
| Symmetric | Mean ≈ Median | No skew |
| Skewed Left | Mean < Median | Tail on left |
| Skewed Right | Mean > Median | Tail on right |

Additional info: The guidelines for mean, median, and skewness hold well for continuous data, but may not always apply for discrete data.
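The relationship between shape, mean, and median can be illustrated with three small data sets. The values below are made up for illustration only and do not come from the notes.

```python
from statistics import mean, median

# Hypothetical data sets chosen to show each shape.
samples = {
    "symmetric": [1, 2, 3, 4, 5],
    "skewed right": [1, 2, 2, 3, 3, 3, 4, 20],       # long tail of large values
    "skewed left": [1, 17, 18, 18, 18, 19, 19, 20],  # long tail of small values
}

relations = {}
for label, values in samples.items():
    m, med = mean(values), median(values)
    relations[label] = "=" if m == med else ("<" if m < med else ">")
    print(f"{label}: mean = {m}, median = {med}, mean {relations[label]} median")
```

In each skewed case the mean is pulled toward the tail, while the median stays near the bulk of the data.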

Summary

  • Mean: Sensitive to outliers, best for symmetric distributions.

  • Median: Resistant to outliers, best for skewed distributions.

  • Mode: Useful for categorical data or to identify the most frequent value.

  • Range: Simple measure of spread, but sensitive to outliers.

  • Standard deviation and variance: Quantify spread, used in further statistical analysis.
