Measures of Central Tendency and Dispersion in Statistics: Study Notes

3.1 Measures of Central Tendency

Objectives

Determine the arithmetic mean of a variable from raw data
Determine the median of a variable from raw data
Explain what it means for a statistic to be resistant
Determine the mode of a variable from raw data

Arithmetic Mean of a Variable from Raw Data

The arithmetic mean is a measure of central tendency calculated by adding all the values of a variable in the data set and dividing by the number of observations. It is commonly referred to as the mean.

Population arithmetic mean ($\mu$): Computed using all individuals in a population. It is a parameter.
Sample arithmetic mean ($\bar{x}$): Computed using sample data. It is a statistic.

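Formulas (where $N$ is the population size and $n$ is the sample size):

Population mean: $\mu = \frac{x_1 + x_2 + \cdots + x_N}{N} = \frac{\sum x_i}{N}$
Sample mean: $\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{\sum x_i}{n}$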

Example: For travel times (in minutes) of 7 employees: 23, 36, 23, 18, 5, 26, 43, the population mean is calculated by summing all values and dividing by 7. The values sum to 174, so $\mu = \frac{174}{7} \approx 24.9$ minutes.
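
The calculation in this example can be sketched in a few lines of Python. This is only an illustration: the names `travel_times` and `arithmetic_mean` are made up for these notes, and the 7 employees are treated as the whole population.

```python
# Travel times (minutes) for the 7 employees in the example above.
travel_times = [23, 36, 23, 18, 5, 26, 43]


def arithmetic_mean(values):
    """Add up every observation and divide by the number of observations."""
    if not values:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / len(values)


population_mean = arithmetic_mean(travel_times)
print(sum(travel_times))          # 174
print(round(population_mean, 1))  # 24.9
```

The same function works for a sample; only the symbol changes ($\bar{x}$ instead of $\mu$), not the arithmetic.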

Median of a Variable from Raw Data

The median is the value that lies in the middle of the data when arranged in ascending order. It divides the data set into two equal halves.

Steps to find the median:
1. Arrange the data in ascending order.
2. Determine the number of observations, $n$.
3. If $n$ is odd, the median is the value at the $\frac{n+1}{2}$ position.
4. If $n$ is even, the median is the mean of the values at the $\frac{n}{2}$ and $\frac{n}{2} + 1$ positions.

Example: For the data set 23, 36, 23, 18, 5, 26, 43 (an odd number of observations), the ordered data are 5, 18, 23, 23, 26, 36, 43, and the median is the 4th value, 23. If a new value is added (making the number of observations even), the median is the average of the two middle values.
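
The four steps above can be sketched as a small Python function. The function name `median_from_raw_data` is hypothetical, and the even-sized case borrows the 70-minute travel time used in the resistance example later in these notes.

```python
def median_from_raw_data(values):
    """Follow the steps: sort, count, then pick or average the middle value(s)."""
    if not values:
        raise ValueError("the median of an empty data set is undefined")
    ordered = sorted(values)            # step 1: ascending order
    n = len(ordered)                    # step 2: number of observations
    if n % 2 == 1:                      # step 3: odd n, position (n + 1) / 2
        return ordered[(n + 1) // 2 - 1]    # subtract 1: Python lists start at 0
    lower = ordered[n // 2 - 1]         # step 4: even n, position n / 2
    upper = ordered[n // 2]             #         and position n / 2 + 1
    return (lower + upper) / 2


travel_times = [23, 36, 23, 18, 5, 26, 43]
print(median_from_raw_data(travel_times))         # 23
print(median_from_raw_data(travel_times + [70]))  # 24.5
```

Positions in the steps are counted from 1, while Python indexes from 0, which is why each position has 1 subtracted before it is used as an index.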

Resistant Statistics

A resistant statistic is not affected substantially by extreme observations (outliers). The median is resistant, while the mean is not.

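Example: Adding a new employee with a travel time of 70 minutes to the previous data set raises the mean from about 24.9 to 30.5 minutes, but raises the median only from 23 to 24.5 minutes. The extreme value affects the mean far more than the median.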

Definition: A numerical summary of data is said to be resistant if extreme values do not affect its value substantially.
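
To see resistance numerically, the sketch below compares how far the mean and the median move when the 70-minute travel time is added. It uses Python's standard `statistics` module; the variable names are illustrative only.

```python
import statistics

original = [23, 36, 23, 18, 5, 26, 43]
with_outlier = original + [70]  # the new employee from the example

# How much each summary changes once the extreme value is included.
mean_shift = statistics.mean(with_outlier) - statistics.mean(original)
median_shift = statistics.median(with_outlier) - statistics.median(original)

print(round(mean_shift, 2))  # 5.64
print(median_shift)          # 1.5
```

The mean moves by more than five and a half minutes while the median moves by a minute and a half, which is what "resistant" is meant to capture.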

Mode of a Variable from Raw Data

The mode is the most frequent observation in the data set. A data set may have no mode, one mode, or more than one mode.

Example: For the travel times 23, 36, 23, 18, 5, 26, 43, the mode is 23 (since it appears twice).
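
A tally-based sketch of finding the mode is shown below. It assumes the common textbook convention that a data set has no mode when no value repeats; the function name `modes` is hypothetical, and the second and third data sets are made up purely to show the "no mode" and "more than one mode" cases.

```python
from collections import Counter


def modes(values):
    """Return every most-frequent value, or an empty list if no value repeats."""
    if not values:
        return []
    counts = Counter(values)            # tally each distinct observation
    highest = max(counts.values())
    if highest == 1:
        return []                       # assumed convention: no repeats means no mode
    return [value for value, count in counts.items() if count == highest]


print(modes([23, 36, 23, 18, 5, 26, 43]))  # [23]
print(modes([1, 2, 3]))                    # []
print(modes([1, 1, 2, 2, 3]))              # [1, 2]
```

Because it only counts occurrences, the same function works for qualitative data such as a list of category labels.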

Summary Table: Measures of Central Tendency

Measure of Central Tendency	Computation	Interpretation	When to Use
Mean	Population: $\mu = \frac{\sum x_i}{N}$; sample: $\bar{x} = \frac{\sum x_i}{n}$	Center of gravity	When data are quantitative and frequency distribution is roughly symmetric
Median	Arrange data in ascending order, identify middle value	Divides the bottom 50% from the top 50%	When data are quantitative and frequency distribution is skewed left or right
Mode	Tally to determine most frequent observation	Most frequent characteristic	When data are qualitative or quantitative; especially useful for categorical data

3.2 Measures of Dispersion

Objectives

Determine the range of a variable from raw data
Determine the standard deviation of a variable from raw data
Determine the variance of a variable from raw data
Use the Empirical Rule to describe data that are bell shaped
Use Chebyshev's Inequality to describe any data set

Dispersion

Dispersion is the degree to which the data are spread out. It helps to understand the variability in the data.

Range of a Variable from Raw Data

The range, $R$, is the difference between the largest and smallest data values.

Formula: $R = \text{largest data value} - \text{smallest data value}$
Example: For travel times 23, 36, 23, 18, 5, 26, 43, $R = 43 - 5 = 38$ minutes.
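
As a quick check, the range of the travel times can be computed directly. The helper name `data_range` is hypothetical, and the second call reuses the 70-minute value from the resistance example to show how a single extreme observation changes the result.

```python
def data_range(values):
    """Largest data value minus smallest data value."""
    if not values:
        raise ValueError("the range of an empty data set is undefined")
    return max(values) - min(values)


travel_times = [23, 36, 23, 18, 5, 26, 43]
print(data_range(travel_times))         # 38
print(data_range(travel_times + [70]))  # 65
```

Since the range depends only on the two most extreme observations, one new extreme value is enough to change it substantially.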

Standard Deviation of a Variable from Raw Data

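The standard deviation measures how far, on average, the data values lie from the mean. More precisely, it is the square root of the average squared deviation from the mean. The larger the standard deviation, the more spread out the data.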

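Population standard deviation ($\sigma$): $\sigma = \sqrt{\frac{\sum (x_i - \mu)^2}{N}}$
Sample standard deviation ($s$): $s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}$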

Example: For the travel times, the population standard deviation is calculated by finding the squared differences from the mean, summing them, dividing by $N$, and taking the square root.
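
The procedure in this example can be sketched as follows. The function names are hypothetical. The sample version, which divides by $n - 1$ instead of the population size $N$, is included only for comparison; treating the same seven values as a sample is an assumption made for illustration.

```python
import math


def population_std(values):
    """Square root of the mean squared deviation from the mean (divide by N)."""
    mu = sum(values) / len(values)
    squared_deviations = [(x - mu) ** 2 for x in values]
    return math.sqrt(sum(squared_deviations) / len(values))


def sample_std(values):
    """Same idea, but divide the summed squared deviations by n - 1."""
    if len(values) < 2:
        raise ValueError("a sample standard deviation needs at least 2 observations")
    x_bar = sum(values) / len(values)
    squared_deviations = [(x - x_bar) ** 2 for x in values]
    return math.sqrt(sum(squared_deviations) / (len(values) - 1))


travel_times = [23, 36, 23, 18, 5, 26, 43]
print(round(population_std(travel_times), 2))  # 11.36
print(round(sample_std(travel_times), 2))      # 12.27
```

Dividing by $n - 1$ always gives a slightly larger value than dividing by $n$, and the gap shrinks as the number of observations grows.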

Variance of a Variable from Raw Data

The variance is the square of the standard deviation. It represents the average squared deviation from the mean.

Population variance: $\sigma^2 = \frac{\sum (x_i - \mu)^2}{N}$
Sample variance: $s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$
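
The relationship between variance and standard deviation can be checked with Python's standard `statistics` module, where `pvariance` and `pstdev` are the population versions and `variance` and `stdev` are the sample versions. Treating the travel times as both a population and a sample is an assumption made only to show both calls.

```python
import math
import statistics

travel_times = [23, 36, 23, 18, 5, 26, 43]

population_variance = statistics.pvariance(travel_times)  # divides by N
sample_variance = statistics.variance(travel_times)       # divides by n - 1

print(round(population_variance, 1))  # 129.0
print(round(sample_variance, 1))      # 150.5

# The variance is the square of the corresponding standard deviation.
print(math.isclose(statistics.pstdev(travel_times) ** 2, population_variance))  # True
print(math.isclose(statistics.stdev(travel_times) ** 2, sample_variance))       # True
```

Note that the variance is expressed in squared units (here, minutes squared), which is one reason the standard deviation is usually easier to interpret.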

Empirical Rule and Chebyshev's Inequality

Empirical Rule: For distributions that are roughly bell shaped (approximately normal):

About 68% of data falls within 1 standard deviation of the mean
About 95% within 2 standard deviations
About 99.7% within 3 standard deviations
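
The three percentages can be checked by simulation. The sketch below draws values from a normal distribution and counts the share that falls within 1, 2, and 3 standard deviations of the mean. The mean of 100, standard deviation of 15, sample size, and seed are all arbitrary choices made for this illustration, and the printed shares will only be close to 68%, 95%, and 99.7%, not exactly equal.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so repeated runs give the same data
simulated = [rng.gauss(100, 15) for _ in range(20_000)]  # hypothetical bell-shaped data

center = statistics.fmean(simulated)
spread = statistics.pstdev(simulated)


def share_within(values, center, spread, k):
    """Fraction of values no more than k standard deviations from the mean."""
    lower = center - k * spread
    upper = center + k * spread
    return sum(lower <= x <= upper for x in values) / len(values)


shares = {k: share_within(simulated, center, spread, k) for k in (1, 2, 3)}
for k, share in shares.items():
    print(f"within {k} standard deviation(s): {share:.1%}")
```

Running the same check on clearly skewed data would give noticeably different shares, which is why the rule is stated only for bell-shaped data.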

Chebyshev's Inequality: For any data set (not necessarily normal), at least $\left(1 - \frac{1}{k^2}\right) \cdot 100\%$ of the data values lie within $k$ standard deviations of the mean, for $k > 1$.
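
Chebyshev's lower bound, $1 - \frac{1}{k^2}$, is easy to tabulate and to compare against real data. The sketch below does both for the travel times, using the population standard deviation. The function names and the choice of $k$ values are illustrative assumptions.

```python
import math


def chebyshev_lower_bound(k):
    """Minimum fraction of values within k standard deviations, for k > 1."""
    if k <= 1:
        raise ValueError("Chebyshev's bound is only informative for k > 1")
    return 1 - 1 / k ** 2


def share_within(values, k):
    """Observed fraction of values within k population standard deviations."""
    mu = sum(values) / len(values)
    sigma = math.sqrt(sum((x - mu) ** 2 for x in values) / len(values))
    return sum(abs(x - mu) <= k * sigma for x in values) / len(values)


travel_times = [23, 36, 23, 18, 5, 26, 43]
for k in (1.5, 2, 3):
    bound = chebyshev_lower_bound(k)
    observed = share_within(travel_times, k)
    print(f"k={k}: at least {bound:.1%}, observed {observed:.1%}")
```

For $k = 2$ the bound is 75% and for $k = 3$ it is about 88.9%. The observed shares are at least as large as the bound, and often much larger, because the inequality is a guarantee for every data set rather than an estimate for a particular one.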

Relationship Between Mean, Median, and Distribution Shape

Distribution Shape	Mean vs Median	Skewness
Symmetric	Mean ≈ Median	No skew
Skewed Left	Mean < Median	Tail on left
Skewed Right	Mean > Median	Tail on right

Additional info: The guidelines for mean, median, and skewness hold well for continuous data, but may not always apply for discrete data.
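
The pattern in the table can be illustrated with three small data sets. All three are invented for this sketch (they are not from the source material) and were chosen so that the tail direction is obvious.

```python
import statistics

# Hypothetical data sets, made up to show each shape.
symmetric = [1, 2, 3, 4, 5]
skewed_right = [1, 2, 2, 3, 3, 3, 4, 20]         # long tail toward large values
skewed_left = [1, 17, 18, 18, 18, 19, 19, 20]    # long tail toward small values

for name, data in [
    ("symmetric", symmetric),
    ("skewed right", skewed_right),
    ("skewed left", skewed_left),
]:
    mean = statistics.mean(data)
    median = statistics.median(data)
    print(f"{name}: mean={mean}, median={median}")
# symmetric: mean=3, median=3
# skewed right: mean=4.75, median=3.0
# skewed left: mean=16.25, median=18.0
```

In each skewed case the mean is pulled toward the tail while the median stays near the bulk of the data, which matches the guideline in the table.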

Summary

Mean: Sensitive to outliers, best for symmetric distributions.
Median: Resistant to outliers, best for skewed distributions.
Mode: Useful for categorical data or to identify the most frequent value.
Range: Simple measure of spread, but sensitive to outliers.
Standard deviation and variance: Quantify spread, used in further statistical analysis.