Chapter 3: Numerical Descriptive Measures in Statistics

3.1 Measures of Central Tendency for Ungrouped Data

Definition and Importance

Measures of central tendency are statistical values that describe the center or typical value of a data set. The three main measures are the mean, median, and mode. These measures help summarize and understand large data sets by identifying a representative value.

Mean

The mean (or average) is the sum of all values divided by the number of values in the data set. Because every value enters the sum, the mean is sensitive to outliers.

Population Mean: $\mu = \dfrac{\sum x}{N}$
Sample Mean: $\bar{x} = \dfrac{\sum x}{n}$

where $\sum x$ is the sum of all values, $N$ is the population size, $n$ is the sample size, $\mu$ is the population mean, and $\bar{x}$ is the sample mean.
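A minimal Python sketch of the mean formulas, assuming the data arrive as a list of numbers. The same arithmetic gives $\mu$ or $\bar{x}$; whether the result is a population or a sample mean depends only on what the data represent. The values below are hypothetical and are not taken from Table 3.1.

```python
def mean(values):
    """Return the mean: the sum of all values divided by the number of values."""
    if not values:
        raise ValueError("mean() needs at least one value")
    return sum(values) / len(values)


# Hypothetical data, used only to show the mean's sensitivity to an outlier.
data = [10, 12, 14, 16]
print(mean(data))          # 13.0
print(mean(data + [100]))  # 30.4: one extreme value pulls the mean upward
```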

Example: Table 3.1 shows cash donations by eight U.S. companies in 2010. The mean donation is calculated by summing all donations and dividing by 8.

[Table 3.1: Cash donations by eight U.S. companies, 2010]

Median

The median is the value of the middle term in a data set ranked in increasing order. If the number of values $n$ is odd, the median is the value of the middle term, at position $(n+1)/2$; if $n$ is even, it is the average of the values of the two middle terms.

Step 1: Rank the data in increasing order.
Step 2: Identify the middle value(s).
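The two steps above can be sketched in Python as follows, assuming numeric input in a list. The function name `median` and the data are hypothetical illustrations.

```python
def median(values):
    """Median: the middle term of the data ranked in increasing order."""
    if not values:
        raise ValueError("median() needs at least one value")
    ranked = sorted(values)          # Step 1: rank the data
    n = len(ranked)
    mid = n // 2
    if n % 2 == 1:                   # Step 2, odd n: the single middle term
        return ranked[mid]
    # Step 2, even n: average of the two middle terms
    return (ranked[mid - 1] + ranked[mid]) / 2


print(median([7, 3, 9, 1, 5, 11, 13]))  # 7   (4th of 7 ranked values)
print(median([7, 3, 9, 1, 5, 11]))      # 6.0 (average of the 3rd and 4th)
```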

Example: Table 3.2 lists the number of homes foreclosed in seven states in 2010. With $n = 7$ (odd), the median is the value of the $(7+1)/2 = 4$th term of the ranked data.

[Table 3.2: Number of homes foreclosed in seven states; the same data ranked, with the median marked]

Example: Table 3.3 shows the total compensation of the 12 highest-paid CEOs in 2010. With $n = 12$ (even), the median is the average of the 6th and 7th terms of the ranked data.

[Table 3.3: Total compensation of the 12 highest-paid CEOs; the same data ranked, with the median marked]

Mode

The mode is the value that occurs most frequently in a data set. A data set in which every value occurs only once has no mode; otherwise it may have one mode (unimodal), two modes (bimodal), or more than two modes (multimodal). The mode can be used for both quantitative and qualitative data.

Unimodal: One mode
Bimodal: Two modes
Multimodal: More than two modes
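The cases above (no mode, unimodal, bimodal, multimodal) can be handled by one function that returns every most-frequent value. This is a sketch using Python's standard `collections.Counter`; the function name `modes` and the sample data are hypothetical. Returning a list, rather than a single value, lets the caller tell the cases apart, and it works for qualitative labels as well as numbers.

```python
from collections import Counter


def modes(values):
    """Return all modes; an empty list means no mode (every value occurs once)."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    if top == 1:                  # no value repeats, so there is no mode
        return []
    return [v for v, c in counts.items() if c == top]


print(modes([4, 7, 7, 9]))           # [7]     unimodal
print(modes([1, 1, 2, 2, 3]))        # [1, 2]  bimodal
print(modes([3, 5, 8]))              # []      no mode
print(modes(["red", "blue", "red"])) # ['red'] qualitative data
```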

Relationships Among the Mean, Median, and Mode

The relationship among the mean, median, and mode depends on the shape of the distribution:

Symmetric distribution with one peak: Mean = Median = Mode, all at the center of the distribution.

[Figure: symmetric histogram, mean = median = mode]

Right-skewed distribution (long tail to the right): Mean > Median > Mode. The few large values in the tail pull the mean to the right.

[Figure: right-skewed histogram, mean > median > mode]

Left-skewed distribution (long tail to the left): Mean < Median < Mode. The few small values in the tail pull the mean to the left.

[Figure: left-skewed histogram, mean < median < mode]
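To see the skewness orderings on actual numbers, the sketch below uses Python's standard `statistics` module on two small hypothetical data sets, one with a long right tail and one with a long left tail.

```python
from statistics import mean, median, mode

# Hypothetical right-skewed data: one large value (10) forms a long right tail.
right_skewed = [1, 2, 2, 2, 3, 3, 4, 5, 10]
print(mode(right_skewed), median(right_skewed), round(mean(right_skewed), 2))
# 2 3 3.56  -> mode < median < mean

# Hypothetical left-skewed data: one small value (0) forms a long left tail.
left_skewed = [0, 5, 6, 7, 7, 8, 8, 8, 9]
print(mode(left_skewed), median(left_skewed), round(mean(left_skewed), 2))
# 8 7 6.44  -> mean < median < mode
```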

3.2 Measures of Dispersion for Ungrouped Data

Definition and Importance

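Measures of dispersion describe how spread out the values of a data set are. Common measures include the range, variance, and standard deviation. Two data sets can have the same mean yet differ greatly in variability, so a measure of dispersion complements a measure of center: a small value means the data are clustered closely around the center, and a large value means they are widely spread.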

Range

The range is the difference between the largest and smallest values in a data set.

Formula: $\text{Range} = \text{Largest value} - \text{Smallest value}$

Example: Table 3.4 shows the total area of four states. The range is calculated as the difference between the largest and smallest area values.

[Table 3.4: Total area of four states]
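The range formula is a one-liner in Python. The function is named `data_range` (a hypothetical name) so that it does not shadow Python's built-in `range`; the data are hypothetical.

```python
def data_range(values):
    """Range = largest value - smallest value."""
    if not values:
        raise ValueError("data_range() needs at least one value")
    return max(values) - min(values)


print(data_range([12, 7, 30, 18]))  # 23
```

Because only the two extreme values are used, a single outlier can change the range dramatically.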

Variance and Standard Deviation

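The variance measures the average squared deviation of the values from the mean (the sample variance divides by $n - 1$ rather than $n$). The standard deviation is the positive square root of the variance. It is expressed in the same units as the data and is the most commonly used measure of dispersion.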

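Population Variance: $\sigma^2 = \dfrac{\sum (x - \mu)^2}{N} = \dfrac{\sum x^2 - \frac{(\sum x)^2}{N}}{N}$
Sample Variance: $s^2 = \dfrac{\sum (x - \bar{x})^2}{n - 1} = \dfrac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1}$
Population Standard Deviation: $\sigma = \sqrt{\sigma^2}$
Sample Standard Deviation: $s = \sqrt{s^2}$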
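A minimal sketch of the short-cut formulas, which need only $\sum x$ and $\sum x^2$. The `sample` flag chooses between dividing by $n - 1$ (sample) and $N$ (population); the function names and data are hypothetical.

```python
import math


def variance(values, sample=True):
    """Variance via the short-cut formula.

    Population: (sum(x^2) - (sum(x))^2 / N) / N
    Sample:     (sum(x^2) - (sum(x))^2 / n) / (n - 1)
    """
    n = len(values)
    if n == 0 or (sample and n < 2):
        raise ValueError("not enough values")
    sum_x = sum(values)
    sum_x2 = sum(x * x for x in values)
    ss = sum_x2 - sum_x ** 2 / n      # sum of squared deviations from the mean
    return ss / (n - 1) if sample else ss / n


def std_dev(values, sample=True):
    """Standard deviation: the positive square root of the variance."""
    return math.sqrt(variance(values, sample))


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical
print(variance(data, sample=False), std_dev(data, sample=False))  # 4.0 2.0
```

Algebraically both forms give the same result, but with floating-point arithmetic the short-cut form can lose precision when the values are large relative to their spread; the definitional form $\sum (x - \bar{x})^2$ avoids that cancellation.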

Example: A table gives baggage fee revenues for six airlines. The variance and standard deviation are computed with the short-cut formulas, which need only the sum of the values ($\sum x$) and the sum of the squared values ($\sum x^2$).

[Tables: baggage fee revenues for six airlines; values of x and x² for the calculation]

Example: A table gives earnings for six employees. The variance and standard deviation are computed in the same way.

[Table: values of x and x² for the employee earnings]

3.3 Mean, Variance, and Standard Deviation for Grouped Data

Mean for Grouped Data

For grouped data, the mean is calculated using the midpoints of the classes and their frequencies.

Population Mean: $\mu = \dfrac{\sum mf}{N}$
Sample Mean: $\bar{x} = \dfrac{\sum mf}{n}$

where $m$ is the class midpoint, $f$ is the class frequency, and the total frequency $\sum f$ equals $N$ for a population or $n$ for a sample.
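A sketch of the grouped-data mean, assuming each class is given as a hypothetical `(lower_limit, upper_limit, frequency)` tuple and that the midpoint is $(\text{lower} + \text{upper})/2$, as for classes written "0 to less than 10". The classes and frequencies are hypothetical.

```python
def grouped_mean(classes):
    """Mean of grouped data: sum(m * f) / sum(f).

    `classes` is a list of (lower_limit, upper_limit, frequency) tuples;
    the class midpoint m is taken as (lower_limit + upper_limit) / 2.
    """
    total_f = sum(f for _, _, f in classes)
    if total_f == 0:
        raise ValueError("total frequency must be positive")
    sum_mf = sum((lo + hi) / 2 * f for lo, hi, f in classes)
    return sum_mf / total_f


# Hypothetical classes: "0 to less than 10" (f = 4), "10 to less than 20" (f = 6)
print(grouped_mean([(0, 10, 4), (10, 20, 6)]))  # 11.0
```

The result is an approximation of the raw-data mean, because every value in a class is treated as if it were at the class midpoint.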

Example: A table gives the frequency distribution of daily commuting times for 25 employees. The mean is computed from the class midpoints and frequencies.

[Tables: frequency distribution of daily commuting times; class midpoints m and products mf for the mean]

Example: A table gives the frequency distribution of the number of orders received each day. The mean is computed in the same way.

[Tables: frequency distribution of the number of orders; class midpoints m and products mf for the mean]

Variance and Standard Deviation for Grouped Data

The formulas for variance and standard deviation for grouped data use the midpoints and frequencies:

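Population Variance: $\sigma^2 = \dfrac{\sum m^2 f - \frac{(\sum mf)^2}{N}}{N}$
Sample Variance: $s^2 = \dfrac{\sum m^2 f - \frac{(\sum mf)^2}{n}}{n - 1}$
Population Standard Deviation: $\sigma = \sqrt{\sigma^2}$
Sample Standard Deviation: $s = \sqrt{s^2}$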
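The grouped short-cut formulas can be sketched the same way, again assuming hypothetical `(lower_limit, upper_limit, frequency)` tuples with midpoints $(\text{lower} + \text{upper})/2$. The classes are hypothetical.

```python
import math


def grouped_variance(classes, sample=True):
    """Short-cut variance for grouped data from midpoints m and frequencies f.

    Population: (sum(m^2 f) - (sum(m f))^2 / N) / N
    Sample:     (sum(m^2 f) - (sum(m f))^2 / n) / (n - 1)
    """
    n = sum(f for _, _, f in classes)
    if n == 0 or (sample and n < 2):
        raise ValueError("not enough observations")
    mids = [((lo + hi) / 2, f) for lo, hi, f in classes]
    sum_mf = sum(m * f for m, f in mids)
    sum_m2f = sum(m * m * f for m, f in mids)
    ss = sum_m2f - sum_mf ** 2 / n
    return ss / (n - 1) if sample else ss / n


classes = [(0, 10, 4), (10, 20, 6)]  # hypothetical
print(grouped_variance(classes, sample=False))             # 24.0
print(math.sqrt(grouped_variance(classes, sample=False)))  # square root of 24.0, about 4.899
```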

Example: For the frequency distribution of daily commuting times, the variance and standard deviation are computed from the class midpoints and frequencies.

[Tables: frequency distribution of commuting times; table with squared midpoints for the variance calculation]

Example: For the frequency distribution of the number of orders, the variance and standard deviation are computed in the same way.

[Tables: frequency distribution of the number of orders; table with squared midpoints for the variance calculation]

Use of Standard Deviation

Chebyshev’s Theorem

Chebyshev’s theorem states that for any number $k$ greater than 1, at least $1 - \dfrac{1}{k^2}$ of the data values lie within $k$ standard deviations of the mean, regardless of the shape of the distribution.

[Figures: Chebyshev’s theorem; the cases k = 2 and k = 3]

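Example: For blood pressure readings with a mean of 187 and a standard deviation of 22, the interval from 143 to 231 is $187 \pm 2(22)$, so $k = 2$ and $1 - 1/2^2 = 0.75$. At least 75% of the values therefore lie between 143 and 231.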

[Figures: calculation for the example; Chebyshev’s theorem applied to blood pressure]
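A small sketch of the Chebyshev calculation, reproducing the example's numbers (mean 187, standard deviation 22). The function name `chebyshev` is hypothetical; it returns the guaranteed minimum proportion together with the interval ends.

```python
def chebyshev(mean, sd, k):
    """At least 1 - 1/k**2 of the values lie in [mean - k*sd, mean + k*sd], for k > 1."""
    if k <= 1:
        raise ValueError("Chebyshev's theorem applies only for k > 1")
    min_proportion = 1 - 1 / k ** 2
    return min_proportion, mean - k * sd, mean + k * sd


# Example from the text: find k from the interval end, then apply the theorem.
k = (231 - 187) / 22           # 2.0 standard deviations
print(chebyshev(187, 22, k))   # (0.75, 143.0, 231.0)
```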

Empirical Rule

For bell-shaped (normal) distributions:

About 68% of values lie within 1 standard deviation of the mean
About 95% within 2 standard deviations
About 99.7% within 3 standard deviations

[Figure: the empirical rule]

Example: For a bell-shaped age distribution with a mean of 40 and a standard deviation of 12, about 95% of people are between $40 - 2(12) = 16$ and $40 + 2(12) = 64$ years old.

[Figure: the empirical rule applied to the age distribution]
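The empirical rule can be written as a lookup of the approximate coverages for $k = 1, 2, 3$. This sketch uses hypothetical names and reproduces the age example.

```python
# Approximate coverage of the empirical rule for a bell-shaped distribution.
EMPIRICAL_COVERAGE = {1: 0.68, 2: 0.95, 3: 0.997}


def empirical_interval(mean, sd, k):
    """Return (approximate proportion, lower end, upper end) for mean ± k*sd."""
    if k not in EMPIRICAL_COVERAGE:
        raise ValueError("the empirical rule is stated for k = 1, 2, or 3")
    return EMPIRICAL_COVERAGE[k], mean - k * sd, mean + k * sd


print(empirical_interval(40, 12, 2))  # (0.95, 16, 64)
```

For the same $k = 2$, Chebyshev's theorem guarantees only at least 75%; the higher 95% figure depends on the distribution being bell-shaped.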

Measures of Position

Quartiles and Interquartile Range (IQR)

Quartiles divide a ranked data set into four equal parts. The first quartile (Q1) is the median of the lower half, the second quartile (Q2) is the median, and the third quartile (Q3) is the median of the upper half.

[Figure: quartiles]

The interquartile range, $\text{IQR} = Q_3 - Q_1$, measures the spread of the middle 50% of the data.
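A sketch of the median-of-halves method described above: $Q_1$ is the median of the values below the median position and $Q_3$ the median of the values above it, with the middle term excluded when $n$ is odd. Function names and data are hypothetical. Statistical software often uses other quartile conventions (for example, interpolation between terms), so its results can differ slightly.

```python
def _median_of_ranked(ranked):
    """Median of an already-sorted, non-empty list."""
    n = len(ranked)
    mid = n // 2
    if n % 2 == 1:
        return ranked[mid]
    return (ranked[mid - 1] + ranked[mid]) / 2


def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves method."""
    ranked = sorted(values)
    n = len(ranked)
    if n < 2:
        raise ValueError("quartiles() needs at least two values")
    half = n // 2
    lower = ranked[:half]        # values below the median position
    upper = ranked[n - half:]    # values above it (middle term skipped if n is odd)
    return _median_of_ranked(lower), _median_of_ranked(ranked), _median_of_ranked(upper)


q1, q2, q3 = quartiles([7, 1, 5, 3, 6, 2, 4])  # hypothetical data
print(q1, q2, q3, q3 - q1)                    # 2 4 6 4
```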

Example: Table 3.3 (CEO compensation) is used to find quartiles and IQR.

[Table 3.3: CEO compensations ranked for the quartile calculation; worked quartile calculation]

Example: Quartile calculation for ages of employees.

[Figure: quartile calculation for employee ages]

Percentiles and Percentile Rank

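Percentiles divide a ranked data set into 100 equal parts. The k-th percentile, $P_k$ ($k = 1, 2, \ldots, 99$), is the value below which approximately k% of the data fall. The percentile rank of a value gives the percentage of values in the data set that are less than it:

Percentile rank of $x_i$ $= \dfrac{\text{Number of values less than } x_i}{\text{Total number of values}} \times 100$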

[Figure: percentiles]
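A sketch of both ideas in Python. For $P_k$ it takes the value of the $(kn/100)$th term of the ranked data, rounding that position to the nearest whole term; this rounding rule is an assumption, since textbooks and software use different percentile conventions. The function names and data are hypothetical.

```python
import math


def percentile_value(values, k):
    """Approximate k-th percentile: value of the (k*n/100)-th ranked term.

    The position is rounded half up to a whole term and kept within 1..n
    (an assumed convention; others exist).
    """
    if not 1 <= k <= 99:
        raise ValueError("k must be between 1 and 99")
    ranked = sorted(values)
    n = len(ranked)
    position = math.floor(k * n / 100 + 0.5)   # round half up to a whole term
    position = min(max(position, 1), n)       # keep the position inside 1..n
    return ranked[position - 1]               # 1-based term -> 0-based index


def percentile_rank(values, x):
    """Percentile rank of x: (number of values less than x) / n * 100."""
    if not values:
        raise ValueError("percentile_rank() needs at least one value")
    below = sum(1 for v in values if v < x)
    return below / len(values) * 100


data = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120]  # hypothetical, n = 12
print(percentile_value(data, 42))            # 50 (42 * 12 / 100 = 5.04, the 5th term)
print(round(percentile_rank(data, 50), 2))   # 33.33 (4 of 12 values are below 50)
```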

Box-and-Whisker Plot

Definition and Construction

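A box-and-whisker plot displays the center, spread, and skewness of a data set using the median, the first and third quartiles, and the smallest and largest values that are not outliers. The box runs from $Q_1$ to $Q_3$, so its length is the IQR, and a line inside it marks the median. The whiskers extend to the smallest and largest values within the inner fences, $Q_1 - 1.5\,\text{IQR}$ and $Q_3 + 1.5\,\text{IQR}$; values beyond the fences are outliers and are marked separately.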

[Figures: construction of a box-and-whisker plot; a box-and-whisker plot with an outlier]
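The numbers needed to draw a box-and-whisker plot can be computed as below. This is a sketch under stated assumptions: quartiles use the median-of-halves method, the fences are $Q_1 - 1.5\,\text{IQR}$ and $Q_3 + 1.5\,\text{IQR}$, and the function names, dictionary keys, and data are hypothetical.

```python
def _median_of_ranked(ranked):
    """Median of an already-sorted, non-empty list."""
    n = len(ranked)
    mid = n // 2
    return ranked[mid] if n % 2 else (ranked[mid - 1] + ranked[mid]) / 2


def box_plot_summary(values):
    """Quartiles, IQR, whisker ends, and outliers for a box-and-whisker plot."""
    ranked = sorted(values)
    n = len(ranked)
    if n < 2:
        raise ValueError("box_plot_summary() needs at least two values")
    half = n // 2
    q1 = _median_of_ranked(ranked[:half])
    q2 = _median_of_ranked(ranked)
    q3 = _median_of_ranked(ranked[n - half:])
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    # Whiskers reach the most extreme values that are still inside the fences.
    inside = [v for v in ranked if lower_fence <= v <= upper_fence]
    return {
        "q1": q1, "median": q2, "q3": q3, "iqr": iqr,
        "whisker_low": inside[0], "whisker_high": inside[-1],
        "outliers": [v for v in ranked if v < lower_fence or v > upper_fence],
    }


s = box_plot_summary([1, 2, 3, 4, 5, 6, 7, 8, 30])  # hypothetical data
print(s["whisker_low"], s["whisker_high"], s["outliers"])  # 1 8 [30]
```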