Statistical Analysis Chapter 3: Measures of Position, The Five Number Summary, and Boxplots

Measures of Dispersion

The Empirical Rule

The Empirical Rule provides a way to describe the spread of data in a distribution that is approximately bell-shaped (normal distribution). It gives the approximate percentage of data values within one, two, and three standard deviations from the mean.

  • Within 1 standard deviation: Approximately 68% of the data lie between $\mu - \sigma$ and $\mu + \sigma$.

  • Within 2 standard deviations: Approximately 95% of the data lie between $\mu - 2\sigma$ and $\mu + 2\sigma$.

  • Within 3 standard deviations: Approximately 99.7% of the data lie between $\mu - 3\sigma$ and $\mu + 3\sigma$.

Note: For sample data, use $\bar{x}$ in place of $\mu$ and $s$ in place of $\sigma$.

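Example: For IQ scores with mean $\mu = 100$ and standard deviation $\sigma = 16.1$:

  • Within 2 standard deviations: $100 - 2(16.1) = 67.8$ to $100 + 2(16.1) = 132.2$ (95% of scores)

  • Within 3 standard deviations: $100 - 3(16.1) = 51.7$ to $100 + 3(16.1) = 148.3$ (99.7% of scores)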
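
The Empirical Rule intervals can be checked with a short script. This is a minimal sketch, not a prescribed method: the function name `empirical_rule_intervals` is hypothetical, and the standard deviation of 16.1 is an assumption, taken as the value implied by the interval 67.8 to 132.2 around a mean of 100.

```python
def empirical_rule_intervals(mean, sd):
    """Return {k: (low, high, approx_percent)} for k = 1, 2, 3 standard deviations."""
    percents = {1: 68.0, 2: 95.0, 3: 99.7}
    return {k: (mean - k * sd, mean + k * sd, pct) for k, pct in percents.items()}


# sd = 16.1 is an assumption inferred from the interval 67.8 to 132.2
for k, (low, high, pct) in empirical_rule_intervals(100, 16.1).items():
    print(f"within {k} sd: {low:.1f} to {high:.1f} (about {pct}% of scores)")
```

The percentages are only approximations, and they apply only when the distribution is roughly bell-shaped.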

Measures of Position and Outliers

z-Scores

The z-score measures how many standard deviations a data value is from the mean. It is a standardized value that allows comparison across different distributions.

  • Population z-score: $z = \dfrac{x - \mu}{\sigma}$

  • Sample z-score: $z = \dfrac{x - \bar{x}}{s}$

  • The z-score is unitless. The z-scores of all values in a data set have mean 0 and standard deviation 1.

Example: Comparing baseball teams' run production:

  • Red Sox:

  • Dodgers:

  • The Red Sox had a relatively better year, as their z-score is higher.
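
A comparison of this kind can be sketched in a few lines of code. The figures below are hypothetical placeholders, not the teams' actual run totals or league statistics, and `z_score` is an illustrative helper name.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (x - mean) / sd


# Hypothetical numbers, NOT real team or league statistics
team_a = z_score(x=800, mean=700, sd=50)
team_b = z_score(x=780, mean=720, sd=40)

# The larger z-score marks the better year relative to its own league
better = "team A" if team_a > team_b else "team B"
print(team_a, team_b, better)
```

Because each value is standardized against its own mean and standard deviation, the two results are directly comparable even though the underlying distributions differ.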

Percentiles

Percentiles indicate the relative standing of a value within a data set. The p-th percentile is the value below which p% of the data fall.

  • The median is the 50th percentile.

  • If a number divides the lower 34% of the data from the upper 66%, it is the 34th percentile.
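
As a rough illustration of relative standing, the percentage of observations falling below a given value can be computed directly. The helper name `percent_below` and the scores are hypothetical, and textbooks differ in how they treat values equal to the cutoff; this sketch counts only strictly smaller values.

```python
def percent_below(data, value):
    """Percentage of observations strictly less than value."""
    if not data:
        raise ValueError("data must not be empty")
    return 100 * sum(1 for x in data if x < value) / len(data)


scores = [55, 60, 62, 70, 71, 75, 80, 84, 90, 95]  # hypothetical scores
print(percent_below(scores, 75))
```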

Quartiles

Quartiles divide the data set into four equal parts and are the most commonly used percentiles.

  • $Q_1$ = 25th percentile

  • $Q_2$ = 50th percentile = median

  • $Q_3$ = 75th percentile

Quartiles help describe the spread and center of the data.

Interquartile Range (IQR)

The interquartile range (IQR) measures the spread of the middle 50% of the data and is resistant to outliers. It is the difference between the third and first quartiles: $\text{IQR} = Q_3 - Q_1$.

Outliers and Fences

Outliers are extreme observations in the data. They should be investigated as they may result from chance, errors, or other factors. Outliers are not necessarily invalid.

To check for outliers, use fences:

  • Lower Fence: $Q_1 - 1.5(\text{IQR})$

  • Upper Fence: $Q_3 + 1.5(\text{IQR})$

  • Values outside these fences are considered outliers.

Example: For the data set 1, 3, 4, 7, 8, 15, 16, 19, 23, 24, 27, 31, 33, 54:

  • $Q_1 = 7$, Median = 17.5, $Q_3 = 27$

  • Calculate IQR and fences to determine if 54 is an outlier.
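
The calculation for this data set can be sketched in code. The quartile values 7 and 27 are an assumption: they are the medians of the lower and upper halves of the data, and other quartile conventions would give slightly different fences. The helper names `fences` and `find_outliers` are hypothetical.

```python
def fences(q1, q3):
    """Lower and upper fences: Q1 - 1.5 * IQR and Q3 + 1.5 * IQR."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def find_outliers(data, q1, q3):
    """Values that fall below the lower fence or above the upper fence."""
    low, high = fences(q1, q3)
    return [x for x in data if x < low or x > high]


data = [1, 3, 4, 7, 8, 15, 16, 19, 23, 24, 27, 31, 33, 54]
q1, q3 = 7, 27  # assumed: medians of the lower and upper halves

low, high = fences(q1, q3)
print(low, high, find_outliers(data, q1, q3))
```

With these quartiles the IQR is 20, the fences are -23 and 57, and 54 lies inside them, so it is not flagged as an outlier.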

Five Number Summary

The five-number summary provides a concise description of a data set:

  • Minimum (smallest value)

  • First quartile ($Q_1$ or $P_{25}$)

  • Median (M or $Q_2$ or $P_{50}$)

  • Third quartile ($Q_3$ or $P_{75}$)

  • Maximum (largest value)

These values summarize the center, spread, and tails of the distribution.

Boxplots

A boxplot is a graphical representation of the five-number summary. It displays the distribution's center, spread, and potential outliers.

  • The box shows $Q_1$, $Q_2$ (median), and $Q_3$.

  • Whiskers extend to the minimum and maximum values.

Steps to Draw a Boxplot:

  1. Calculate the five-number summary.

  2. Draw a horizontal number line with a scale that covers all data values.

  3. Draw a box from $Q_1$ to $Q_3$.

  4. Draw a line inside the box at the median.

  5. Draw whiskers from the box edges to the minimum and maximum values.
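
The steps above can be mimicked with a one-line text rendering. This is only an illustrative sketch: the function `ascii_boxplot`, its `width` parameter, and the characters used are all hypothetical choices, and a plotting library would normally be used instead.

```python
def ascii_boxplot(minimum, q1, med, q3, maximum, width=41):
    """Render a five-number summary as a one-line text boxplot."""
    if not minimum <= q1 <= med <= q3 <= maximum or minimum == maximum:
        raise ValueError("need minimum <= q1 <= med <= q3 <= maximum with a nonzero range")

    def col(value):
        # Map a data value onto a character column of the scale
        return round((value - minimum) / (maximum - minimum) * (width - 1))

    line = [" "] * width
    for i in range(col(minimum), col(maximum) + 1):
        line[i] = "-"                # whiskers span minimum to maximum
    for i in range(col(q1), col(q3) + 1):
        line[i] = "="                # box spans Q1 to Q3
    line[col(q1)] = "["
    line[col(q3)] = "]"
    line[col(med)] = "|"             # median line inside the box
    return "".join(line)


print(ascii_boxplot(0, 10, 20, 30, 40))
```

Like the steps it follows, this sketch extends the whiskers all the way to the minimum and maximum and does not mark outliers separately.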

Boxplot Interpretation

| Distribution | Boxplot |
| --- | --- |
| Symmetric: $Q_1$ and $Q_3$ are equally far from the median; min and max are equally far from the median. | Median line is in the center of the box; whiskers are equal length. |
| Skewed left: $Q_1$ is further from the median than $Q_3$; min is further from the median than max. | Median line is right of center; left whisker is longer. |
| Skewed right: $Q_1$ is closer to the median than $Q_3$; min is closer to the median than max. | Median line is left of center; right whisker is longer. |
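
These comparisons can be turned into a rough shape check on a five-number summary. This is a sketch under stated assumptions, not a formal test of skewness: the function `boxplot_shape` and its tolerance parameter `tol` are hypothetical, and any summary where the box and the extremes disagree is reported as inconclusive.

```python
def boxplot_shape(minimum, q1, med, q3, maximum, tol=0.0):
    """Rough shape hint from distances to the median in a five-number summary."""
    box = (q3 - med) - (med - q1)                # > 0: Q3 is farther from the median
    ends = (maximum - med) - (med - minimum)     # > 0: max is farther from the median
    if box > tol and ends > tol:
        return "skewed right"
    if box < -tol and ends < -tol:
        return "skewed left"
    if abs(box) <= tol and abs(ends) <= tol:
        return "symmetric"
    return "inconclusive"


# Hypothetical five-number summaries
print(boxplot_shape(1, 3, 5, 7, 9))
print(boxplot_shape(1, 2, 3, 6, 12))
print(boxplot_shape(1, 7, 10, 11, 12))
```

Real data are rarely exactly symmetric, so a small positive `tol` is usually more practical than the default of 0.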

Summary

  • Percentiles and quartiles divide the data so that a certain percent is lower and a certain percent is higher.

  • Outliers are extreme values and can be identified using the upper and lower fences.

  • The five-number summary and boxplots provide concise and visual descriptions of data distributions.
