Measures of Position and Outliers: Z-scores, Percentiles, Quartiles, IQR, Outliers, Five-Number Summary, and Boxplots
Study Guide - Smart Notes
Measures of Position and Outliers
Overview
This section covers essential statistical tools for describing the position of data values within a dataset and identifying unusual observations. Topics include z-scores, percentiles, quartiles, interquartile range (IQR), outliers, the five-number summary, and boxplots.
Z-scores
Definition and Calculation
The z-score represents the distance that a data value is from the mean, measured in standard deviations. It is a standardized value that allows comparison across different datasets.
Population z-score: $z = \frac{x - \mu}{\sigma}$
Sample z-score: $z = \frac{x - \bar{x}}{s}$
Interpretation
Converting every value in a data set to its z-score produces a new variable with mean 0 and standard deviation 1.
The value of the z-score reflects the relative standing of the measurement:
If $z = 0$, then $x = \mu$ (the mean).
If $z < 0$, then $x < \mu$ (below the mean).
If $z > 0$, then $x > \mu$ (above the mean).
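As a minimal sketch of the sample z-score, $z = (x - \bar{x}) / s$, the Python snippet below standardizes a small data set and checks that the resulting z-scores have mean 0 and standard deviation 1. The values in `scores` are hypothetical (not taken from these notes), and the helper name `z_scores` is chosen only for illustration.

```python
import statistics

def z_scores(data):
    """Return the sample z-score (x - mean) / s for each value in data."""
    mean = statistics.mean(data)
    s = statistics.stdev(data)  # sample standard deviation (n - 1 divisor)
    return [(x - mean) / s for x in data]

scores = [70, 75, 80, 85, 90]  # hypothetical exam scores
z = z_scores(scores)

# Report where each value sits relative to the mean.
for x, zx in zip(scores, z):
    position = "at" if zx == 0 else ("below" if zx < 0 else "above")
    print(f"x = {x}: z = {zx:+.2f} ({position} the mean)")

# The standardized values should have mean 0 and standard deviation 1
# (up to floating-point rounding).
print(round(statistics.mean(z), 10), round(statistics.stdev(z), 10))
```

Because z-scores are unit-free, the same function can be applied to two different data sets to compare relative standing.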
Example: Z-score Comparison
Imene scored 88 on an exam (, ), Akito scored 91 (, ).
Imene:
Akito:
Imene performed relatively better, being further above the mean in standard deviation units.
Empirical Rule for Z-scores
If the frequency distribution is bell-shaped (normal):
Approximately 68% of observations have z-scores within (-1, 1).
Approximately 95% within (-2, 2).
Approximately 99.7% within (-3, 3).
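The three percentages can be checked against the standard normal distribution. The sketch below assumes an exactly normal variable and uses the identity $P(-k < Z < k) = \operatorname{erf}(k / \sqrt{2})$; the function name `prob_within` is an illustrative choice.

```python
import math

def prob_within(k):
    """P(-k < Z < k) for a standard normal variable Z."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within ({-k}, {k}): {prob_within(k):.4f}")
# within (-1, 1): 0.6827
# within (-2, 2): 0.9545
# within (-3, 3): 0.9973
```

Real data that are only roughly bell-shaped will match these figures only approximately.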
Percentiles
Definition
The $k$th percentile, $P_k$, of a data set arranged in ascending order is the value such that $k\%$ of the observations fall at or below $P_k$ and $(100 - k)\%$ fall above $P_k$.
Example: Interpreting Percentiles
If a score of 600 is in the 74th percentile on the SAT Mathematics exam, it means 74% of scores are less than or equal to 600, and 26% are greater.
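To make the interpretation concrete, the sketch below computes the percentage of observations at or below a given value. The list `scores` is hypothetical (it is not SAT data), and `percent_at_or_below` is a helper name introduced here for illustration.

```python
def percent_at_or_below(data, value):
    """Percentage of observations in data that are <= value."""
    count = sum(1 for x in data if x <= value)
    return 100 * count / len(data)

# Hypothetical scores, not taken from the notes.
scores = [450, 480, 510, 520, 540, 560, 580, 600, 640, 700]

pct = percent_at_or_below(scores, 600)
print(f"{pct:.0f}% of scores are at or below 600; {100 - pct:.0f}% are above")
# 80% of scores are at or below 600; 20% are above
```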
Quartiles
Definition
Quartiles divide data into four equal parts:
Q1: 25th percentile
Q2: 50th percentile (median)
Q3: 75th percentile
Example: Quartiles Calculation
Given vehicle speeds: 20, 24, 27, 28, 29, 30, 32, 33, 34, 36, 38, 39, 40, 40
Median ($Q_2$): Mean of the 7th and 8th values: $\frac{32 + 33}{2} = 32.5$
First quartile ($Q_1$): Median of the first 7 values: 28
Third quartile ($Q_3$): Median of the last 7 values: 38
Interpretation
25% of speeds ≤ 28 mph; 75% > 28 mph
50% ≤ 32.5 mph; 50% > 32.5 mph
75% ≤ 38 mph; 25% > 38 mph
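The quartile calculation above can be sketched in Python using the median-of-halves method. The function name `quartiles` is illustrative, and leaving the median out of both halves when the number of values is odd is an assumption; other conventions exist and can give slightly different quartiles.

```python
import statistics

def quartiles(data):
    """Return (Q1, Q2, Q3) using the median-of-halves method.

    Assumes at least two values. For an odd number of values the median
    itself is excluded from both halves (one common convention).
    """
    values = sorted(data)
    half = len(values) // 2
    lower = values[:half]    # values below the median position
    upper = values[-half:]   # values above the median position
    return (statistics.median(lower),
            statistics.median(values),
            statistics.median(upper))

speeds = [20, 24, 27, 28, 29, 30, 32, 33, 34, 36, 38, 39, 40, 40]
q1, q2, q3 = quartiles(speeds)
print(q1, q2, q3)  # 28 32.5 38
```

Library routines such as `statistics.quantiles` interpolate between values and may not reproduce these numbers exactly.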
Interquartile Range (IQR)
Definition
The interquartile range (IQR) is the range of the middle 50% of the observations in a data set: $\text{IQR} = Q_3 - Q_1$
Example
For vehicle speeds, $Q_1 = 28$, $Q_3 = 38$:
$\text{IQR} = 38 - 28 = 10$ mph
The middle 50% of car speeds span a range of 10 mph.
Effect of Outliers on Summary Statistics
Example Table
Suppose a 15th car travels at 100 mph. How does this affect summary statistics?
| Statistic | Without 15th car | With 15th car |
|---|---|---|
| Mean | 32.1 mph | 36.7 mph |
| Median | 32.5 mph | 33 mph |
| Standard deviation | 6.2 mph | 18.5 mph |
| IQR | 10 mph | 11 mph |
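Additional info: Outliers have a large effect on the mean and standard deviation but little effect on the median and IQR. Here the single 100 mph value raises the mean by 4.6 mph and roughly triples the standard deviation, while the median and IQR change by at most 1 mph.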
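The table values can be reproduced with Python's `statistics` module. This is a sketch under two assumptions: the standard deviation is the sample standard deviation, and the IQR uses the median-of-halves quartiles (median excluded from both halves when the count is odd). The helper name `iqr` is illustrative.

```python
import statistics

def iqr(data):
    """Interquartile range Q3 - Q1 using median-of-halves quartiles."""
    values = sorted(data)
    half = len(values) // 2
    return statistics.median(values[-half:]) - statistics.median(values[:half])

speeds = [20, 24, 27, 28, 29, 30, 32, 33, 34, 36, 38, 39, 40, 40]
with_outlier = speeds + [100]  # add the 15th car at 100 mph

for label, data in (("without", speeds), ("with", with_outlier)):
    print(label,
          round(statistics.mean(data), 1),    # mean
          statistics.median(data),            # median
          round(statistics.stdev(data), 1),   # sample standard deviation
          iqr(data))                          # interquartile range
# without 32.1 32.5 6.2 10
# with 36.7 33 18.5 11
```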
Outliers
Definition
An outlier is an observation that is unusually large or small relative to the other values in a data set.
Outliers may occur by chance or may result from measurement error, data entry error, or sampling error.
Detecting Outliers: Quartiles Method
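Step 1: Determine $Q_1$ and $Q_3$.
Step 2: Compute IQR: $\text{IQR} = Q_3 - Q_1$
Step 3: Calculate fences:
Lower Fence (LF): $\text{LF} = Q_1 - 1.5 \times \text{IQR}$
Upper Fence (UF): $\text{UF} = Q_3 + 1.5 \times \text{IQR}$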
Any value less than LF or greater than UF is an outlier.
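The steps above can be sketched as two small Python functions, with the fences placed 1.5 × IQR below $Q_1$ and above $Q_3$. The function names `fences` and `find_outliers` are illustrative, and the quartiles are passed in rather than computed so that any quartile convention can be used.

```python
def fences(q1, q3):
    """Return (lower fence, upper fence) from the first and third quartiles."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def find_outliers(data, q1, q3):
    """Return the values lying below the lower fence or above the upper fence."""
    lower, upper = fences(q1, q3)
    return [x for x in data if x < lower or x > upper]

# Vehicle speeds from the quartiles example, with Q1 = 28 and Q3 = 38.
speeds = [20, 24, 27, 28, 29, 30, 32, 33, 34, 36, 38, 39, 40, 40]
print(fences(28, 38))                 # (13.0, 53.0)
print(find_outliers(speeds, 28, 38))  # []
```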
Example 1: No Outliers
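For the vehicle speeds, $Q_1 = 28$, $Q_3 = 38$, $\text{IQR} = 10$
LF: $28 - 1.5(10) = 13$ mph
UF: $38 + 1.5(10) = 53$ mph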
No values below 13 or above 53 mph; no outliers.
Example 2: Outlier Detected
Data: 5, 15, 16, 20, 21, 25, 26, 27, 30, 30, 31, 32, 32, 34, 35, 38, 38, 41, 43, 77
$Q_1 = 21$, $Q_3 = 38$, $\text{IQR} = 38 - 21 = 17$
LF: $21 - 1.5(17) = -4.5$
UF: $38 + 1.5(17) = 63.5$
77 is above UF; it is an outlier.
| Outliers | Usual values | Outliers |
|---|---|---|
| below LF = -4.5 | from LF = -4.5 to UF = 63.5 | above UF = 63.5 (here, 77) |
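As a quick check of this example, the sketch below sorts the data into the three regions, taking the fence values -4.5 and 63.5 as given in the example rather than recomputing them. The variable names are illustrative.

```python
data = [5, 15, 16, 20, 21, 25, 26, 27, 30, 30,
        31, 32, 32, 34, 35, 38, 38, 41, 43, 77]

# Fence values as stated in the example (not recomputed here).
lower_fence, upper_fence = -4.5, 63.5

below = [x for x in data if x < lower_fence]                   # low outliers
usual = [x for x in data if lower_fence <= x <= upper_fence]   # usual values
above = [x for x in data if x > upper_fence]                   # high outliers

print(below, len(usual), above)  # [] 19 [77]
```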
Five-Number Summary
Definition
The five-number summary consists of:
Minimum
First quartile ($Q_1$)
Median ($M$ or $Q_2$)
Third quartile ($Q_3$)
Maximum
Comments
The median is a resistant measure of central tendency.
The IQR is a resistant measure of variation.
Minimum and maximum describe the tails of the distribution.
Example: Credit Card Interest Rates
| Institution | Rate |
|---|---|
| Pulaski Bank and Trust Company | 6.5% |
| Rainier Pacific Savings Bank | 12.0% |
| Wells Fargo Bank NA | 14.4% |
| Firstbank of Colorado | 14.4% |
| Lafayette Ambassador Bank | 14.3% |
| Infibank | 13.0% |
| United Bank, Inc. | 13.3% |
| First National Bank Of The Mid-Cities | 13.9% |
| Bank of Louisiana | 9.9% |
| Bar Harbor Bank and Trust Company | 14.5% |
Ordered rates: 6.5%, 9.9%, 12.0%, 13.0%, 13.3%, 13.9%, 14.3%, 14.4%, 14.4%, 14.5%
Five-number summary: 6.5%, 12.0%, 13.6%, 14.4%, 14.5%
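The summary above can be reproduced with a short Python sketch. It assumes the median-of-halves quartile method (median excluded from both halves when the count is odd); the function name `five_number_summary` is illustrative. Results are rounded to one decimal place for display because floating-point averaging can introduce tiny rounding errors.

```python
import statistics

def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) using median-of-halves quartiles."""
    values = sorted(data)
    half = len(values) // 2
    return (values[0],
            statistics.median(values[:half]),
            statistics.median(values),
            statistics.median(values[-half:]),
            values[-1])

# Credit card interest rates (percent), in the order listed in the table.
rates = [6.5, 12.0, 14.4, 14.4, 14.3, 13.0, 13.3, 13.9, 9.9, 14.5]

summary = five_number_summary(rates)
print([round(v, 1) for v in summary])  # [6.5, 12.0, 13.6, 14.4, 14.5]
```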
Boxplots
Definition and Construction
A boxplot is a graphical representation of the five-number summary, useful for visualizing the distribution and identifying outliers.
Step 1: Determine lower and upper inner fences:
Lower Fence: $\text{LF} = Q_1 - 1.5 \times \text{IQR}$
Upper Fence: $\text{UF} = Q_3 + 1.5 \times \text{IQR}$
Step 2: Draw a number line including min and max values. Insert vertical lines at $Q_1$, $M$, and $Q_3$; enclose in a box.
Step 3: Label the fences.
Step 4: Draw whiskers from $Q_1$ to the smallest data value above LF, and from $Q_3$ to the largest data value below UF.
Step 5: Mark outliers (values outside fences) with an asterisk (*).
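The construction steps can be sketched in Python as a function that computes what would be drawn (box, fences, whisker ends, outliers) without doing any plotting. It assumes fences at 1.5 × IQR beyond the quartiles and median-of-halves quartiles; the name `boxplot_elements` and the dictionary keys are illustrative. The data are the vehicle speeds with the 15th car at 100 mph.

```python
import statistics

def boxplot_elements(data):
    """Compute the pieces of a boxplot: box, fences, whisker ends, outliers."""
    values = sorted(data)
    half = len(values) // 2
    q1 = statistics.median(values[:half])
    med = statistics.median(values)
    q3 = statistics.median(values[-half:])
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    # Whiskers end at the most extreme data values still inside the fences.
    inside = [x for x in values if lower_fence <= x <= upper_fence]
    return {
        "box": (q1, med, q3),
        "fences": (lower_fence, upper_fence),
        "whiskers": (inside[0], inside[-1]),
        "outliers": [x for x in values if x < lower_fence or x > upper_fence],
    }

speeds = [20, 24, 27, 28, 29, 30, 32, 33, 34, 36, 38, 39, 40, 40, 100]
elements = boxplot_elements(speeds)
print(elements["box"])       # (28, 33, 39)
print(elements["fences"])    # (11.5, 55.5)
print(elements["whiskers"])  # (20, 40)
print(elements["outliers"])  # [100]
```

The upper whisker stops at 40 mph, the largest speed inside the fences, and 100 mph would be marked separately with an asterisk.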
Example: TV-viewing Data
Five-number summary: , , , ,
Outlier: 77 (above UF)
Adjacent values: (LF), (UF)
Simple Boxplot
If no outliers are present, the boxplot consists only of the box and whiskers between the minimum and maximum values.
Describing Distribution Shape with Boxplots and Quartiles
Boxplots and quartiles can be used to describe the shape of a distribution:
Skewed right: Median closer to $Q_1$, longer whisker to the right.
Symmetric: Median centered, whiskers of similar length.
Skewed left: Median closer to $Q_3$, longer whisker to the left.
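For example, the interest rate boxplot indicates a distribution skewed left: the median (13.6%) is closer to $Q_3$ (14.4%) than to $Q_1$ (12.0%), and the values below $Q_1$ extend much farther from the box than those above $Q_3$.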
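A rough version of the median-position rule can be written as a Python heuristic. This is a simplified sketch: it looks only at where the median sits inside the box and ignores whisker lengths, and both the function name `shape_from_quartiles` and the `tolerance` threshold are hypothetical choices, not part of the notes.

```python
def shape_from_quartiles(q1, median, q3, tolerance=0.1):
    """Classify shape from the median's position between Q1 and Q3.

    tolerance is a hypothetical cutoff: if the two half-box widths differ
    by no more than this fraction of the IQR, call the shape symmetric.
    """
    lower = median - q1   # width of the lower half of the box
    upper = q3 - median   # width of the upper half of the box
    spread = q3 - q1
    if spread == 0 or abs(upper - lower) <= tolerance * spread:
        return "roughly symmetric"
    # Median closer to Q1 means the upper half is wider: skewed right.
    return "skewed right" if upper > lower else "skewed left"

# Interest rate example: Q1 = 12.0, median = 13.6, Q3 = 14.4.
print(shape_from_quartiles(12.0, 13.6, 14.4))  # skewed left
```

A full assessment would also compare the whisker lengths, as described in the list above.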