(Lecture 4) Measures of Position: Percentiles, Quartiles, Box Plots, and Outlier Detection
Study Guide - Smart Notes
Measures of Position and Variability
Introduction
This section covers essential statistical tools for describing the position and spread of data, including percentiles, quartiles, interquartile range (IQR), box plots, and methods for detecting potential outliers in both discrete and continuous data.
Percentiles
Definition and Interpretation
Percentile: The pth percentile is a value such that p% of the observations fall below or at that value.
Percentiles are used to understand the relative standing of a value within a data set.
Example: The 90th percentile of test scores is the value at or below which 90% of the scores fall.
Procedure to Calculate the kth Percentile
Arrange all data values in ascending order.
Count the number of values in the data set, denoted as n.
Compute p = k / 100, where k is the desired percentile (between 0 and 100).
Multiply p by n to get the index: Index = p × n.
If the index is not a whole number, round up to the nearest whole number and use that position in the ordered data.
If the index is a whole number, take the average of the value at that position and the next value.
Example 1
Data: 85, 34, 42, 51, 84, 86, 78, 85, 87, 69, 74, 65
Ordered: 34, 42, 51, 65, 69, 74, 78, 84, 85, 85, 86, 87
Find the 80th percentile:
p = 0.80, n = 12
Index = 0.80 × 12 = 9.6 → not a whole number, so round up to 10
10th value is 85
80th percentile = 85
Example 2
Data: 25 test scores (e.g., 43, 54, 56, ..., 99, 99)
Find the 60th percentile:
p = 0.60, n = 25
Index = 0.60 × 25 = 15
Average the 15th and 16th values: (79 + 85) / 2 = 82
60th percentile = 82
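The procedure above can be sketched in Python. This is a minimal illustration, not a standard library routine: the function name percentile is hypothetical, k is assumed to be an integer strictly between 0 and 100, and the index p × n is tracked as a quotient and remainder so that floating-point rounding cannot turn a whole index into a non-whole one. Only Example 1 is reproduced, because the full data list for Example 2 is not given in these notes.

```python
def percentile(data, k):
    """Return the kth percentile using the round-up / average-if-whole rule.

    Assumption: k is an integer with 0 < k < 100.
    """
    if not data:
        raise ValueError("data must not be empty")
    if not (isinstance(k, int) and 0 < k < 100):
        raise ValueError("k must be an integer strictly between 0 and 100")

    ordered = sorted(data)          # step 1: ascending order
    n = len(ordered)                # step 2: number of values

    # Steps 3-4: index = (k / 100) * n, kept exact as quotient and remainder.
    whole, remainder = divmod(k * n, 100)

    if remainder != 0:
        # Index is not whole: round up to position whole + 1 (1-based),
        # which is list index `whole` (0-based).
        return ordered[whole]

    # Index is whole: average the values at 1-based positions whole and whole + 1.
    return (ordered[whole - 1] + ordered[whole]) / 2


# Example 1 from the notes
scores = [85, 34, 42, 51, 84, 86, 78, 85, 87, 69, 74, 65]
print(percentile(scores, 80))  # 85
```

Other textbooks and software packages use different interpolation rules, so results from a statistics library may differ slightly from this hand method.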
Quartiles
Definition and Calculation
Quartiles divide the data into four equal parts:
Q1 (First Quartile): About 25% of the data fall at or below this value.
Q2 (Second Quartile/Median): About 50% of the data fall at or below this value.
Q3 (Third Quartile): About 75% of the data fall at or below this value.
To find quartiles:
Arrange data in order.
Q2 is the median.
Q1 is the median of the lower half (excluding Q2 if n is odd).
Q3 is the median of the upper half (excluding Q2 if n is odd).
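The median-of-halves method above can be sketched as follows. The function name quartiles is hypothetical, and the code assumes at least two data values. It is applied to the data from Example 1 (an even n) and to the eleven values from the box plot example in these notes (an odd n, where the median is excluded from both halves).

```python
from statistics import median


def quartiles(data):
    """Return (Q1, Q2, Q3) using the median-of-halves method."""
    ordered = sorted(data)
    n = len(ordered)
    if n < 2:
        raise ValueError("need at least two values")

    half = n // 2
    lower = ordered[:half]
    # When n is odd, skip the middle value (Q2) so it is in neither half.
    upper = ordered[half + 1:] if n % 2 else ordered[half:]

    return median(lower), median(ordered), median(upper)


# Even n: data from Example 1
print(quartiles([85, 34, 42, 51, 84, 86, 78, 85, 87, 69, 74, 65]))  # (58.0, 76.0, 85.0)

# Odd n: data from the box plot example
print(quartiles([5, 7, 12, 14, 15, 22, 25, 30, 36, 42, 53]))  # (12, 22, 36)
```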
Interquartile Range (IQR)
Definition and Formula
The interquartile range (IQR) measures the spread of the middle 50% of the data.
Formula: IQR = Q3 - Q1
Example: For the ordered data in Example 1, Q1 = (51 + 65) / 2 = 58 and Q3 = (85 + 85) / 2 = 85, so IQR = 85 - 58 = 27.
Detecting Potential Outliers
1.5 × IQR Criterion (for Discrete Data)
An observation is a potential outlier if it falls more than 1.5 × IQR below Q1 or above Q3.
Lower limit: Q1 - 1.5 × IQR
Upper limit: Q3 + 1.5 × IQR
Values outside these limits are flagged as potential outliers.
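A short sketch of the 1.5 × IQR criterion follows. The function names iqr_limits and potential_outliers are hypothetical, and quartiles are found with the median-of-halves method described earlier. The sample reuses the eleven values from the box plot example in these notes, with one extra value (120) added purely for illustration so that something is flagged.

```python
from statistics import median


def iqr_limits(data):
    """Return (lower limit, upper limit) for the 1.5 x IQR criterion."""
    ordered = sorted(data)
    n = len(ordered)
    if n < 2:
        raise ValueError("need at least two values")

    half = n // 2
    q1 = median(ordered[:half])
    # Exclude the median from the upper half when n is odd.
    q3 = median(ordered[half + 1:] if n % 2 else ordered[half:])
    iqr = q3 - q1

    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def potential_outliers(data):
    """Return the values that fall outside the 1.5 x IQR limits."""
    lower, upper = iqr_limits(data)
    return [x for x in data if x < lower or x > upper]


# 120 is a made-up value added for illustration; it is not in the notes.
sample = [5, 7, 12, 14, 15, 22, 25, 30, 36, 42, 53, 120]
print(iqr_limits(sample))          # (-26.0, 78.0)
print(potential_outliers(sample))  # [120]
```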
z-Score Criterion (for Continuous Data)
The z-score of an observation measures how many standard deviations it is from the mean: z = (observation - mean) / standard deviation = (x - x̄) / s
Observations with z < -3 or z > 3 are considered potential outliers (for bell-shaped distributions).
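The z-score criterion can be sketched as below. The function names are hypothetical, the cutoff of 3 standard deviations is the common rule of thumb for bell-shaped data and is exposed as a parameter in case a course uses a different value, and the data set is invented purely for illustration.

```python
from statistics import mean, stdev


def z_scores(data):
    """Return the z-score of every observation, using the sample standard deviation."""
    m = mean(data)
    s = stdev(data)  # needs at least two values
    if s == 0:
        raise ValueError("standard deviation is zero; z-scores are undefined")
    return [(x - m) / s for x in data]


def z_outliers(data, threshold=3):
    """Return observations whose z-score is below -threshold or above +threshold."""
    return [x for x, z in zip(data, z_scores(data)) if abs(z) > threshold]


# Invented data: twenty values near 50 plus one extreme value.
values = [48, 49, 50, 51, 52] * 4 + [95]
print(z_outliers(values))  # [95]
```

With very small samples no observation can reach a z-score of 3, so this check is only informative for reasonably large data sets.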
Box Plots
Five-Number Summary
The five-number summary consists of:
Minimum value
First quartile (Q1)
Median (Q2)
Third quartile (Q3)
Maximum value
Constructing a Box Plot
Draw a box from Q1 to Q3.
Draw a line inside the box at the median (Q2).
Draw lines ("whiskers") from the box to the smallest and largest values that are not potential outliers.
Plot potential outliers as individual points beyond the whiskers.
Example: Box Plot Construction
Data: 5, 7, 12, 14, 15, 22, 25, 30, 36, 42, 53
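The data are already in order, with n = 11. Q2 = 22 (the 6th value), Q1 = 12 (median of 5, 7, 12, 14, 15), and Q3 = 36 (median of 25, 30, 36, 42, 53). IQR = 36 - 12 = 24, so the limits are 12 - 1.5 × 24 = -24 and 36 + 1.5 × 24 = 72. No values fall outside these limits, so the whiskers extend to 5 and 53 and no potential outliers are plotted.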
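The numbers needed to draw this box plot can be computed with a short sketch. The function name box_plot_stats and the dictionary keys are hypothetical; quartiles use the median-of-halves method and potential outliers use the 1.5 × IQR limits from these notes. The code only computes the values to plot and does not draw anything.

```python
from statistics import median


def box_plot_stats(data):
    """Return the five-number summary, whisker ends, and potential outliers."""
    ordered = sorted(data)
    n = len(ordered)
    if n < 2:
        raise ValueError("need at least two values")

    half = n // 2
    q1 = median(ordered[:half])
    q2 = median(ordered)
    q3 = median(ordered[half + 1:] if n % 2 else ordered[half:])

    iqr = q3 - q1
    lower_limit = q1 - 1.5 * iqr
    upper_limit = q3 + 1.5 * iqr

    # Whiskers end at the most extreme values that are not potential outliers.
    inside = [x for x in ordered if lower_limit <= x <= upper_limit]
    outside = [x for x in ordered if x < lower_limit or x > upper_limit]

    return {
        "five_number_summary": (ordered[0], q1, q2, q3, ordered[-1]),
        "whiskers": (inside[0], inside[-1]),
        "potential_outliers": outside,
    }


stats = box_plot_stats([5, 7, 12, 14, 15, 22, 25, 30, 36, 42, 53])
print(stats["five_number_summary"])  # (5, 12, 22, 36, 53)
print(stats["whiskers"])             # (5, 53)
print(stats["potential_outliers"])   # []
```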
Interpreting Box Plots
Box plots provide a visual summary of the data's center, spread, and potential outliers.
Side-by-side box plots are useful for comparing distributions between groups.
Box plots do not show modality or gaps as clearly as histograms, but are effective for identifying outliers and comparing medians and spreads.
Comparing z-Scores
Definition and Application
z-scores allow comparison of values from different distributions (with different means and standard deviations).
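Formula: z = (x - x̄) / s
Example: Duane scores 84 on an exam with mean 80 and s = 4. Debbie scores 90 on an exam with mean 85 and s = 8.
Duane's z-score: z = (84 - 80) / 4 = 1.00
Debbie's z-score: z = (90 - 85) / 8 = 0.625
Although Debbie's raw score is higher, Duane's score is farther above his class mean relative to the spread of scores: 1 standard deviation above, compared with 0.625 standard deviations for Debbie.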
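The comparison can be checked with a few lines of Python. The function name z_score is hypothetical; the inputs are the scores, means, and standard deviations stated in the example.

```python
def z_score(x, mean, s):
    """Return how many standard deviations x lies from the mean."""
    if s <= 0:
        raise ValueError("standard deviation must be positive")
    return (x - mean) / s


duane = z_score(84, mean=80, s=4)
debbie = z_score(90, mean=85, s=8)

print(duane)           # 1.0
print(debbie)          # 0.625
print(duane > debbie)  # True: Duane did better relative to his class
```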
Summary Table: Outlier Detection Criteria
| Criterion | Formula | Typical Use |
|---|---|---|
| 1.5 × IQR Rule | Below Q1 - 1.5 × IQR or above Q3 + 1.5 × IQR | Discrete/ordinal data |
| z-score | z < -3 or z > 3 | Continuous, bell-shaped data |
Key Takeaways
Percentiles and quartiles describe the position of data within a distribution.
The interquartile range (IQR) measures the spread of the central 50% of data.
Box plots visually summarize data using the five-number summary and help identify outliers.
Potential outliers can be flagged using the 1.5 × IQR rule (for discrete data) or z-scores (for continuous, bell-shaped data).
z-scores standardize values, allowing comparison across different distributions.