
(Lecture 4) Measures of Position: Percentiles, Quartiles, Box Plots, and Outlier Detection

Study Guide - Smart Notes

Tailored notes based on your materials, expanded with key definitions, examples, and context.

Measures of Position and Variability

Introduction

This section covers essential statistical tools for describing the position and spread of data, including percentiles, quartiles, interquartile range (IQR), box plots, and methods for detecting potential outliers in both discrete and continuous data.

Percentiles

Definition and Interpretation

  • Percentile: The kth percentile is a value such that k% of the observations fall at or below that value.

  • Percentiles are used to understand the relative standing of a value within a data set.

Example: The 90th percentile of test scores is the value at or below which 90% of the scores fall.

Procedure to Calculate the kth Percentile

  1. Arrange all data values in ascending order.

  2. Count the number of values in the data set, denoted as n.

  3. Compute p = k / 100, where k is the desired percentile (between 0 and 100).

  4. Multiply p by n to get the index: Index = p × n.

  5. If the index is not a whole number, round it up to the next whole number and use the value at that position in the ordered data.

  6. If the index is a whole number, take the average of the value at that position and the next value.
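The procedure above can be sketched in Python. This is a minimal illustration, not a standard library routine: the function name kth_percentile is made up for this guide, and it follows the rounding rule in steps 5 and 6, which differs from the interpolation methods that many statistics packages use by default. Exact fractions are used so that an index such as 0.60 × 25 is recognized as a whole number without floating-point error.

```python
from fractions import Fraction
import math


def kth_percentile(data, k):
    """Return the kth percentile using the round-up / average rule above.

    Hypothetical helper for illustration; assumes 0 < k < 100 so that the
    position (and the "next value" in step 6) always exists.
    """
    if not data:
        raise ValueError("data must not be empty")
    if not 0 < k < 100:
        raise ValueError("k must be strictly between 0 and 100")

    ordered = sorted(data)              # step 1: ascending order
    n = len(ordered)                    # step 2: count the values
    index = Fraction(k) * n / 100       # steps 3-4: index = (k / 100) * n, exactly

    if index.denominator != 1:
        # step 5: not a whole number, so round up and take that position
        position = math.ceil(index)
        return ordered[position - 1]    # positions are 1-based, lists are 0-based

    # step 6: whole number, so average that position and the next one
    position = int(index)
    return (ordered[position - 1] + ordered[position]) / 2


scores = [85, 34, 42, 51, 84, 86, 78, 85, 87, 69, 74, 65]
print(kth_percentile(scores, 80))  # 85
```

With these twelve scores the index for k = 80 is 9.6, so the function returns the 10th ordered value.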

Example 1

  • Data: 85, 34, 42, 51, 84, 86, 78, 85, 87, 69, 74, 65

  • Ordered: 34, 42, 51, 65, 69, 74, 78, 84, 85, 85, 86, 87

  • Find the 80th percentile:

    • p = 0.80, n = 12

    • Index = 0.80 × 12 = 9.6 → not a whole number, so round up to 10

    • 10th value is 85

  • 80th percentile = 85

Example 2

  • Data: 25 test scores (e.g., 43, 54, 56, ..., 99, 99)

  • Find the 60th percentile:

    • p = 0.60, n = 25

    • Index = 0.60 × 25 = 15

    • Average the 15th and 16th values: (79 + 85) / 2 = 82

  • 60th percentile = 82

Quartiles

Definition and Calculation

  • Quartiles divide the data into four equal parts:

    • Q1 (First Quartile): 25% of data falls below this value.

    • Q2 (Second Quartile/Median): 50% of data falls below this value.

    • Q3 (Third Quartile): 75% of data falls below this value.

  • To find quartiles:

    1. Arrange data in order.

    2. Q2 is the median.

    3. Q1 is the median of the lower half (excluding Q2 if n is odd).

    4. Q3 is the median of the upper half (excluding Q2 if n is odd).
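The four steps above can be sketched as follows. The function name quartiles is an illustrative choice, and the code implements only the median-of-halves method described here; software packages often use other quartile definitions and may return slightly different values.

```python
from statistics import median


def quartiles(data):
    """Return (Q1, Q2, Q3) by the median-of-halves method.

    Hypothetical helper for illustration; needs at least two values so
    that both halves are non-empty.
    """
    if len(data) < 2:
        raise ValueError("need at least two values")

    ordered = sorted(data)                 # step 1: arrange in order
    n = len(ordered)
    half = n // 2

    lower_half = ordered[:half]            # values below the median position
    # When n is odd, skip the middle value (Q2) so it is in neither half.
    upper_half = ordered[half + (n % 2):]

    return median(lower_half), median(ordered), median(upper_half)


data = [5, 7, 12, 14, 15, 22, 25, 30, 36, 42, 53]
print(quartiles(data))  # (12, 22, 36)
```

Here n = 11 is odd, so the median (22) is left out of both halves before Q1 and Q3 are taken.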

Interquartile Range (IQR)

Definition and Formula

  • The interquartile range (IQR) measures the spread of the middle 50% of the data.

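  • Formula: IQR = Q3 − Q1

  • Example: For the ordered data 5, 7, 12, 14, 15, 22, 25, 30, 36, 42, 53 (the box plot example in this guide), Q1 = 12 and Q3 = 36, so IQR = 36 − 12 = 24.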

Detecting Potential Outliers

1.5 × IQR Criterion (for Discrete Data)

  • An observation is a potential outlier if it falls more than 1.5 × IQR below Q1 or above Q3.

  • Lower limit: Q1 − 1.5 × IQR

  • Upper limit: Q3 + 1.5 × IQR

  • Values outside these limits are flagged as potential outliers.
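As a rough sketch of the criterion, the limits and the flagging step might look like this in Python. The function names are hypothetical, the quartiles are passed in rather than computed, and the value 120 in the second data set is invented purely to show a flagged observation.

```python
def outlier_limits(q1, q3, multiplier=1.5):
    """Return (lower limit, upper limit) for the 1.5 x IQR criterion."""
    iqr = q3 - q1
    return q1 - multiplier * iqr, q3 + multiplier * iqr


def potential_outliers(data, q1, q3, multiplier=1.5):
    """Return the values that fall outside the limits."""
    lower, upper = outlier_limits(q1, q3, multiplier)
    return [x for x in data if x < lower or x > upper]


# Data from this guide's box plot example; Q1 = 12 and Q3 = 36
# by the median-of-halves method.
data = [5, 7, 12, 14, 15, 22, 25, 30, 36, 42, 53]
print(outlier_limits(12, 36))            # (-24.0, 72.0)
print(potential_outliers(data, 12, 36))  # []

# Hypothetical extension: append an invented value of 120.
# For these twelve values Q1 = 13 and Q3 = 39, so the limits are -26 and 78.
extended = data + [120]
print(potential_outliers(extended, 13, 39))  # [120]
```

Flagged values are only potential outliers; whether they are errors or genuine extreme observations still has to be judged in context.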

z-Score Criterion (for Continuous Data)

  • The z-score of an observation measures how many standard deviations it is from the mean: z = (x − x̄) / s, where x̄ is the mean and s is the standard deviation.

  • Observations with z < −3 or z > 3 are considered potential outliers (for bell-shaped distributions).
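A small Python sketch of the z-score check follows. It assumes the sample mean and sample standard deviation, and it assumes a cutoff of 3 standard deviations, a common convention for bell-shaped data that is exposed here as the threshold parameter. The function names and the scores are hypothetical, made up for illustration.

```python
from statistics import mean, stdev


def z_scores(data):
    """Return the z-score (x - mean) / s for every value in data.

    Uses the sample standard deviation, so data needs at least two
    values that are not all identical (otherwise s is zero).
    """
    m = mean(data)
    s = stdev(data)
    return [(x - m) / s for x in data]


def z_outliers(data, threshold=3):
    """Return the values whose z-score is below -threshold or above +threshold."""
    return [x for x, z in zip(data, z_scores(data)) if abs(z) > threshold]


# Hypothetical scores: twenty values near 50 plus one invented value of 100.
scores = [48, 49, 50, 51, 52] * 4 + [100]
print(z_outliers(scores))  # [100]
```

In very small samples no observation can reach a z-score of 3 at all, because a single extreme value also inflates s; this is one reason the criterion is usually reserved for reasonably large, bell-shaped data sets.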

Box Plots

Five-Number Summary

  • The five-number summary consists of:

    • Minimum value

    • First quartile (Q1)

    • Median (Q2)

    • Third quartile (Q3)

    • Maximum value

Constructing a Box Plot

  • Draw a box from Q1 to Q3.

  • Draw a line inside the box at the median (Q2).

  • Draw lines ("whiskers") from the box to the smallest and largest values that are not potential outliers.

  • Plot potential outliers as individual points beyond the whiskers.

Example: Box Plot Construction

  • Data: 5, 7, 12, 14, 15, 22, 25, 30, 36, 42, 53

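  • The data are already in ascending order, with n = 11, so the median is the 6th value: Q2 = 22.

  • Q1 is the median of the lower half (5, 7, 12, 14, 15): Q1 = 12. Q3 is the median of the upper half (25, 30, 36, 42, 53): Q3 = 36.

  • IQR = 36 − 12 = 24, so the outlier limits are 12 − 1.5 × 24 = −24 and 36 + 1.5 × 24 = 72. No values fall outside these limits.

  • Draw the box from 12 to 36 with a line at 22, and extend the whiskers to the minimum (5) and the maximum (53).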
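The numbers needed to draw this box plot can be computed with a short Python sketch. The helper names are illustrative, the quartiles use the median-of-halves method from this guide, and the whiskers follow the 1.5 × IQR criterion; plotting libraries may use different quartile definitions, so their boxes can differ slightly.

```python
from statistics import median


def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum) using median-of-halves quartiles."""
    ordered = sorted(data)
    n = len(ordered)
    half = n // 2
    lower_half = ordered[:half]
    upper_half = ordered[half + (n % 2):]   # skip the middle value when n is odd
    return (ordered[0], median(lower_half), median(ordered),
            median(upper_half), ordered[-1])


def whisker_ends(data):
    """Return the smallest and largest values that are not potential outliers."""
    _, q1, _, q3, _ = five_number_summary(data)
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [x for x in data if lower <= x <= upper]
    return min(inside), max(inside)


data = [5, 7, 12, 14, 15, 22, 25, 30, 36, 42, 53]
print(five_number_summary(data))  # (5, 12, 22, 36, 53)
print(whisker_ends(data))         # (5, 53)
```

Because no value lies outside the limits, the whiskers reach the minimum and maximum and no individual points are plotted.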

Interpreting Box Plots

  • Box plots provide a visual summary of the data's center, spread, and potential outliers.

  • Side-by-side box plots are useful for comparing distributions between groups.

  • Box plots do not show modality or gaps as clearly as histograms, but are effective for identifying outliers and comparing medians and spreads.

Comparing z-Scores

Definition and Application

  • z-scores allow comparison of values from different distributions (with different means and standard deviations).

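  • Formula: z = (x − x̄) / s

  • Example: Duane scores 84 on an exam with mean 80 and s = 4. Debbie scores 90 on an exam with mean 85 and s = 8.

  • Duane's z-score: z = (84 − 80) / 4 = 1.00

  • Debbie's z-score: z = (90 − 85) / 8 = 0.625 ≈ 0.63

  • Although Debbie's raw score is higher, Duane's score is further above his class mean relative to the spread of scores: 1 standard deviation versus about 0.63.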
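The comparison can be checked with a few lines of Python. The function name z_score and its parameter names are chosen for this illustration only; the numbers are the ones given in the example.

```python
def z_score(x, mean, s):
    """Return how many standard deviations x lies above (+) or below (-) the mean."""
    return (x - mean) / s


duane = z_score(84, mean=80, s=4)
debbie = z_score(90, mean=85, s=8)

print(duane)           # 1.0
print(debbie)          # 0.625
print(duane > debbie)  # True
```

Debbie is 5 points above her class mean and Duane only 4, but dividing by each exam's standard deviation reverses the ranking.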

Summary Table: Outlier Detection Criteria

Criterion       | Formula (flag as potential outlier)           | Typical Use
1.5 × IQR Rule  | Below Q1 − 1.5 × IQR or above Q3 + 1.5 × IQR  | Discrete/ordinal data
z-score         | z < −3 or z > 3                               | Continuous, bell-shaped data

Key Takeaways

  • Percentiles and quartiles describe the position of data within a distribution.

  • The interquartile range (IQR) measures the spread of the central 50% of data.

  • Box plots visually summarize data using the five-number summary and help identify outliers.

  • Potential outliers can be flagged using the 1.5 × IQR rule (for discrete data) or z-scores (for continuous, bell-shaped data).

  • z-scores standardize values, allowing comparison across different distributions.
