Measures of Position: z-Scores, Percentiles, Quartiles, and Boxplots

Measures of Position (Relative Standing)

Standard Score (z-score)

The z-score (also known as standard score or standardized value) is a statistical measure that describes the position of a data value relative to the mean of the dataset, expressed in terms of standard deviations. It is a fundamental concept for comparing values across different distributions and identifying unusual values.

Definition: The z-score is the number of standard deviations a value x is above or below the mean.
Formula: For a sample: z = (x - x̄) / s; for a population: z = (x - μ) / σ, where x̄ and s are the sample mean and standard deviation, and μ and σ are the population mean and standard deviation.
Unitless: z-scores have no units, making them useful for comparison.
Interpretation: Positive z-scores indicate values above the mean; negative z-scores indicate values below the mean.
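
The two formulas can be sketched in Python with the standard library's statistics module. The function names and the list of scores are hypothetical, chosen only for illustration; stdev divides by n - 1 (sample) and pstdev divides by n (population).

```python
from statistics import mean, stdev, pstdev

def z_score_sample(x, data):
    """z = (x - x̄) / s, treating `data` as a sample."""
    return (x - mean(data)) / stdev(data)

def z_score_population(x, data):
    """z = (x - μ) / σ, treating `data` as the entire population."""
    return (x - mean(data)) / pstdev(data)

# Hypothetical values: mean is 5, population standard deviation is 2.
scores = [2, 4, 4, 4, 5, 5, 7, 9]

print(z_score_population(9, scores))  # 2.0
print(z_score_sample(9, scores))      # slightly smaller, because s > σ here
print(z_score_population(4, scores))  # negative: 4 is below the mean
```

The same value gets a slightly different z-score depending on whether the data are treated as a sample or a population, because the two standard deviations differ.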

Important Properties of z-Scores

A z-score quantifies how far a value is from the mean in standard deviation units.
z-scores are unitless.
Values with z-scores ≤ -2 or ≥ 2 are considered significantly low or significantly high, respectively.
If a data value is less than the mean, its z-score is negative.
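
As a minimal sketch, the thresholds above can be written as a small Python function. The name classify_z and the cutoff parameter are hypothetical; the boundary values -2 and 2 are counted as significant, following the "≤ -2 or ≥ 2" wording in the list above.

```python
def classify_z(z, cutoff=2.0):
    """Label a z-score using the +/- 2 standard deviation thresholds."""
    if z <= -cutoff:
        return "significantly low"
    if z >= cutoff:
        return "significantly high"
    return "not significant"

print(classify_z(-2.4))  # significantly low
print(classify_z(0.7))   # not significant
print(classify_z(2.0))   # significantly high
```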

Identifying Unusual (Significant) Values

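Statisticians often treat values that occur with a probability of 5% or less as unusual. By the Empirical Rule, about 95% of the values in a bell-shaped distribution fall within two standard deviations of the mean, so values more than two standard deviations from the mean belong to the remaining 5% or so and are considered unusual.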

Unusual values: z-score > 2.00 or z-score < -2.00
Empirical Rule: 68.2% of values within ±1σ, 95.4% within ±2σ, 99.7% within ±3σ

A bell curve (normal distribution) with percentages showing areas within one, two, and three standard deviations from the mean (μ): 68.2% within ±1σ, 95.4% within ±2σ, and 99.7% within ±3σ. Number line showing z-scores, with values outside -2 and 2 labeled as 'UNUSUAL' in red, and values between -2 and 2 labeled as 'Not Unusual.'

Percentiles

Percentiles are a popular method for measuring the position of a data value within a dataset. The percentile of a value indicates the percentage of data values below it.

Definition: The percentile of value x is the percentage of values less than x.
Formula: Percentile of value x = (number of values less than x) / (total number of values) × 100
Round the result to the nearest whole number.
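
A minimal Python sketch of this formula follows. The function name and the price list are hypothetical. Halves are rounded up here, which is an assumption, since the notes only say "nearest whole number".

```python
import math

def percentile_of_value(x, data):
    """Percentage of values in `data` strictly less than x, rounded."""
    below = sum(1 for v in data if v < x)
    pct = 100 * below / len(data)
    return math.floor(pct + 0.5)  # round to nearest whole number, halves up

prices = [26, 30, 32, 32]  # small illustrative list

print(percentile_of_value(32, prices))  # 50 (two of four values are below 32)
print(percentile_of_value(30, prices))  # 25
print(percentile_of_value(26, prices))  # 0 (nothing is below the minimum)
```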

Caution on Percentiles and Quartiles

Note: There is no universal agreement on procedures for calculating percentiles and quartiles. Different calculators or software may yield different results.
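
One way to see this is Python's statistics.quantiles function, which offers two interpolation methods that give different quartiles for the same data. The data values below are hypothetical, and neither method is claimed to be the one used in these notes.

```python
from statistics import quantiles

data = [1, 2, 3, 4, 5, 6, 7, 8]  # hypothetical values

# n=4 asks for the three cut points Q1, Q2, Q3.
q_exclusive = quantiles(data, n=4, method="exclusive")
q_inclusive = quantiles(data, n=4, method="inclusive")

print(q_exclusive)  # [2.25, 4.5, 6.75]
print(q_inclusive)  # [2.75, 4.5, 6.25]
```

Both methods agree on the median but disagree on Q1 and Q3, so it helps to state which procedure produced a reported quartile.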

Converting a Percentile to a Data Value

To find the data value corresponding to a given percentile, use the locator formula:

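Locator formula: L = (k / 100) × n
k = the percentile being sought (for example, k = 90 for P90), n = total number of data values
Sort the data from lowest to highest first; L gives a position in the sorted list.
If the locator is a decimal, round up to the next integer; the percentile is the value in that position.
If the locator is a whole number, the percentile is the average of the value in that position and the next data value.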
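
The two rounding rules can be sketched in Python, taking the locator to be L = (k / 100) × n. The function name is hypothetical, and the sketch assumes k is a whole number strictly between 0 and 100 so that a "next" value always exists.

```python
def percentile_value(k, data):
    """Return the k-th percentile of `data` using the locator L = (k / 100) * n."""
    if not 0 < k < 100:
        raise ValueError("k must be strictly between 0 and 100")
    values = sorted(data)
    n = len(values)
    # divmod keeps the arithmetic exact: L = whole + remainder / 100.
    whole, remainder = divmod(k * n, 100)
    if remainder == 0:
        # Whole-number locator: average the L-th value and the next one.
        # Positions are 1-based, Python indexes are 0-based.
        return (values[whole - 1] + values[whole]) / 2
    # Decimal locator: round up to position whole + 1.
    return values[whole]

print(percentile_value(50, [26, 30, 32, 32]))    # 31.0, since L = 2 is whole
print(percentile_value(90, list(range(1, 46))))  # 41, since L = 40.5 rounds up to 41
```

Using integer division rather than floating-point multiplication avoids misclassifying a whole-number locator because of rounding error.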

Example: Walking Data

Given a list of walking times for 45 students:

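Figure: the sorted walking times in a vertical stem-and-leaf-style list, where each digit after the decimal point appears to be a separate observation (for example, 16.889 reads as 16.8, 16.8, 16.9): 12.0, 12.5677, 13.00003, 13.888, 14.022, 14.56778889, 15.000013333344, 15.8, 16.3, 16.889, 17.0, 17.8.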

To find the 90th percentile (P90):

Locator: L = (90 / 100) × 45 = 40.5, which rounds up to 41.
The 41st value in the sorted list is 16.8 min, so P90 = 16.8 min.

A red arrow points to the number 16.889, which is circled among a vertical list of numbers.

Example: Meal Data

For a small dataset of four meal prices ($26, $30, $32, $32):

A horizontal number line from 25 to 35 with colored circles above the numbers: a purple circle at 26, a green circle at 30, and two stacked red circles at 32.

Locator: L = (50 / 100) × 4 = 2, a whole number.
Taking the 2nd value alone would give $30, which is not the median.
For whole-number locators, average that value and the next data value: P50 (Median) = ($30 + $32) / 2 = $31.

Quartiles and the Five-Number Summary

Quartiles divide the sorted data into four parts, each containing about 25% of the values. The Five-Number Summary consists of the minimum, first quartile (Q1), median (Q2), third quartile (Q3), and maximum.

Q1 = P25 (First Quartile)
Q2 = P50 (Median)
Q3 = P75 (Third Quartile)

Statistic	Value
Minimum	9.7
Q1 (25th percentile)	11.2
Median (Q2, 50th percentile)	12.35
Q3 (75th percentile)	14.3
Maximum	30.3

Percentile Table Example

Percentile	Value
25th	11.2
50th	12.35
75th	14.725

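Note that the 75th percentile in this table (14.725) differs from the Q3 value of 14.3 in the Five-Number Summary, which is consistent with the caution that different procedures can yield different quartiles.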

Boxplots and Interquartile Range (IQR)

A boxplot (or box-and-whisker plot) visually represents the Five-Number Summary. The Interquartile Range (IQR) is a stable measure of spread, calculated as:

Formula: IQR = Q3 - Q1
The IQR is less affected by extreme values than the range or standard deviation.

Horizontal box plot showing data distribution from 9.7 to 30.3, with a median of 12.35, first quartile at 11.2, and third quartile at 14.3.

Modified Boxplots and Outliers

Outliers are data values that are far from the rest of the group. Modified boxplots use special symbols to identify outliers and adjust the whiskers to exclude them. The procedure for identifying outliers is:

Find quartiles Q1, Q2, Q3.
Calculate IQR: IQR = Q3 - Q1
Evaluate 1.5 × IQR.
A value is an outlier if it is above Q3 + 1.5 × IQR or below Q1 - 1.5 × IQR.

Some statistical programs distinguish between mild outliers (outside 1.5 IQRs) and extreme outliers (outside 3 IQRs).
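
The fence rule and the mild/extreme distinction can be sketched in Python as follows. The function name is hypothetical, and "outside 3 IQRs" is read here as more than 3 × IQR below Q1 or above Q3, which is an assumption about how such programs define it. The quartiles are the Q1 = 11.2 and Q3 = 14.3 from the Five-Number Summary in these notes; the test value 20.0 is made up.

```python
def classify_outlier(x, q1, q3):
    """Classify x against fences built from the quartiles q1 and q3."""
    iqr = q3 - q1
    # Check the wider (3 x IQR) fences first so extreme values are not labelled mild.
    if x < q1 - 3 * iqr or x > q3 + 3 * iqr:
        return "extreme outlier"
    if x < q1 - 1.5 * iqr or x > q3 + 1.5 * iqr:
        return "mild outlier"
    return "not an outlier"

q1, q3 = 11.2, 14.3  # quartiles from the Five-Number Summary

print(classify_outlier(30.3, q1, q3))  # extreme outlier
print(classify_outlier(20.0, q1, q3))  # mild outlier (hypothetical value)
print(classify_outlier(9.7, q1, q3))   # not an outlier
```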

Example: Outlier Identification

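For the electricity data, Q1 = 11.2 and Q3 = 14.3, so IQR = 3.1 and the upper fence is Q3 + 1.5 × IQR = 14.3 + 4.65 = 18.95. The maximum, 30.3, lies above this fence, so it is an outlier; it also lies above Q3 + 3 × IQR = 23.6, which makes it an extreme outlier.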

Dot plot showing frequency of values from 9 to 30, with numbers 7, 2, 4, and 3 circled in red, accompanied by two horizontal box plots indicating distribution and outliers. Summary statistics are listed on the right.

Summary Table: Stability of Common Statistics

Stability	Measure of Center	Measure of Variation
Very Unstable	Midrange	Range
Kinda Unstable	Mean	Standard Deviation
Very Stable	Median	IQR
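
The ranking in this table can be illustrated, though not proven, by adding one extreme value to a small dataset and watching how much each statistic moves. The data below are hypothetical, and the quartiles come from Python's statistics.quantiles with its default method, which is not necessarily the procedure used elsewhere in these notes.

```python
from statistics import mean, median, stdev, quantiles

def summary(data):
    """Compute the six statistics from the stability table."""
    q1, _, q3 = quantiles(data, n=4)
    return {
        "midrange": (min(data) + max(data)) / 2,
        "mean": mean(data),
        "median": median(data),
        "range": max(data) - min(data),
        "stdev": stdev(data),
        "IQR": q3 - q1,
    }

base = [10, 11, 12, 12, 13, 14, 15]  # hypothetical values
with_extreme = base + [30]           # same data plus one extreme value

before = summary(base)
after = summary(with_extreme)
for name in before:
    print(f"{name:>8}: {before[name]:6.2f} -> {after[name]:6.2f}")
```

In this example the midrange and range shift the most, the mean and standard deviation shift moderately, and the median and IQR barely move.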
