Descriptive Statistics: Measures of Relative Standing (z-scores, Quartiles, Percentiles, Box Plots)
Study Guide - Smart Notes
Descriptive Statistics
Measures of Relative Standing
Measures of relative standing are statistical tools used to describe the position of a data value within a dataset. They help compare values from different datasets and identify unusual or significant data points. The main measures are z-scores, quartiles, and percentiles; box plots display the quartile-based measures graphically.
Z-Scores
A z-score indicates how many standard deviations a data value (x) is above or below the mean of the dataset. Z-scores allow for comparison across different populations or datasets.
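Definition: The z-score for a value x is calculated as:
For sample data: $z = \dfrac{x - \bar{x}}{s}$, where $\bar{x}$ is the sample mean and $s$ is the sample standard deviation.
For population data: $z = \dfrac{x - \mu}{\sigma}$, where $\mu$ is the population mean and $\sigma$ is the population standard deviation.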
Properties:
Z-scores are unitless numbers.
If x is less than the mean, the z-score is negative.
If x is greater than the mean, the z-score is positive.
A data value is significantly low if its z-score is less than or equal to -2.
A data value is significantly high if its z-score is greater than or equal to +2.
Values with z-scores strictly between -2 and +2 are considered not significant.
Application: Z-scores are used to compare scores from different tests or populations, and to identify outliers or unusual values.
Example: Given three test scores:
82 (mean = 75, SD = 4)
95 (mean = 85, SD = 8)
75 (mean = 70, SD = 2)
Calculate the z-score for each to determine which score is highest relative to its group.
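The comparison in this example can be worked through with a short Python sketch. The function name `z_score` and the tuple layout are illustrative choices, not part of the original notes; the formula is the one defined above, z = (x - mean) / SD.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

# (score, group mean, group SD) for the three test scores in the example
scores = [(82, 75, 4), (95, 85, 8), (75, 70, 2)]

z_values = [z_score(x, mean, sd) for x, mean, sd in scores]
for (x, mean, sd), z in zip(scores, z_values):
    print(f"score {x} (mean {mean}, SD {sd}): z = {z:.2f}")

# The score with the largest z-score is highest relative to its own group
best = max(scores, key=lambda t: z_score(*t))
print("Highest relative score:", best[0])
```

The z-scores come out as 1.75, 1.25, and 2.50, so the raw score of 75 is the highest relative to its group even though it is the lowest raw score.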
Example: IQ scores are normally distributed with mean 100 and SD 15. For IQs of 72, 80, 101, 125, calculate z-scores and classify as significantly low or high.
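A minimal sketch of this classification, assuming the cutoffs listed under Properties (z ≤ -2 is significantly low, z ≥ +2 is significantly high). The helper name `classify` is hypothetical.

```python
def classify(z):
    """Label a z-score using the -2 / +2 cutoffs from the notes."""
    if z <= -2:
        return "significantly low"
    if z >= 2:
        return "significantly high"
    return "not significant"

MEAN, SD = 100, 15          # IQ mean and standard deviation from the example
iqs = [72, 80, 101, 125]

results = {}
for iq in iqs:
    z = (iq - MEAN) / SD
    results[iq] = (round(z, 2), classify(z))
    print(f"IQ {iq}: z = {z:.2f} -> {classify(z)}")
```

All four z-scores (about -1.87, -1.33, 0.07, and 1.67) fall between -2 and +2, so none of these IQ scores is significantly low or high under this rule.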
Quartiles
Quartiles are measures of location that divide a sorted dataset into four equal parts, each containing approximately 25% of the data.
Q1 (First Quartile): Separates the lowest 25% from the highest 75%.
Q2 (Second Quartile): The median; separates the lowest 50% from the highest 50%.
Q3 (Third Quartile): Separates the lowest 75% from the highest 25%.
Quartile Structure:
| Minimum | Q1 | Q2 (Median) | Q3 | Maximum |
|---|---|---|---|---|
| 0% | 25% | 50% | 75% | 100% |
Percentiles
Percentiles are measures of location that divide a dataset into 100 equal groups, each containing about 1% of the data. There are 99 percentiles, denoted P1, P2, ..., P99.
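Finding the Percentile of a Data Value:
$\text{Percentile of } x = \dfrac{\text{number of values less than } x}{\text{total number of values}} \times 100$
Round the result up to the nearest whole number.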
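As a rough illustration, the sketch below assumes the usual textbook rule: count the values less than x, divide by the total number of values, multiply by 100, and then round up as these notes instruct (some texts round to the nearest whole number instead). The function name and the wait-time data are made up for the example.

```python
def percentile_of_value(data, x):
    """Percentile of x: share of values strictly less than x, as a whole number."""
    n = len(data)
    below = sum(1 for v in data if v < x)
    # Integer ceiling of 100 * below / n, which avoids floating-point rounding issues
    return (100 * below + n - 1) // n

# Hypothetical wait times in minutes (not from the original notes)
wait_times = [3, 5, 5, 7, 8, 10, 12, 12, 15, 18, 21, 40]

print(percentile_of_value(wait_times, 12))  # 6 of 12 values are below 12
print(percentile_of_value(wait_times, 15))  # 8 of 12 values are below 15
```

Here 12 minutes sits at the 50th percentile, and 15 minutes gives 8/12 × 100 ≈ 66.7, which rounds up to the 67th percentile.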
Finding the Data Value for the k-th Percentile:
Order the data from least to greatest.
Calculate the index using $L = \dfrac{k}{100} \cdot n$, where $n$ is the number of data values and $k$ is the desired percentile.
If $L$ is a whole number, average the $L$-th value and the next value; if not, round $L$ up and select the value at that position.
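The steps above can be sketched in Python, assuming the locator is L = (k / 100) · n with positions counted from 1. The function name `percentile_value` and the sample data are illustrative only; statistical software often uses different interpolation rules and may return slightly different values.

```python
def percentile_value(data, k):
    """Value at the k-th percentile (1 <= k <= 99) using the locator rule."""
    if not 1 <= k <= 99:
        raise ValueError("k must be between 1 and 99")
    values = sorted(data)                       # step 1: order the data
    n = len(values)
    # step 2: L = (k/100) * n, kept as quotient and remainder to stay in integers
    whole, remainder = divmod(k * n, 100)
    if remainder == 0:
        # L is a whole number: average the L-th and (L+1)-th values (1-based)
        return (values[whole - 1] + values[whole]) / 2
    # L is not whole: round up to position whole + 1 (1-based), i.e. index `whole`
    return values[whole]

# Hypothetical wait times in minutes (not from the original notes)
wait_times = [3, 5, 5, 7, 8, 10, 12, 12, 15, 18, 21, 40]

print(percentile_value(wait_times, 25))  # L = 3, so average the 3rd and 4th values
print(percentile_value(wait_times, 40))  # L = 4.8, so take the 5th value
```

With 12 values, P25 uses L = 3 and averages 5 and 7 to give 6.0, while P40 uses L = 4.8, rounds up to position 5, and returns 8.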
Other Statistics
Interquartile Range (IQR): Measures the spread of the middle 50% of data: $\text{IQR} = Q_3 - Q_1$.
Semi-interquartile Range: Half the IQR: $\dfrac{Q_3 - Q_1}{2}$.
5-Number Summary: Consists of:
Minimum value
First quartile (Q1)
Median (Q2)
Third quartile (Q3)
Maximum value
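A minimal sketch of the 5-number summary together with the IQR and semi-interquartile range. It assumes the quartiles are taken as P25, P50, and P75 under the locator rule L = (k / 100) · n; other quartile conventions exist and can give slightly different results. All names and the data are hypothetical.

```python
def percentile_value(data, k):
    """k-th percentile via the locator rule L = (k/100) * n (assumed convention)."""
    values = sorted(data)
    whole, remainder = divmod(k * len(values), 100)
    if remainder == 0:
        # Whole-number locator: average the L-th and next value
        return (values[whole - 1] + values[whole]) / 2
    # Otherwise round the locator up and take that position
    return values[whole]

def five_number_summary(data):
    """Minimum, Q1, median, Q3, and maximum of the data."""
    return {
        "min": min(data),
        "q1": percentile_value(data, 25),
        "median": percentile_value(data, 50),
        "q3": percentile_value(data, 75),
        "max": max(data),
    }

# Hypothetical wait times in minutes (not from the original notes)
wait_times = [3, 5, 5, 7, 8, 10, 12, 12, 15, 18, 21, 40]

summary = five_number_summary(wait_times)
iqr = summary["q3"] - summary["q1"]   # spread of the middle 50%
semi_iqr = iqr / 2                    # half the IQR

print(summary)
print("IQR:", iqr, "Semi-IQR:", semi_iqr)
```

For this data the summary is 3, 6.0, 11.0, 16.5, 40, giving an IQR of 10.5 and a semi-interquartile range of 5.25.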
Boxplots
A boxplot (box-and-whisker diagram) is a graphical representation of the 5-number summary. It displays the distribution, center, and spread of the data, and can help identify skewness and outliers.
Construction Steps:
Find the 5-number summary.
Draw a scale including the minimum and maximum values.
Draw a box from Q1 to Q3 with a line at the median (Q2).
Draw whiskers from the box to the minimum and maximum values.
Skewness:
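If the boxplot is roughly symmetric about the median, the distribution is approximately symmetric. Symmetry alone does not show that the distribution is normal.
If the box and whisker extend further on one side of the median, the distribution is skewed toward that side: a longer right side suggests right skew, and a longer left side suggests left skew.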
Outliers
Outliers are data values that are significantly distant from the majority of the data. They can affect the mean, standard deviation, and the appearance of histograms.
Outlier Criteria for Modified Boxplots:
A value is an outlier if it is above Q3 by more than 1.5 × IQR or below Q1 by more than 1.5 × IQR; that is, if it is greater than Q3 + 1.5 × IQR or less than Q1 − 1.5 × IQR.
Modified Boxplot Construction:
Outliers are marked with a special symbol (e.g., asterisk).
Whiskers extend only to the minimum and maximum values that are not outliers.
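The outlier criterion and whisker rule can be sketched as follows. The quartiles are passed in as arguments because quartile conventions vary; the values Q1 = 6.0 and Q3 = 16.5 and the wait-time data are assumptions made for illustration, and the function name is hypothetical.

```python
def modified_boxplot_parts(data, q1, q3):
    """Outliers and whisker ends for a modified boxplot using the 1.5 x IQR rule."""
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    # Outliers lie more than 1.5 x IQR below Q1 or above Q3
    outliers = [v for v in data if v < lower_fence or v > upper_fence]
    # Whiskers stop at the most extreme values that are not outliers
    inside = [v for v in data if lower_fence <= v <= upper_fence]
    return {
        "lower_fence": lower_fence,
        "upper_fence": upper_fence,
        "outliers": outliers,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
    }

# Hypothetical wait times in minutes, with assumed quartiles Q1 = 6.0 and Q3 = 16.5
wait_times = [3, 5, 5, 7, 8, 10, 12, 12, 15, 18, 21, 40]
parts = modified_boxplot_parts(wait_times, q1=6.0, q3=16.5)
print(parts)
```

With IQR = 10.5 the fences are -9.75 and 32.25, so 40 is flagged as an outlier and the whiskers run from 3 to 21.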
Summary Table: Quartiles and Percentiles
| Measure | Definition | Division |
|---|---|---|
| Quartile (Q1, Q2, Q3) | Divides data into four equal parts | 25% each |
| Percentile (P1 to P99) | Divides data into 100 equal parts | 1% each |
Example: Given a dataset of wait times, the 5-number summary can be calculated and used to construct a boxplot, identify skewness, and detect outliers.