Describing, Exploring, and Comparing Data: Measures of Relative Standing and Boxplots
Study Guide - Smart Notes
Tailored notes based on your materials, expanded with key definitions, examples, and context.
Measures of Relative Standing and Boxplots
Introduction
This chapter focuses on statistical methods for describing, exploring, and comparing data, with an emphasis on measures of relative standing (z scores, percentiles, and quartiles) and on graphical summaries such as boxplots. These tools describe where an individual data value lies within a dataset and help identify significant or unusual observations.
Z Scores
Definition and Calculation
Z score (also called standard score or standardized value) measures how many standard deviations a data value is above or below the mean.
For a sample: $z = \frac{x - \bar{x}}{s}$, where $\bar{x}$ is the sample mean and $s$ is the sample standard deviation.
For a population: $z = \frac{x - \mu}{\sigma}$, where $\mu$ is the population mean and $\sigma$ is the population standard deviation.
Z scores are typically rounded to two decimal places (e.g., 2.31).
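The two formulas above can be sketched in Python as follows, assuming the data arrive as a plain list of numbers. `z_score` is a hypothetical helper name; the standard-library functions `statistics.stdev` (sample, $s$) and `statistics.pstdev` (population, $\sigma$) supply the standard deviation.

```python
import statistics

def z_score(x, data, population=False):
    """Return the z score of x relative to data, rounded to two decimals.

    population=False: z = (x - sample mean) / s
    population=True:  z = (x - mu) / sigma, treating data as the whole population
    """
    mean = statistics.mean(data)
    sd = statistics.pstdev(data) if population else statistics.stdev(data)
    return round((x - mean) / sd, 2)

data = [2, 4, 4, 4, 5, 5, 7, 9]          # mean 5, sigma 2, s about 2.14
print(z_score(9, data, population=True))  # 2.0
print(z_score(9, data))                   # 1.87
```

The same value gets a smaller z score under the sample formula here because $s$ divides by $n - 1$ and is therefore slightly larger than $\sigma$.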
Identifying Significant Values
Values with $z \le -2$ or $z \ge 2$ are considered significantly low or significantly high, respectively.
Values with $-2 < z < 2$ are not considered significant.
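The cutoffs above translate directly into a small classifier. This is a sketch; `classify` is a hypothetical name, and the thresholds are exactly the $z \le -2$ and $z \ge 2$ rule stated above.

```python
def classify(z):
    """Label a z score using the z <= -2 / z >= 2 rule of thumb."""
    if z <= -2:
        return "significantly low"
    if z >= 2:
        return "significantly high"
    return "not significant"

for z in (-2.0, 1.99, 2.31):
    print(z, classify(z))
```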
Example: Comparing Relative Standing
To compare which of two data values is more extreme, calculate their z scores relative to their respective datasets.
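Example:
- Newborn baby weight: data value $x$, sample mean $\bar{x}$, and sample standard deviation $s$, all in grams (g).
- Adult body temperature: data value $x$, sample mean $\bar{x}$, and sample standard deviation $s$, all in degrees Fahrenheit (°F).
Calculate the z score of each value with $z = \frac{x - \bar{x}}{s}$; the value whose z score is farther from 0 is the more extreme one.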
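A sketch of the comparison in Python. The summary statistics below are hypothetical numbers chosen only to illustrate the method; they are not the values from the original example. `z_from_summary` and `more_extreme` are hypothetical helper names.

```python
def z_from_summary(x, mean, sd):
    """z = (x - mean) / sd, rounded to two decimal places."""
    return round((x - mean) / sd, 2)

def more_extreme(a, b):
    """Each argument is (label, x, mean, sd); return the label with the larger |z|.

    Ties go to b.
    """
    za = z_from_summary(*a[1:])
    zb = z_from_summary(*b[1:])
    return a[0] if abs(za) > abs(zb) else b[0]

# Hypothetical summary statistics (illustration only).
baby = ("newborn weight", 4000, 3200, 700)    # grams
temp = ("body temperature", 99.0, 98.2, 0.6)  # degrees F

for label, x, mean, sd in (baby, temp):
    print(label, z_from_summary(x, mean, sd))  # 1.14, then 1.33
print(more_extreme(baby, temp))                # body temperature
```

Because z scores are unitless, grams and degrees Fahrenheit can be compared on the same scale.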
Quartiles
Definition and Calculation
Quartiles are measures of location that divide a dataset into four groups, each containing approximately 25% of the data.
Denoted as $Q_1$ (first quartile), $Q_2$ (second quartile, or median), and $Q_3$ (third quartile).
Procedure:
Arrange data in increasing order.
Find the median ($Q_2$).
Divide the data into two halves. If the number of observations is odd, include the median in both halves.
$Q_1$ is the median of the lower half; $Q_3$ is the median of the upper half.
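The procedure above can be sketched in Python as follows. This follows the stated rule of including the median in both halves when $n$ is odd; other conventions (for example, excluding the median, or the interpolating methods of Python's `statistics.quantiles`) can give slightly different quartiles. `quartiles` is a hypothetical helper name.

```python
import statistics

def quartiles(data):
    """Return (Q1, Q2, Q3) using the median-of-halves procedure."""
    values = sorted(data)            # step 1: increasing order
    n = len(values)
    q2 = statistics.median(values)   # step 2: the median
    half = (n + 1) // 2              # odd n: both halves contain the median
    lower = values[:half]
    upper = values[n - half:]
    return statistics.median(lower), q2, statistics.median(upper)

print(quartiles([7, 1, 5, 3, 2, 6, 4]))     # (2.5, 4, 5.5)
print(quartiles([1, 2, 3, 4, 5, 6, 7, 8]))  # (2.5, 4.5, 6.5)
```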
Quartiles in Different Distributions
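Quartiles divide the data into four groups containing about the same number of values regardless of the distribution's shape (uniform, bell-shaped, right-skewed, left-skewed). What changes with the shape is the spacing between the quartiles: in a right-skewed distribution, for example, $Q_3 - Q_2$ tends to be larger than $Q_2 - Q_1$.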
Statistics Defined Using Quartiles and Percentiles
Key Measures
Interquartile Range (IQR): $\text{IQR} = Q_3 - Q_1$; measures the spread of the middle 50% of the data.
Semi-interquartile Range: Half the IQR, $\frac{Q_3 - Q_1}{2}$.
Midquartile: Average of $Q_1$ and $Q_3$, $\frac{Q_1 + Q_3}{2}$.
10–90 Percentile Range: Difference between the 90th and 10th percentiles, $P_{90} - P_{10}$.
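The four measures are simple arithmetic once the quartiles and percentiles are known. A minimal sketch, assuming $Q_1$, $Q_3$, $P_{10}$, and $P_{90}$ have already been computed; `quartile_stats` is a hypothetical helper and the inputs in the usage line are made-up values.

```python
def quartile_stats(q1, q3, p10, p90):
    """Return the statistics defined from quartiles and percentiles."""
    iqr = q3 - q1
    return {
        "IQR": iqr,
        "semi-interquartile range": iqr / 2,
        "midquartile": (q1 + q3) / 2,
        "10-90 percentile range": p90 - p10,
    }

print(quartile_stats(q1=2.5, q3=5.5, p10=1.0, p90=7.0))
```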
5-Number Summary
Definition
The five-number summary of a dataset consists of: Minimum, $Q_1$, $Q_2$ (Median), $Q_3$, Maximum.
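A sketch of the five-number summary in Python, using the same median-of-halves quartile rule described in the Quartiles section (median included in both halves when $n$ is odd). `five_number_summary` is a hypothetical helper name.

```python
import statistics

def five_number_summary(data):
    """Return (min, Q1, median, Q3, max)."""
    values = sorted(data)
    n = len(values)
    half = (n + 1) // 2  # odd n: both halves share the median
    q1 = statistics.median(values[:half])
    q3 = statistics.median(values[n - half:])
    return values[0], q1, statistics.median(values), q3, values[-1]

print(five_number_summary([7, 1, 5, 3, 2, 6, 4]))  # (1, 2.5, 4, 5.5, 7)
```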
Boxplots (Box-and-Whisker Diagrams)
Definition and Construction
A boxplot is a graphical representation of the five-number summary.
It consists of a box from $Q_1$ to $Q_3$, with a line at the median ($Q_2$), and "whiskers" extending to the minimum and maximum values.
Procedure:
Find the five-number summary.
Draw a box from $Q_1$ to $Q_3$.
Draw a line at the median inside the box.
Extend lines (whiskers) from the box to the minimum and maximum values.
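To make the four drawing steps concrete without assuming any plotting library, here is a text-only sketch that maps a five-number summary onto one line of characters. `ascii_boxplot` and its symbol choices are hypothetical; a plotting library such as matplotlib would draw the actual graphic.

```python
def ascii_boxplot(summary, width=41):
    """Render (min, Q1, median, Q3, max) as a single line of text.

    Whisker ends '|', whiskers '-', box edges '[' and ']',
    box interior '=', median '#'.
    """
    lo, q1, med, q3, hi = summary
    span = (hi - lo) or 1  # avoid dividing by zero if all values are equal

    def col(v):
        # Map a data value to a character column from 0 to width - 1.
        return round((v - lo) / span * (width - 1))

    c_lo, c_q1, c_med, c_q3, c_hi = (col(v) for v in summary)
    chars = [" "] * width
    for i in range(c_lo, c_hi + 1):
        chars[i] = "-"  # whiskers span min to max; the box overwrites the middle
    for i in range(c_q1, c_q3 + 1):
        chars[i] = "="  # box from Q1 to Q3
    chars[c_lo] = chars[c_hi] = "|"
    chars[c_q1], chars[c_q3] = "[", "]"
    chars[c_med] = "#"  # line at the median
    return "".join(chars).rstrip()

print(ascii_boxplot((1, 2.5, 4, 5.5, 7), width=25))
# |-----[=====#=====]-----|
```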
Determining Outliers
Outlier Detection Using Quartiles
Lower limit: $Q_1 - 1.5 \times \text{IQR}$
Upper limit: $Q_3 + 1.5 \times \text{IQR}$
Any data value less than the lower limit or greater than the upper limit is considered an outlier.
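A sketch of the outlier rule in Python, again using the median-of-halves quartiles from earlier in these notes. `outlier_limits` and `outliers` are hypothetical helper names.

```python
import statistics

def outlier_limits(data):
    """Return (Q1 - 1.5 * IQR, Q3 + 1.5 * IQR)."""
    values = sorted(data)
    n = len(values)
    half = (n + 1) // 2  # odd n: both halves share the median
    q1 = statistics.median(values[:half])
    q3 = statistics.median(values[n - half:])
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def outliers(data):
    """Values strictly below the lower limit or strictly above the upper limit."""
    lower, upper = outlier_limits(data)
    return [x for x in data if x < lower or x > upper]

data = [1, 2, 3, 4, 5, 6, 7, 30]
print(outlier_limits(data))  # (-3.5, 12.5)
print(outliers(data))        # [30]
```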
Skewness
Types of Distribution Shapes
Right-skewed: Longer tail on the right; most data are concentrated on the left.
Symmetric: Data are evenly distributed around the center.
Left-skewed: Longer tail on the left; most data are concentrated on the right.
Percentiles
Definition and Calculation
Percentiles divide a dataset into 100 groups, each containing about 1% of the data.
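There is no universal method for calculating percentiles, but a common approach for finding the percentile of a data value $x$ is:
$\text{percentile of } x = \frac{\text{number of values less than } x}{\text{total number of values}} \times 100$
Percentiles are denoted as $P_1, P_2, \ldots, P_{99}$.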
Finding the Percentile of a Data Value
Arrange the data in increasing order.
Count the number of values less than the target value $x$.
Apply the formula above and round to the nearest whole number.
Example: If 27 of 50 data speeds are less than 14.7 Mbps, then $\frac{27}{50} \times 100 = 54$. So, 14.7 Mbps is at the 54th percentile.
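A sketch of the three steps in Python. `percentile_of` is a hypothetical helper name, and the usage data are made up so that exactly 27 of 50 values fall below the target, mirroring the example's counts.

```python
import math

def percentile_of(x, data):
    """Percentile of x = (number of values less than x) / n * 100, rounded."""
    count_less = sum(1 for v in data if v < x)
    # Multiply before dividing so 27 * 100 / 50 is exactly 54.0; adding 0.5
    # and flooring rounds halves up (Python's round() rounds halves to even).
    return math.floor(count_less * 100 / len(data) + 0.5)

data = list(range(50))         # hypothetical values 0..49
print(percentile_of(27, data)) # 27 values are below 27 -> 54
```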
Converting Percentile to a Data Value
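Given a percentile $P_k$, the position in the ordered data is $L = \frac{k}{100} \cdot n$, where $n$ is the total number of values.
Under the common convention that accompanies this formula: if $L$ is a whole number, the $k$th percentile is the mean of the $L$th and $(L+1)$th values; otherwise, round $L$ up to the next whole number, and the value in that position is the $k$th percentile.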
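A sketch of the locator rule in Python, assuming integer $k$ from 1 to 99 and the round-up / average convention just described. `percentile_value` is a hypothetical helper name. Integer arithmetic is used so the "is $L$ whole?" test is exact.

```python
def percentile_value(k, data):
    """Return P_k using L = k/100 * n with the round-up / average rule."""
    if not 1 <= k <= 99:
        raise ValueError("k must be an integer from 1 to 99")
    values = sorted(data)
    n = len(values)
    if (k * n) % 100 == 0:
        L = k * n // 100
        # L is whole: average the Lth and (L+1)th values (1-based positions).
        return (values[L - 1] + values[L]) / 2
    L = (k * n + 99) // 100  # round L up to the next whole number
    return values[L - 1]

data = list(range(1, 51))          # hypothetical values 1..50
print(percentile_value(54, data))  # L = 27 (whole) -> (27 + 28) / 2 = 27.5
print(percentile_value(55, data))  # L = 27.5 -> round up to 28 -> 28
```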
Example: Comparing Percentile Standing
If Jack is ranked 70th in a class of 200, then 130 students are ranked below him, so he is at the $\frac{130}{200} \times 100 = 65$th percentile.
If Jill is at the 70th percentile, then 70% of the class is ranked below her.
Therefore, Jill has a higher standing than Jack.
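The rank-to-percentile arithmetic in this example can be checked with a one-line function. `rank_to_percentile` is a hypothetical name, and it assumes rank 1 is the top of the class with no ties.

```python
def rank_to_percentile(rank, class_size):
    """Percentile from a rank, where rank 1 is the top and there are no ties."""
    below = class_size - rank  # students ranked below this one
    return below * 100 / class_size

print(rank_to_percentile(70, 200))  # Jack: 65.0, below Jill's 70th percentile
```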
Summary Table: Key Measures of Relative Standing
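| Measure | Definition | Formula |
|---|---|---|
| Z Score | Standardized value indicating number of standard deviations from mean | $z = \frac{x - \bar{x}}{s}$ (sample); $z = \frac{x - \mu}{\sigma}$ (population) |
| Quartile | Divides data into four equal parts | $Q_1$, $Q_2$ (median), $Q_3$ |
| Interquartile Range (IQR) | Spread of middle 50% of data | $\text{IQR} = Q_3 - Q_1$ |
| Percentile | Divides data into 100 equal parts | Percentile of $x$ = $\frac{\text{number of values less than } x}{\text{total number of values}} \times 100$ |
| Five-number summary | Describes data spread and center | Min, $Q_1$, $Q_2$, $Q_3$, Max |
| Boxplot | Graphical summary of five-number summary | Box from $Q_1$ to $Q_3$, line at $Q_2$, whiskers to Min/Max |
| Outlier | Extreme value outside expected range | Below $Q_1 - 1.5 \times \text{IQR}$ or above $Q_3 + 1.5 \times \text{IQR}$ |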
Additional info: Academic context and examples have been expanded for clarity and completeness.