Describing, Exploring, and Comparing Data: Measures of Center, Variation, and Relative Standing
Study Guide - Smart Notes
Chapter Overview
Describing, Exploring, and Comparing Data
This chapter introduces fundamental concepts in statistics for describing, exploring, and comparing data sets. It covers measures of center, measures of variation, and measures of relative standing, providing definitions, formulas, and examples for each.
Measures of Center
Introduction to Measures of Center
Measures of center identify the "middle" or typical value in a data set. They help summarize and compare data distributions.
Notation:
x̄: Sample mean
μ: Population mean
n: Number of data values in a sample
N: Number of data values in a population
Mean (Arithmetic Average)
The mean is the sum of all data values divided by the number of values.
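Formula: $\bar{x} = \dfrac{\sum x}{n}$ (sample) or $\mu = \dfrac{\sum x}{N}$ (population)
Example: For data 4, 5, 5, 7, 9, 8: $\bar{x} = (4 + 5 + 5 + 7 + 9 + 8)/6 = 38/6 \approx 6.3$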
Pros:
Uses every data value
Cons:
Affected by extreme values (outliers)
Median
The median is the middle value when data are ordered. If the number of values is even, it is the average of the two middle values.
Procedure:
Order the data from smallest to largest
If n is odd, median is the middle value
If n is even, median is the average of the two middle values
Example: For data 4, 5, 5, 7, 9, 8 (ordered: 4, 5, 5, 7, 8, 9): Median = (5 + 7)/2 = 6
Pros:
Resistant to extreme values
Cons:
Does not use all data values
May not represent data with gaps
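The median procedure above can be sketched in Python as follows. This is a minimal illustration rather than part of the original notes; the function name `median` is an assumption, and the input is assumed to be a non-empty list of numbers.

```python
def median(values):
    """Median of a non-empty list, following the ordered-data procedure."""
    ordered = sorted(values)      # order the data from smallest to largest
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]       # odd n: the single middle value
    # even n: average of the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([4, 5, 5, 7, 9, 8]))  # 6.0
print(median([4, 5, 5, 7, 9]))     # 5
```

The first call reproduces the worked example: the two middle values of the ordered data are 5 and 7.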
Mode
The mode is the most frequent value in a data set. Data may have no mode, one mode (unimodal), or multiple modes (bimodal, multimodal).
Example: For data 4, 5, 5, 7, 9, 8: Mode = 5
Pros:
Can be used with nominal data
Cons:
May not represent the center
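A short Python sketch of finding the mode, including the "no mode" and multiple-mode cases. The function name `modes` is illustrative, and treating "every value occurs once" as "no mode" is the convention assumed here.

```python
from collections import Counter


def modes(values):
    """Return the list of modes; an empty list means no value repeats."""
    counts = Counter(values)
    top = max(counts.values())    # highest frequency in the data
    if top == 1:
        return []                 # every value occurs once: no mode
    return [value for value, count in counts.items() if count == top]


print(modes([4, 5, 5, 7, 9, 8]))       # [5]
print(modes([1, 1, 2, 2, 3]))          # [1, 2]
print(modes(["red", "blue", "red"]))   # ['red']
print(modes([1, 2, 3]))                # []
```

The third call shows why the mode works with nominal data: it only counts occurrences and never does arithmetic on the values.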
Midrange
The midrange is the value halfway between the minimum and maximum data values.
Formula: $\text{Midrange} = \dfrac{\text{maximum} + \text{minimum}}{2}$
Example: For data 4, 5, 5, 7, 9, 8: Midrange = (4 + 9)/2 = 6.5
Pros:
Quick estimate of center
Cons:
Very sensitive to extreme values
Rounding Measures of Center
Round only the final answer, not intermediate steps
Round to one more decimal place than the original data
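As a rough illustration of the rounding rule, the Python sketch below keeps the mean and midrange of the example data at full precision and rounds only when reporting. The variable names are illustrative; the data are whole numbers, so results are reported to one decimal place.

```python
from statistics import mean

data = [4, 5, 5, 7, 9, 8]   # whole numbers, so report one decimal place

x_bar = mean(data)                        # 38 / 6, kept unrounded
midrange = (min(data) + max(data)) / 2    # (4 + 9) / 2

# Round only the final answers
print(round(x_bar, 1))      # 6.3
print(round(midrange, 1))   # 6.5
```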
Measures of Variation
Introduction to Measures of Variation
Measures of variation describe the amount of spread or dispersion in a data set. They help assess consistency and variability.
Range
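The range is the difference between the maximum and minimum data values.
Formula: $\text{Range} = \text{maximum} - \text{minimum}$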
Example: For data 4, 5, 5, 7, 9, 8: Range = 9 - 4 = 5
Pros:
Easy to compute
Cons:
Very sensitive to extreme values
Variance
Variance measures the average squared deviation from the mean.
Sample Variance Formula: $s^2 = \dfrac{\sum (x - \bar{x})^2}{n - 1}$
Population Variance Formula: $\sigma^2 = \dfrac{\sum (x - \mu)^2}{N}$
Pros:
Uses all data values
Cons:
Units are squared
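The difference between the sample and population formulas can be seen on the example data 4, 5, 5, 7, 9, 8. The sketch below computes both by hand and cross-checks them against Python's `statistics` module; the variable names are illustrative.

```python
from statistics import pvariance, variance

data = [4, 5, 5, 7, 9, 8]
n = len(data)
x_bar = sum(data) / n

# Squared deviation of each value from the mean
squared_devs = [(x - x_bar) ** 2 for x in data]

sample_var = sum(squared_devs) / (n - 1)   # divide by n - 1 for a sample
population_var = sum(squared_devs) / n     # divide by N for a population

# Cross-check against the standard library
assert abs(sample_var - variance(data)) < 1e-12
assert abs(population_var - pvariance(data)) < 1e-12

print(round(sample_var, 4))      # 3.8667
print(round(population_var, 4))  # 3.2222
```

The sample variance is the larger of the two because it divides the same sum of squared deviations by n - 1 instead of n.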
Standard Deviation
Standard deviation is the square root of the variance and can be read as a typical distance of data values from the mean.
Sample Standard Deviation Formula: $s = \sqrt{\dfrac{\sum (x - \bar{x})^2}{n - 1}}$
Pros:
Units are the same as the original data
Cons:
Increases with extreme values
Quarter Range
The quarter range is a quick estimate of spread, calculated as one-fourth of the range.
Formula: $\text{Quarter range} = \dfrac{\text{Range}}{4} = \dfrac{\text{maximum} - \text{minimum}}{4}$
Pros:
Quick estimate
Cons:
Affected by extreme values
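To see how the quick estimates compare with the standard deviation, the sketch below computes the range, the quarter range, and the sample standard deviation for the example data. It uses `statistics.stdev`, which applies the n - 1 formula; the variable names are illustrative.

```python
from statistics import stdev

data = [4, 5, 5, 7, 9, 8]

data_range = max(data) - min(data)   # 9 - 4
quarter_range = data_range / 4       # one-fourth of the range
s = stdev(data)                      # sample standard deviation (n - 1)

print(data_range)       # 5
print(quarter_range)    # 1.25
print(round(s, 2))      # 1.97
```

For a data set this small the quarter range is only a crude stand-in for the standard deviation.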
Why Do We Care About Standard Deviation?
Helps determine if a sample value is "significantly" high or low
Used in the Range Rule of Thumb:
Significantly high: values above $\bar{x} + 2s$
Significantly low: values below $\bar{x} - 2s$
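A minimal sketch of the Range Rule of Thumb, assuming the usual cutoffs of two standard deviations from the mean and strict inequalities (chosen to match the z-score interpretation later in these notes). The function name `classify` and the sample numbers are illustrative.

```python
def classify(x, mean, sd):
    """Label x using cutoffs two standard deviations from the mean."""
    if x > mean + 2 * sd:
        return "significantly high"
    if x < mean - 2 * sd:
        return "significantly low"
    return "not significant"


# Illustrative values: mean 100 and standard deviation 15 give cutoffs 70 and 130
print(classify(140, 100, 15))  # significantly high
print(classify(60, 100, 15))   # significantly low
print(classify(110, 100, 15))  # not significant
```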
Empirical Rule
The Empirical Rule applies to data sets that are approximately normally distributed.
About 68% of data values fall within one standard deviation of the mean
About 95% fall within two standard deviations
About 99.7% fall within three standard deviations
Empirical Rule Visual Aid
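[Figure: bell-shaped normal curve centered at the mean, with bands marking one, two, and three standard deviations on either side, covering about 68%, 95%, and 99.7% of the data.]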
Empirical Rule Example
Suppose IQ scores are normally distributed with a mean of 100 and a standard deviation of 15. What percent of scores are between 85 and 115?
Answer: About 68%, because 85 = 100 - 15 and 115 = 100 + 15, so the interval spans one standard deviation on each side of the mean.
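The 68%, 95%, and 99.7% figures are rounded values for an exactly normal distribution. As a check, the sketch below uses `statistics.NormalDist` (Python 3.8+) with the IQ parameters from the example; the helper name `share_within` is an assumption.

```python
from statistics import NormalDist


def share_within(k, mu, sigma):
    """Fraction of a normal distribution within k standard deviations of mu."""
    dist = NormalDist(mu=mu, sigma=sigma)
    return dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)


for k in (1, 2, 3):
    low, high = 100 - k * 15, 100 + k * 15
    print(low, high, round(100 * share_within(k, 100, 15), 1))
# 85 115 68.3
# 70 130 95.4
# 55 145 99.7
```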
Frequency Distributions
Introduction to Frequency Distributions
Frequency distributions organize data into classes or intervals and show the frequency of values in each class.
Example Frequency Distribution Table
| Weight (lbs) | Frequency |
|---|---|
| 1.2-1.4 | 2 |
| 1.5-1.7 | 4 |
| 1.8-2.0 | 6 |
| 2.1-2.3 | 3 |
| 2.4-2.6 | 1 |
Calculating Mean from Frequency Distribution
Find midpoint for each class
Multiply midpoint by frequency
Add all products and divide by total frequency
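Formula: $\bar{x} = \dfrac{\sum (f \cdot x)}{\sum f}$, where $x$ is the class midpoint and $f$ is the class frequency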
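Applying these steps to the weight table gives a worked example. The sketch below is illustrative: each class is stored as a (lower limit, upper limit, frequency) tuple, a layout chosen here for convenience, and the result is rounded to two decimals because the class limits have one.

```python
# (lower limit, upper limit, frequency) for each class in the weight table
classes = [
    (1.2, 1.4, 2),
    (1.5, 1.7, 4),
    (1.8, 2.0, 6),
    (2.1, 2.3, 3),
    (2.4, 2.6, 1),
]

total_frequency = sum(f for _, _, f in classes)

# Midpoint of each class times its frequency, summed over all classes
weighted_sum = sum(f * (lo + hi) / 2 for lo, hi, f in classes)

grouped_mean = weighted_sum / total_frequency

print(total_frequency)          # 16
print(round(grouped_mean, 2))   # 1.84
```

The midpoints are 1.3, 1.6, 1.9, 2.2, and 2.5, the products sum to 29.5, and 29.5 / 16 is about 1.84 lbs.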
Calculating Standard Deviation from Frequency Distribution
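Create $x - \bar{x}$ and $(x - \bar{x})^2$ columns, where $x$ is the class midpoint
Multiply each $(x - \bar{x})^2$ by its class frequency $f$
Add all products and use the variance formula, then take the square root
Formula: $s = \sqrt{\dfrac{\sum f \cdot (x - \bar{x})^2}{n - 1}}$, where $n = \sum f$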
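A sketch of the grouped standard deviation for the weight table, assuming the sample (n - 1) form of the variance formula with class midpoints standing in for the data values. The variable names and the (midpoint, frequency) layout are illustrative.

```python
from math import sqrt

# (midpoint, frequency) for each class in the weight table
rows = [(1.3, 2), (1.6, 4), (1.9, 6), (2.2, 3), (2.5, 1)]

n = sum(f for _, f in rows)                 # total frequency
x_bar = sum(f * x for x, f in rows) / n     # grouped mean

# f * (x - x_bar)^2 for each class, summed, then the sample variance formula
weighted_sq_devs = sum(f * (x - x_bar) ** 2 for x, f in rows)
grouped_sd = sqrt(weighted_sq_devs / (n - 1))

print(round(grouped_sd, 2))   # 0.33
```

Because midpoints replace the actual weights, the result is an approximation of the standard deviation of the raw data.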
Measures of Relative Standing
Introduction to Measures of Relative Standing
Measures of relative standing identify the position of a data value relative to other values in the data set. They help determine outliers and compare across data sets.
z-scores
A z-score indicates how many standard deviations a value is from the mean.
Formula: $z = \dfrac{x - \bar{x}}{s}$ (sample) or $z = \dfrac{x - \mu}{\sigma}$ (population)
Interpretation:
z > 2 or z < -2: Significant data values
Example: If IQ = 130, mean = 100, s = 15: $z = (130 - 100)/15 = 2$, so the score is two standard deviations above the mean
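The z-score calculation is a one-line function in Python. The sketch below uses the IQ numbers from the example; the function name `z_score` is illustrative.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd


print(z_score(130, 100, 15))  # 2.0
print(z_score(85, 100, 15))   # -1.0
```

A negative z-score means the value is below the mean, and a z-score of 0 means the value equals the mean.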
Percentiles
Percentiles divide a data set into 100 equal parts. The pth percentile is the value below which p% of the data fall.
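Finding the Percentile of a Data Value: $\text{Percentile of } x = \dfrac{\text{number of values less than } x}{\text{total number of values}} \times 100$ (rounded to the nearest whole number)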
Converting a Percentile to a Data Value:
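Calculate locator: $L = \dfrac{p}{100} \cdot n$, where $p$ is the percentile and $n$ is the number of values in the sorted data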
If L is not a whole number, round up and find the value at that position
If L is a whole number, average the values at L and L+1
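Both percentile conversions can be sketched in Python. This assumes the locator is L = (p/100) · n applied to sorted data and that a value's percentile is the share of values below it, rounded to a whole number; the function names are illustrative. Integer arithmetic (`divmod`) is used to test whether L is whole so floating-point rounding cannot interfere.

```python
def percentile_of_value(data, x):
    """Percentile of x: share of values below x, rounded to a whole number."""
    below = sum(1 for value in data if value < x)
    return round(below / len(data) * 100)


def value_at_percentile(data, p):
    """Value at the pth percentile (p a whole number from 1 to 99)."""
    if not 0 < p < 100:
        raise ValueError("p must be between 1 and 99")
    ordered = sorted(data)
    whole, remainder = divmod(p * len(ordered), 100)   # L = p * n / 100
    if remainder:
        # L is not whole: round up to position whole + 1 (zero-based index whole)
        return ordered[whole]
    # L is whole: average the values at positions L and L + 1
    return (ordered[whole - 1] + ordered[whole]) / 2


data = [4, 5, 5, 7, 9, 8]
print(percentile_of_value(data, 8))   # 67
print(value_at_percentile(data, 25))  # 5
print(value_at_percentile(data, 50))  # 6.0
print(value_at_percentile(data, 75))  # 8
```

The 50th percentile matches the median of the same data found earlier, since L = 3 is whole and the values at positions 3 and 4 are averaged.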
Quartiles
Quartiles divide a data set into four equal parts. The first quartile (Q1) is the 25th percentile, the second quartile (Q2) is the median, and the third quartile (Q3) is the 75th percentile.
Example: Find Q1, Q2, Q3 for a given data set
5-Number Summary
The 5-number summary consists of the minimum, Q1, median (Q2), Q3, and maximum.
Example: For Super Bowl data, the 5-number summary is: min, Q1, Q2, Q3, max
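The sketch below shows the computation on the small data set 4, 5, 5, 7, 9, 8 used elsewhere in these notes, not on the Super Bowl data, which the notes do not list. It assumes quartiles are found with the percentile locator L = (p/100) · n; the function names are illustrative.

```python
def value_at_percentile(ordered, p):
    """Value at the pth percentile of already-sorted data (p from 1 to 99)."""
    whole, remainder = divmod(p * len(ordered), 100)   # L = p * n / 100
    if remainder:
        return ordered[whole]                          # L not whole: round up
    return (ordered[whole - 1] + ordered[whole]) / 2   # L whole: average L and L + 1


def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum)."""
    ordered = sorted(data)
    q1, q2, q3 = (value_at_percentile(ordered, p) for p in (25, 50, 75))
    return ordered[0], q1, q2, q3, ordered[-1]


print(five_number_summary([4, 5, 5, 7, 9, 8]))  # (4, 5, 6.0, 8, 9)
```

Other quartile conventions exist, so software such as `statistics.quantiles` may report slightly different Q1 and Q3 values for the same data.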
Boxplot (Box-and-Whisker Plot)
A boxplot visually displays the 5-number summary and helps compare data sets.
Draw a scale with minimum and maximum values
Draw a box from Q1 to Q3, with a divider at the median
Extend "whiskers" from the box to the minimum and maximum values
Summary Table: Measures of Center and Variation
| Measure | Definition | Formula | Pros | Cons |
|---|---|---|---|---|
| Mean | Arithmetic average | $\bar{x} = \dfrac{\sum x}{n}$ | Uses all data | Affected by outliers |
| Median | Middle value | -- | Resistant to outliers | Does not use all data |
| Mode | Most frequent value | -- | Works with nominal data | May not represent center |
| Midrange | Average of min and max | $\dfrac{\text{max} + \text{min}}{2}$ | Quick estimate | Sensitive to outliers |
| Range | Difference between max and min | $\text{max} - \text{min}$ | Easy to compute | Sensitive to outliers |
| Variance | Average squared deviation | $s^2 = \dfrac{\sum (x - \bar{x})^2}{n - 1}$ | Uses all data | Units squared |
| Standard Deviation | Typical distance from mean | $s = \sqrt{\dfrac{\sum (x - \bar{x})^2}{n - 1}}$ | Same units as data | Sensitive to outliers |
Additional info:
Some examples and tables were inferred and expanded for clarity.
Visual aids and diagrams referenced in the slides are described textually.