Describing, Exploring, and Comparing Data: Measures of Center, Variation, and Relative Standing

Chapter Overview

Describing, Exploring, and Comparing Data

This chapter introduces fundamental concepts in statistics for describing, exploring, and comparing data sets. It covers measures of center, measures of variation, and measures of relative standing, providing definitions, formulas, and examples for each.

Measures of Center

Introduction to Measures of Center

Measures of center identify the "middle" or typical value in a data set. They help summarize and compare data distributions.

  • Notation:

    • x̄: Sample mean

    • μ: Population mean

    • n: Number of data values in a sample

    • N: Number of data values in a population

Mean (Arithmetic Average)

The mean is the sum of all data values divided by the number of values.

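  • Formula: x̄ = Σx / n (sample mean), where Σx is the sum of all data values and n is the number of values

  • Example: For data 4, 5, 5, 7, 9, 8: x̄ = (4 + 5 + 5 + 7 + 9 + 8)/6 = 38/6 ≈ 6.3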

  • Pros:

    • Uses every data value

  • Cons:

    • Affected by extreme values (outliers)
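
As a quick check of the arithmetic, here is a minimal Python sketch that computes the mean of the example data and shows how a single extreme value shifts it. The variable names and the extra value 100 are illustrative assumptions, not part of the original notes.

```python
from statistics import mean

data = [4, 5, 5, 7, 9, 8]

# Sample mean: sum of the values divided by the number of values.
x_bar = sum(data) / len(data)
print(round(x_bar, 1))                # 6.3

# The standard library gives the same result.
print(round(mean(data), 1))           # 6.3

# One extreme value pulls the mean well away from the bulk of the data.
with_outlier = data + [100]           # 100 is a made-up outlier for illustration
print(round(mean(with_outlier), 1))   # 19.7
```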

Median

The median is the middle value when data are ordered. If the number of values is even, it is the average of the two middle values.

  • Procedure:

    • Order the data from smallest to largest

    • If n is odd, median is the middle value

    • If n is even, median is the average of the two middle values

  • Example: For data 4, 5, 5, 7, 9, 8 (ordered: 4, 5, 5, 7, 8, 9): Median = (5 + 7)/2 = 6

  • Pros:

    • Resistant to extreme values

  • Cons:

    • Does not use all data values

    • May not represent data with gaps
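
The procedure above can be sketched in Python. The helper name `median_by_hand` is a hypothetical name chosen for illustration; the standard library's `statistics.median` is shown alongside it as a cross-check, and the value 100 is a made-up outlier.

```python
from statistics import median

def median_by_hand(values):
    """Order the data, then take the middle value or average the two middle values."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # n odd: single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2  # n even: average the middle pair

data = [4, 5, 5, 7, 9, 8]
print(median_by_hand(data))          # 6.0
print(median(data))                  # 6.0

# The median barely moves when an extreme value is added.
print(median_by_hand(data + [100]))  # 7
```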

Mode

The mode is the most frequent value in a data set. Data may have no mode (no value repeats), one mode (unimodal), two modes (bimodal), or more than two modes (multimodal).

  • Example: For data 4, 5, 5, 7, 9, 8: Mode = 5

  • Pros:

    • Can be used with nominal data

  • Cons:

    • May not represent the center
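
One possible way to find modes in Python is to count each value and keep those tied for the highest count. This is a sketch under one assumption: when no value repeats, the data set is treated as having no mode, so the hypothetical helper `modes` returns an empty list. The extra data sets are made up for illustration.

```python
from collections import Counter

def modes(values):
    """Return every value tied for the highest count, or [] if no value repeats."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    if top == 1:
        return []  # convention assumed here: no repeated value means no mode
    return [value for value, count in counts.items() if count == top]

print(modes([4, 5, 5, 7, 9, 8]))      # [5]
print(modes([1, 1, 2, 2, 3]))         # [1, 2]  (bimodal)
print(modes([1, 2, 3]))               # []      (no mode)
print(modes(["red", "blue", "red"]))  # ['red'] (works for nominal data)
```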

Midrange

The midrange is the value halfway between the minimum and maximum data values.

  • Formula: Midrange = (minimum data value + maximum data value) / 2

  • Example: For data 4, 5, 5, 7, 9, 8: Midrange = (4 + 9)/2 = 6.5

  • Pros:

    • Quick estimate of center

  • Cons:

    • Very sensitive to extreme values
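
A short Python sketch of the midrange, with a made-up outlier (100) added to show the sensitivity noted above. The function name `midrange` is an illustrative choice.

```python
def midrange(values):
    """Value halfway between the smallest and largest data values."""
    return (min(values) + max(values)) / 2

data = [4, 5, 5, 7, 9, 8]
print(midrange(data))          # 6.5

# Only the two extremes matter, so a single outlier dominates the result.
print(midrange(data + [100]))  # 52.0
```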

Rounding Measures of Center

  • Round only the final answer, not intermediate steps

  • Round to one more decimal place than the original data

Measures of Variation

Introduction to Measures of Variation

Measures of variation describe the amount of spread or dispersion in a data set. They help assess consistency and variability.

Range

  • Formula: Range = maximum data value - minimum data value

  • Example: For data 4, 5, 5, 7, 9, 8: Range = 9 - 4 = 5

  • Pros:

    • Easy to compute

  • Cons:

    • Very sensitive to extreme values

Variance

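Variance measures the average squared deviation from the mean. For a sample, the sum of squared deviations is divided by n - 1 rather than n.

  • Sample Variance Formula: s² = Σ(x - x̄)² / (n - 1)

  • Population Variance Formula: σ² = Σ(x - μ)² / N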

  • Pros:

    • Uses all data values

  • Cons:

    • Units are squared
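
To make the difference between the two divisors concrete, here is a minimal Python sketch applied to the running example data. The helper names are hypothetical; `statistics.variance` and `statistics.pvariance` are the standard-library equivalents and are used only as a cross-check.

```python
from statistics import variance, pvariance

def sum_squared_deviations(values):
    """Sum of (x - mean)^2 over all values."""
    m = sum(values) / len(values)
    return sum((x - m) ** 2 for x in values)

def sample_variance(values):
    return sum_squared_deviations(values) / (len(values) - 1)  # divide by n - 1

def population_variance(values):
    return sum_squared_deviations(values) / len(values)        # divide by N

data = [4, 5, 5, 7, 9, 8]
print(round(sample_variance(data), 2))      # 3.87
print(round(population_variance(data), 2))  # 3.22

# Cross-check against the standard library.
print(round(variance(data), 2), round(pvariance(data), 2))
```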

Standard Deviation

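Standard deviation is the square root of the variance. It can be read as a typical distance of data values from the mean, although it is not exactly the arithmetic average of those distances.

  • Sample Standard Deviation Formula: s = √[ Σ(x - x̄)² / (n - 1) ]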

  • Pros:

    • Units are the same as the original data

  • Cons:

    • Increases with extreme values
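
The sketch below computes the sample standard deviation for the running example and, for comparison, the plain average of the absolute deviations. The comparison is an illustrative addition: it shows that the two numbers are close but not identical.

```python
from math import sqrt
from statistics import mean, stdev

data = [4, 5, 5, 7, 9, 8]
x_bar = mean(data)

# Sample standard deviation: square root of the sample variance.
s = sqrt(sum((x - x_bar) ** 2 for x in data) / (len(data) - 1))

# Average absolute distance from the mean, shown only for comparison.
mean_abs_dev = sum(abs(x - x_bar) for x in data) / len(data)

print(round(s, 2))             # 1.97
print(round(stdev(data), 2))   # 1.97 (standard-library cross-check)
print(round(mean_abs_dev, 2))  # 1.67
```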

Quarter Range

The quarter range is a quick estimate of spread, calculated as one-fourth of the range.

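  • Formula: Quarter range = range / 4 = (maximum - minimum) / 4, used as a rough estimate of the standard deviation (s ≈ range / 4)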

  • Pros:

    • Quick estimate

  • Cons:

    • Affected by extreme values
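
As a rough illustration, the following Python sketch compares one-fourth of the range with the sample standard deviation for the running example. With only six values the estimate is crude, which is expected for a quick rule.

```python
from statistics import stdev

data = [4, 5, 5, 7, 9, 8]

data_range = max(data) - min(data)   # 9 - 4 = 5
quarter_range = data_range / 4       # 1.25
s = stdev(data)                      # sample standard deviation

print(data_range)      # 5
print(quarter_range)   # 1.25
print(round(s, 2))     # 1.97
```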

Why Do We Care About Standard Deviation?

  • Helps determine if a sample value is "significantly" high or low

  • Used in the Range Rule of Thumb: Significantly high: values above x̄ + 2s (μ + 2σ for a population). Significantly low: values below x̄ - 2s (μ - 2σ for a population).
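
A minimal Python sketch of the Range Rule of Thumb, using the IQ figures that appear in the later examples (mean 100, standard deviation 15). The function names are hypothetical, and the strict inequalities follow the "z > 2 or z < -2" convention used in these notes; some textbooks count the boundary values themselves as significant.

```python
def significance_limits(mean, sd):
    """Return (low, high) cutoffs two standard deviations from the mean."""
    return mean - 2 * sd, mean + 2 * sd

def classify(value, mean, sd):
    low, high = significance_limits(mean, sd)
    if value > high:
        return "significantly high"
    if value < low:
        return "significantly low"
    return "not significant"

print(significance_limits(100, 15))  # (70, 130)
print(classify(135, 100, 15))        # significantly high
print(classify(65, 100, 15))         # significantly low
print(classify(110, 100, 15))        # not significant
```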

Empirical Rule

The Empirical Rule applies to data sets that are approximately normally distributed.

  • About 68% of data values fall within one standard deviation of the mean

  • About 95% fall within two standard deviations

  • About 99.7% fall within three standard deviations

Empirical Rule Visual Aid

[Figure not shown: normal distribution curve illustrating the Empirical Rule]

Empirical Rule Example

  • Suppose IQ scores are normally distributed with a mean of 100 and a standard deviation of 15. What percent of scores are between 85 and 115? Answer: About 68%, because 85 = 100 - 15 and 115 = 100 + 15 are each one standard deviation from the mean.
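
The 68%, 95%, and 99.7% figures can be checked against an exact normal model. The sketch below uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later) with the IQ parameters from the example; the helper name `share_within` is an illustrative choice.

```python
from statistics import NormalDist

iq = NormalDist(mu=100, sigma=15)

def share_within(k):
    """Proportion of the distribution within k standard deviations of the mean."""
    return iq.cdf(iq.mean + k * iq.stdev) - iq.cdf(iq.mean - k * iq.stdev)

for k in (1, 2, 3):
    low = iq.mean - k * iq.stdev
    high = iq.mean + k * iq.stdev
    print(f"{low:.0f} to {high:.0f}: {share_within(k):.1%}")
# 85 to 115: 68.3%
# 70 to 130: 95.4%
# 55 to 145: 99.7%
```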

Frequency Distributions

Introduction to Frequency Distributions

Frequency distributions organize data into classes or intervals and show the frequency of values in each class.

Example Frequency Distribution Table

| Weight (lbs) | Frequency |
|--------------|-----------|
| 1.2-1.4      | 2         |
| 1.5-1.7      | 4         |
| 1.8-2.0      | 6         |
| 2.1-2.3      | 3         |
| 2.4-2.6      | 1         |

Calculating Mean from Frequency Distribution

  • Find midpoint for each class

  • Multiply midpoint by frequency

  • Add all products and divide by total frequency

  • Formula: x̄ = Σ(f · x) / Σf, where x is the class midpoint and f is the class frequency
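
Applied to the example weight table, the steps above can be sketched in Python as follows. Each class is written as (lower limit, upper limit, frequency); the function name `grouped_mean` is a hypothetical choice.

```python
# (lower limit, upper limit, frequency) for each class in the example table
classes = [
    (1.2, 1.4, 2),
    (1.5, 1.7, 4),
    (1.8, 2.0, 6),
    (2.1, 2.3, 3),
    (2.4, 2.6, 1),
]

def grouped_mean(classes):
    """Approximate the mean by treating every value as its class midpoint."""
    total_f = sum(f for _, _, f in classes)
    total_fx = sum(f * (lo + hi) / 2 for lo, hi, f in classes)
    return total_fx / total_f

print(round(grouped_mean(classes), 2))  # 1.84
```

The result is an approximation, because the individual weights inside each class are replaced by the class midpoint.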

Calculating Standard Deviation from Frequency Distribution

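  • Create x and x² columns, where x is the class midpoint

  • Multiply each by the class frequency f to get f · x and f · x²

  • Add the f · x products and the f · x² products, then use the variance formula and take the square root

  • Formula: s = √{ [n · Σ(f · x²) - (Σ(f · x))²] / [n(n - 1)] }, where n = Σf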
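
A sketch of the grouped standard deviation for the example weight table, using the common shortcut form s = √{ [n · Σ(f · x²) - (Σ(f · x))²] / [n(n - 1)] } with x as the class midpoint and n as the total frequency. The function name `grouped_stdev` and the tuple layout are illustrative assumptions.

```python
from math import sqrt

# (lower limit, upper limit, frequency) for each class in the example table
classes = [
    (1.2, 1.4, 2),
    (1.5, 1.7, 4),
    (1.8, 2.0, 6),
    (2.1, 2.3, 3),
    (2.4, 2.6, 1),
]

def grouped_stdev(classes):
    """Approximate the sample standard deviation from class midpoints."""
    n = sum(f for _, _, f in classes)                             # total frequency
    sum_fx = sum(f * (lo + hi) / 2 for lo, hi, f in classes)      # sum of f * x
    sum_fx2 = sum(f * ((lo + hi) / 2) ** 2 for lo, hi, f in classes)  # sum of f * x^2
    return sqrt((n * sum_fx2 - sum_fx ** 2) / (n * (n - 1)))

print(round(grouped_stdev(classes), 2))  # 0.33
```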

Measures of Relative Standing

Introduction to Measures of Relative Standing

Measures of relative standing identify the position of a data value relative to other values in the data set. They help determine outliers and compare across data sets.

z-scores

A z-score indicates how many standard deviations a value is from the mean.

  • Formula: z = (x - x̄) / s for a sample, or z = (x - μ) / σ for a population

  • Interpretation:

    • z > 2 or z < -2: significantly high or significantly low data values; values with -2 ≤ z ≤ 2 are not significant

  • Example: If IQ = 130, mean = 100, s = 15: z = (130 - 100)/15 = 2
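
A short Python sketch of the z-score calculation. The second comparison (a score of 620 on a scale with mean 500 and standard deviation 100) is a made-up example added to show how z-scores put values from different scales on a common footing.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

print(z_score(130, 100, 15))   # 2.0
print(z_score(85, 100, 15))    # -1.0

# Hypothetical comparison across two different scales.
iq_z = z_score(130, 100, 15)     # 2.0
other_z = z_score(620, 500, 100)  # 1.2
print(iq_z > other_z)          # True: the IQ score is relatively higher
```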

Percentiles

Percentiles divide a data set into 100 equal parts. The pth percentile is the value below which p% of the data fall.

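  • Finding the Percentile of a Data Value: Percentile of x = (number of values less than x) / (total number of values) · 100, rounded to the nearest whole number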

  • Converting a Percentile to a Data Value:

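    • Order the data from smallest to largest

    • Calculate locator: L = (p / 100) · n, where p is the percentile and n is the number of data values

    • If L is not a whole number, round up to the next whole number and find the value at that position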

    • If L is a whole number, average the values at L and L+1
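
Both directions can be sketched in Python under the locator convention described above. The function names are hypothetical, and the sketch assumes p is a whole number from 1 to 99. Integer arithmetic is used to test whether L is whole, which avoids floating-point surprises.

```python
def percentile_of_value(values, x):
    """Percent of values strictly less than x, rounded to a whole number."""
    below = sum(1 for v in values if v < x)
    return round(below / len(values) * 100)

def value_at_percentile(values, p):
    """Data value at the p-th percentile using the locator L = (p / 100) * n."""
    if not (isinstance(p, int) and 0 < p < 100):
        raise ValueError("p must be a whole number from 1 to 99")
    ordered = sorted(values)
    n = len(ordered)
    whole, remainder = divmod(p * n, 100)   # L is whole exactly when remainder == 0
    if remainder == 0:
        # L whole: average the values at positions L and L + 1 (1-based)
        return (ordered[whole - 1] + ordered[whole]) / 2
    # L not whole: round up to position whole + 1 (1-based), index whole (0-based)
    return ordered[whole]

data = [4, 5, 5, 7, 9, 8]
print(percentile_of_value(data, 7))   # 50
print(value_at_percentile(data, 50))  # 6.0
print(value_at_percentile(data, 25))  # 5
print(value_at_percentile(data, 75))  # 8
```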

Quartiles

Quartiles divide a data set into four equal parts. The first quartile (Q1) is the 25th percentile, the second quartile (Q2) is the median, and the third quartile (Q3) is the 75th percentile.

  • Example: Find Q1, Q2, Q3 for a given data set

5-Number Summary

The 5-number summary consists of the minimum, Q1, median (Q2), Q3, and maximum.

  • Example: For Super Bowl data, the 5-number summary is: min, Q1, Q2, Q3, max
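
Since the Super Bowl values are not reproduced here, the sketch below builds a 5-number summary for the small running data set 4, 5, 5, 7, 9, 8 instead. It assumes the locator convention L = (p / 100) · n for quartiles; the function names are hypothetical. Other tools use different quartile conventions, so `statistics.quantiles` from the standard library may report slightly different Q1 and Q3 values.

```python
from statistics import quantiles

def value_at_percentile(values, p):
    """Locator method: L = (p / 100) * n, for whole-number p from 1 to 99."""
    ordered = sorted(values)
    whole, remainder = divmod(p * len(ordered), 100)
    if remainder == 0:
        return (ordered[whole - 1] + ordered[whole]) / 2  # average positions L, L + 1
    return ordered[whole]                                 # round L up

def five_number_summary(values):
    return (
        min(values),
        value_at_percentile(values, 25),  # Q1
        value_at_percentile(values, 50),  # Q2 (median)
        value_at_percentile(values, 75),  # Q3
        max(values),
    )

data = [4, 5, 5, 7, 9, 8]
print(five_number_summary(data))  # (4, 5, 6.0, 8, 9)

# A different quartile convention, shown only for comparison.
print(quantiles(data, n=4))
```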

Boxplot (Box-and-Whisker Plot)

A boxplot visually displays the 5-number summary and helps compare data sets.

  • Draw a number-line scale that spans the minimum and maximum values

  • Draw a box from Q1 to Q3, with a divider at the median

  • Extend "whiskers" from the box to the minimum and maximum values

Summary Table: Measures of Center and Variation

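| Measure            | Definition                                         | Formula                        | Pros                  | Cons                     |
|--------------------|----------------------------------------------------|--------------------------------|-----------------------|--------------------------|
| Mean               | Arithmetic average                                 | x̄ = Σx / n                     | Uses all data         | Affected by outliers     |
| Median             | Middle value                                       | --                             | Resistant to outliers | Does not use all data    |
| Mode               | Most frequent value                                | --                             | Nominal data          | May not represent center |
| Midrange           | Average of min and max                             | (min + max) / 2                | Quick estimate        | Sensitive to outliers    |
| Range              | Difference between max and min                     | max - min                      | Easy to compute       | Sensitive to outliers    |
| Variance           | Average squared deviation                          | s² = Σ(x - x̄)² / (n - 1)       | Uses all data         | Units squared            |
| Standard Deviation | Square root of variance; typical distance from mean | s = √[ Σ(x - x̄)² / (n - 1) ]   | Same units as data    | Sensitive to outliers    |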

Additional info:

  • Some examples and tables were inferred and expanded for clarity.

  • Visual aids and diagrams referenced in the slides are described textually.
