Numerically Summarizing Data: Measures of Central Tendency, Dispersion, Position, and Visualization

Study Guide - Smart Notes

Chapter 3: Numerically Summarizing Data

3.1 Measures of Central Tendency

Measures of central tendency are statistical values that represent the center or typical value of a dataset. The three main measures are the mean, median, and mode.

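Arithmetic Mean: The mean is the sum of all data values divided by the number of values. The population mean is denoted by $\mu$ and the sample mean by $\bar{x}$. Population mean: $\mu = \frac{\sum x_i}{N}$, where $N$ is the population size. Sample mean: $\bar{x} = \frac{\sum x_i}{n}$, where $n$ is the sample size.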
Median: The median is the middle value when data are arranged in order. Steps to find the median:
1. Arrange data in ascending order.
2. If the number of observations is odd, the median is the middle value.
3. If even, the median is the average of the two middle values.
Mode: The mode is the value that occurs most frequently in the data set. If no value repeats, the data set has no mode; if two or more values tie for the highest frequency, the data set is bimodal (two modes) or multimodal (more than two).

Table: Measures of Central Tendency

Measure	Computation	Interpretation	When to Use
Mean	Sum / Count	Center of gravity	Quantitative data, symmetric distributions
Median	Middle value	Positional center	Skewed distributions, ordinal data
Mode	Most frequent value	Most common value	Nominal data, categorical data

Example:

Given data: 23, 24, 21, 18, 15, 24, 43. Compute the mean, median, and mode.
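
One way to check this example by machine is the short Python sketch below. It uses only the standard-library `statistics` module; the variable names (`data`, `modes`) are illustrative choices, not part of the notes.

```python
import statistics

# Data from the example above
data = [23, 24, 21, 18, 15, 24, 43]

mean = statistics.mean(data)        # sum of the values divided by their count
median = statistics.median(data)    # middle value of the sorted data
modes = statistics.multimode(data)  # list of every most-frequent value

print("mean:", mean)
print("median:", median)
print("mode(s):", modes)
```

For this data the sum is 168, so the mean is 168 / 7 = 24; the sorted data are 15, 18, 21, 23, 24, 24, 43, so the median is 23; and 24 is the only repeated value, so the mode is 24. Note that `statistics.multimode` returns every value when nothing repeats, whereas these notes say such a data set has no mode, so that case needs separate handling.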

3.2 Measures of Dispersion

Measures of dispersion describe the spread or variability of data. Common measures include range, variance, and standard deviation.

Range: The difference between the largest and smallest data values.
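Variance: The average squared deviation from the mean. The sample version divides by $n - 1$ rather than $n$. Population variance: $\sigma^2 = \frac{\sum (x_i - \mu)^2}{N}$. Sample variance: $s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$.
Standard Deviation: The square root of the variance. Population standard deviation: $\sigma = \sqrt{\frac{\sum (x_i - \mu)^2}{N}}$. Sample standard deviation: $s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}$.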

Example:

Given data: 23, 24, 21, 18, 15, 24, 43. Compute the range, variance, and standard deviation.
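
The example does not say whether the data are a population or a sample, so the sketch below computes both versions. It is a minimal illustration in plain Python; names such as `sum_sq_dev` are assumptions made for readability.

```python
import math

# Data from the example above
data = [23, 24, 21, 18, 15, 24, 43]
n = len(data)
mean = sum(data) / n

# Range: largest value minus smallest value
data_range = max(data) - min(data)

# Sum of squared deviations from the mean
sum_sq_dev = sum((x - mean) ** 2 for x in data)

# Population versions divide by n; sample versions divide by n - 1
pop_var = sum_sq_dev / n
samp_var = sum_sq_dev / (n - 1)
pop_sd = math.sqrt(pop_var)
samp_sd = math.sqrt(samp_var)

print("range:", data_range)
print("population variance / sd:", round(pop_var, 2), round(pop_sd, 2))
print("sample variance / sd:", round(samp_var, 2), round(samp_sd, 2))
```

By hand: the range is 43 - 15 = 28, and the squared deviations from the mean of 24 are 1, 0, 9, 36, 81, 0, 361, which sum to 488. Dividing by 7 gives a population variance of about 69.71 (standard deviation about 8.35); dividing by 6 gives a sample variance of about 81.33 (standard deviation about 9.02).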

Empirical Rule:

For bell-shaped distributions:
- ~68% of data within 1 standard deviation of the mean
- ~95% within 2 standard deviations
- ~99.7% within 3 standard deviations
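
To make the rule concrete, the following sketch turns a mean and standard deviation into the three intervals. The function name `empirical_rule_intervals` and the values 100 and 15 are hypothetical, chosen only for illustration; the rule applies only when the distribution is roughly bell-shaped.

```python
def empirical_rule_intervals(mean, sd):
    """Return {k: (low, high)} for k = 1, 2, 3 standard deviations from the mean."""
    return {k: (mean - k * sd, mean + k * sd) for k in (1, 2, 3)}

# Hypothetical bell-shaped variable with mean 100 and standard deviation 15
intervals = empirical_rule_intervals(100, 15)

# Approximate share of data expected in each interval under the Empirical Rule
coverage = {1: "about 68%", 2: "about 95%", 3: "about 99.7%"}
for k, (low, high) in intervals.items():
    print(f"{coverage[k]} of values between {low} and {high}")
```

Under these assumed values, about 68% of observations would fall between 85 and 115, about 95% between 70 and 130, and about 99.7% between 55 and 145.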

3.4 Measures of Position and Outliers

Measures of position describe the relative standing of a value within a data set. Common measures include z-scores, percentiles, and quartiles.

Z-Score: Indicates how many standard deviations a value is from the mean. Population z-score: $z = \frac{x - \mu}{\sigma}$. Sample z-score: $z = \frac{x - \bar{x}}{s}$.
Percentile: The kth percentile is the value at or below which k% of the observations fall.
Quartiles: Divide data into four equal parts:
- Q1: 25th percentile
- Q2: 50th percentile (median)
- Q3: 75th percentile
Interquartile Range (IQR): Measures the spread of the middle 50% of data: $\text{IQR} = Q_3 - Q_1$.
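
The measures above can be sketched in Python as follows, applied to the data set used in the earlier examples and treated as a sample. The quartile rule here is an assumption: Q1 and Q3 are taken as the medians of the lower and upper halves of the sorted data, excluding the middle value when the count is odd. Textbooks and software use several different quartile conventions, so other tools may report slightly different values.

```python
import statistics

def z_score(x, mean, sd):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / sd

def quartiles(values):
    """Q1, Q2, Q3 by the median-of-halves convention (assumed; needs 2+ values)."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    # Skip the middle value when n is odd
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    return (statistics.median(lower),
            statistics.median(ordered),
            statistics.median(upper))

data = [23, 24, 21, 18, 15, 24, 43]

x_bar = statistics.mean(data)     # sample mean
s = statistics.stdev(data)        # sample standard deviation (divides by n - 1)
z_43 = z_score(43, x_bar, s)      # relative standing of the largest value

q1, q2, q3 = quartiles(data)
iqr = q3 - q1

print("z-score of 43:", round(z_43, 2))
print("Q1, Q2, Q3:", q1, q2, q3)
print("IQR:", iqr)
```

Under this convention the lower half is 15, 18, 21 and the upper half is 24, 24, 43, giving Q1 = 18, Q2 = 23, Q3 = 24, and IQR = 6. The value 43 lies about 2.11 sample standard deviations above the mean.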

Checking for Outliers:

Lower fence: $Q_1 - 1.5 \times \text{IQR}$
Upper fence: $Q_3 + 1.5 \times \text{IQR}$
Values outside these fences are considered outliers.
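
A minimal sketch of the fence check is shown below. The helper names `fences` and `find_outliers` are hypothetical, and the quartiles Q1 = 18 and Q3 = 24 are assumed values for the example data under the median-of-halves quartile convention; a different quartile convention could shift the fences.

```python
def fences(q1, q3):
    """Lower and upper fences: 1.5 * IQR below Q1 and above Q3."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def find_outliers(values, q1, q3):
    """Values that fall strictly outside the fences."""
    low, high = fences(q1, q3)
    return [x for x in values if x < low or x > high]

data = [23, 24, 21, 18, 15, 24, 43]
q1, q3 = 18, 24  # assumed quartiles (median-of-halves convention)

lower, upper = fences(q1, q3)
outliers = find_outliers(data, q1, q3)

print("fences:", lower, upper)
print("outliers:", outliers)
```

With IQR = 6, the fences are 18 - 9 = 9 and 24 + 9 = 33, so 43 is flagged as an outlier and no value falls below the lower fence.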

3.5 The Five-Number Summary and Boxplots

The five-number summary provides a concise description of a data set's distribution.

Five-number summary: Minimum, Q1, Median (Q2), Q3, Maximum
Boxplot: A graphical representation of the five-number summary, showing the spread and potential outliers.

Steps to Draw a Boxplot:

1. Determine the lower and upper fences using the IQR.
2. Draw a box from Q1 to Q3, with a line at the median.
3. Extend whiskers to the smallest and largest values within the fences.
4. Plot outliers as individual points.

Example:

Given data: 23, 24, 21, 18, 15, 24, 43. Compute the five-number summary and construct a boxplot.
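
The sketch below computes the five-number summary for this example and the pieces needed to draw the boxplot by hand (box, whisker ends, outliers). It does not draw anything. The function names are hypothetical, and the quartiles use the median-of-halves convention as an assumption, so other software may give slightly different quartiles.

```python
import statistics

def five_number_summary(values):
    """Minimum, Q1, median, Q3, maximum (median-of-halves quartiles, assumed)."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    q1 = statistics.median(ordered[:half])
    # Skip the middle value when n is odd
    q3 = statistics.median(ordered[half + 1:] if n % 2 else ordered[half:])
    return ordered[0], q1, statistics.median(ordered), q3, ordered[-1]

def boxplot_parts(values):
    """Box edges, whisker ends, and outliers, following the steps listed above."""
    _, q1, med, q3, _ = five_number_summary(values)
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [x for x in values if low_fence <= x <= high_fence]
    return {
        "box": (q1, med, q3),
        "whiskers": (min(inside), max(inside)),
        "outliers": sorted(x for x in values if x < low_fence or x > high_fence),
    }

data = [23, 24, 21, 18, 15, 24, 43]
summary = five_number_summary(data)
parts = boxplot_parts(data)

print("five-number summary:", summary)
print("boxplot parts:", parts)
```

For this data the five-number summary is 15, 18, 23, 24, 43. The fences are 9 and 33, so the whiskers run from 15 to 24 and the value 43 is plotted as an individual outlier point.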

Summary Table: Which Measures to Report

Shape of Distribution	Measure of Central Tendency	Measure of Dispersion
Skewed left or right	Median	Interquartile range
Symmetric	Mean	Standard deviation

Additional info:

Resistant statistics (e.g., median, IQR) are not substantially affected by extreme values or outliers.
Non-resistant statistics (e.g., mean, standard deviation) can be strongly influenced by outliers.