Numerically Summarizing Data: Measures of Central Tendency, Dispersion, Position, and Visualization
Chapter 3: Numerically Summarizing Data
3.1 Measures of Central Tendency
Measures of central tendency are statistical values that represent the center or typical value of a dataset. The three main measures are the mean, median, and mode.
Arithmetic Mean: The mean is the sum of all data values divided by the number of values. For a population, the mean is denoted by $\mu$ and for a sample by $\bar{x}$. Population mean: $\mu = \frac{\sum x_i}{N}$. Sample mean: $\bar{x} = \frac{\sum x_i}{n}$.
Median: The median is the middle value when data are arranged in order. Steps to find the median:
Arrange data in ascending order.
If the number of observations is odd, the median is the middle value.
If even, the median is the average of the two middle values.
Mode: The mode is the value that occurs most frequently in the data set. If no value repeats, the data set has no mode; if two or more values tie for the highest frequency, the data set is bimodal or multimodal.
Table: Measures of Central Tendency
| Measure | Computation | Interpretation | When to Use |
|---|---|---|---|
| Mean | Sum / Count | Center of gravity | Quantitative data, symmetric distributions |
| Median | Middle value | Positional center | Skewed distributions, ordinal data |
| Mode | Most frequent value | Most common value | Nominal data, categorical data |
Example:
Given data: 23, 24, 21, 18, 15, 24, 43. Compute the mean, median, and mode.
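As a quick check of the example above, the three measures can be computed with Python's standard `statistics` module. This is a minimal sketch; the variable names are illustrative and not part of the original notes.

```python
from statistics import mean, median, multimode

data = [23, 24, 21, 18, 15, 24, 43]

data_mean = mean(data)        # sum of the values (168) divided by the count (7)
data_median = median(data)    # middle value of the sorted data
data_modes = multimode(data)  # list of every value tied for the highest frequency

print("mean:", data_mean)
print("median:", data_median)
print("mode(s):", data_modes)
```

For this data set the mean is 24, the median is 23, and the mode is 24. Note that `multimode` returns every value when none repeats, so the "no mode" case has to be checked separately.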
3.2 Measures of Dispersion
Measures of dispersion describe the spread or variability of data. Common measures include range, variance, and standard deviation.
Range: The difference between the largest and smallest data values.
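Variance: The average squared deviation from the mean. Population variance: $\sigma^2 = \frac{\sum (x_i - \mu)^2}{N}$. Sample variance: $s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$. Note that the sample variance divides by $n - 1$ rather than $n$.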
Standard Deviation: The square root of the variance. Population standard deviation: $\sigma = \sqrt{\sigma^2}$. Sample standard deviation: $s = \sqrt{s^2}$.
Example:
Given data: 23, 24, 21, 18, 15, 24, 43. Compute the range, variance, and standard deviation.
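The example above can be worked through with the standard `statistics` module. The sketch below computes both the population and the sample versions, since the notes do not say which one the data represent; the variable names are illustrative.

```python
from statistics import pstdev, pvariance, stdev, variance

data = [23, 24, 21, 18, 15, 24, 43]

data_range = max(data) - min(data)  # largest value minus smallest value

pop_variance = pvariance(data)   # sum of squared deviations divided by N
samp_variance = variance(data)   # sum of squared deviations divided by n - 1

pop_sd = pstdev(data)   # square root of the population variance
samp_sd = stdev(data)   # square root of the sample variance

print("range:", data_range)
print("population variance:", round(pop_variance, 3), "sd:", round(pop_sd, 3))
print("sample variance:", round(samp_variance, 3), "sd:", round(samp_sd, 3))
```

The range is 28, and the squared deviations from the mean of 24 sum to 488, so the sample variance is 488/6 (about 81.33) and the sample standard deviation is about 9.02.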
Empirical Rule:
For bell-shaped distributions:
~68% of data within 1 standard deviation of the mean
~95% within 2 standard deviations
~99.7% within 3 standard deviations
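The percentages in the Empirical Rule can be checked against an exactly normal curve. The sketch below assumes a standard normal distribution (an idealized bell shape, which is an assumption added here) and uses `statistics.NormalDist` from the standard library.

```python
from statistics import NormalDist

# Assumption: an exactly normal distribution with mean 0 and standard deviation 1.
standard_normal = NormalDist(mu=0, sigma=1)

# Proportion of the distribution within k standard deviations of the mean.
coverage = {
    k: standard_normal.cdf(k) - standard_normal.cdf(-k)
    for k in (1, 2, 3)
}

for k, proportion in coverage.items():
    print(f"within {k} sd: {proportion:.4f}")
```

The proportions come out to roughly 0.6827, 0.9545, and 0.9973, which the rule rounds to 68%, 95%, and 99.7%. Real data that are only approximately bell-shaped will match these figures less closely.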
3.4 Measures of Position and Outliers
Measures of position describe the relative standing of a value within a data set. Common measures include z-scores, percentiles, and quartiles.
Z-Score: Indicates how many standard deviations a value is from the mean. Population z-score: $z = \frac{x - \mu}{\sigma}$. Sample z-score: $z = \frac{x - \bar{x}}{s}$.
Percentile: The kth percentile is a value such that k percent of the observations are less than or equal to it.
Quartiles: Divide data into four equal parts:
Q1: 25th percentile
Q2: 50th percentile (median)
Q3: 75th percentile
Interquartile Range (IQR): Measures the spread of the middle 50% of data: $\text{IQR} = Q_3 - Q_1$.
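To illustrate the z-score definition above, the following sketch standardizes the data set used in this chapter's examples. It treats the data as a sample, which is an assumption; the helper `z_score` is a hypothetical name introduced for illustration.

```python
from statistics import mean, stdev

data = [23, 24, 21, 18, 15, 24, 43]

def z_score(value, center, spread):
    """Number of standard deviations that value lies from center."""
    return (value - center) / spread

# Assumption: the data are a sample, so use the sample mean and
# the sample standard deviation.
x_bar = mean(data)
s = stdev(data)

# z-score of each distinct value, rounded to two decimals.
z_scores = {x: round(z_score(x, x_bar, s), 2) for x in sorted(set(data))}
print(z_scores)
```

The value 43 lies about 2.11 sample standard deviations above the mean, while 24 equals the mean and has a z-score of 0.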
Checking for Outliers:
Lower fence: $Q_1 - 1.5(\text{IQR})$
Upper fence: $Q_3 + 1.5(\text{IQR})$
Values outside these fences are considered outliers.
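The fence check can be sketched in Python as follows. The `quartiles` helper is hypothetical and assumes the median-of-halves convention (the overall median is left out of both halves when the count is odd); other textbooks and software use slightly different quartile rules, so results may differ on other data.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves convention.

    Assumption: with an odd count, the overall median is excluded from
    both halves. Requires at least two values.
    """
    ordered = sorted(values)
    half = len(ordered) // 2
    lower = ordered[:half]                  # values below the median position
    upper = ordered[len(ordered) - half:]   # values above the median position
    return median(lower), median(ordered), median(upper)

data = [23, 24, 21, 18, 15, 24, 43]

q1, q2, q3 = quartiles(data)
iqr = q3 - q1                    # spread of the middle 50% of the data
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Values strictly outside the fences are flagged as outliers.
outliers = [x for x in data if x < lower_fence or x > upper_fence]

print("Q1, Q2, Q3:", q1, q2, q3)
print("fences:", lower_fence, upper_fence)
print("outliers:", outliers)
```

Here Q1 = 18 and Q3 = 24, so the IQR is 6 and the fences are 9 and 33. The value 43 lies above the upper fence and is flagged as an outlier.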
3.5 The Five-Number Summary and Boxplots
The five-number summary provides a concise description of a data set's distribution.
Five-number summary: Minimum, Q1, Median (Q2), Q3, Maximum
Boxplot: A graphical representation of the five-number summary, showing the spread and potential outliers.
Steps to Draw a Boxplot:
Determine the lower and upper fences using IQR.
Draw a box from Q1 to Q3, with a line at the median.
Extend whiskers to the smallest and largest values within the fences.
Plot outliers as individual points.
Example:
Given data: 23, 24, 21, 18, 15, 24, 43. Compute the five-number summary and construct a boxplot.
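The numbers needed to draw this boxplot can be computed as sketched below. The sketch uses `statistics.quantiles` with its default method, which is one of several quartile conventions and is an assumption here; on other data sets it may differ slightly from a textbook's hand method. Only the values are computed, not the plot itself.

```python
from statistics import quantiles

data = sorted([23, 24, 21, 18, 15, 24, 43])

# quantiles(..., n=4) returns the three cut points Q1, Q2, Q3.
# Assumption: the default "exclusive" method is an acceptable convention.
q1, q2, q3 = quantiles(data, n=4)
five_number_summary = (min(data), q1, q2, q3, max(data))

# Fences decide where the whiskers stop and which points are outliers.
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Whiskers extend to the smallest and largest values inside the fences.
inside = [x for x in data if lower_fence <= x <= upper_fence]
whisker_low, whisker_high = min(inside), max(inside)
outliers = [x for x in data if x < lower_fence or x > upper_fence]

print("five-number summary:", five_number_summary)
print("whiskers:", whisker_low, whisker_high)
print("outliers:", outliers)
```

For this data the summary is 15, 18, 23, 24, 43 with fences at 9 and 33, so the box spans 18 to 24 with a line at 23, the whiskers run from 15 to 24, and 43 is plotted as an individual outlier point.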
Summary Table: Which Measures to Report
| Shape of Distribution | Measure of Central Tendency | Measure of Dispersion |
|---|---|---|
| Skewed left or right | Median | Interquartile range |
| Symmetric | Mean | Standard deviation |
Additional info:
Resistant statistics (e.g., median, IQR) are not substantially affected by extreme values or outliers.
Non-resistant statistics (e.g., mean, standard deviation) can be strongly influenced by outliers.
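A small experiment makes the difference between resistant and non-resistant statistics concrete. In the sketch below, the largest value of the chapter's example data is replaced by a much more extreme one; the replacement value 430 is hypothetical and chosen only for illustration.

```python
from statistics import mean, median, stdev

data = [23, 24, 21, 18, 15, 24, 43]

# Hypothetical variant: the largest value, 43, is replaced by 430.
extreme = [430 if x == 43 else x for x in data]

print("mean:  ", mean(data), "->", round(mean(extreme), 2))
print("median:", median(data), "->", median(extreme))
print("sd:    ", round(stdev(data), 2), "->", round(stdev(extreme), 2))
```

The median stays at 23 because the middle position of the sorted data is unchanged, while the mean rises from 24 to about 79.29 and the standard deviation grows sharply.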