Describing Data: Numerical Measures – Study Notes for Statistics for Business

Chapter Overview

This chapter introduces essential numerical measures used to summarize and describe data in business statistics. Understanding these measures allows for effective data analysis, comparison, and interpretation, which are crucial for informed business decision-making.

Central Tendency: Mean, median, and mode
Variation: Range, variance, standard deviation, coefficient of variation
Relative Location: Percentiles, quartiles
Distribution Shape: Symmetry and skewness
Graphical Summaries: Five-number summary, box-and-whisker plots
Relationships: Covariance and correlation

Measures of Central Tendency

Arithmetic Mean

The arithmetic mean (or simply, mean) is the most common measure of central tendency. It is calculated as the sum of all values divided by the number of values.

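Population Mean: $\mu = \frac{\sum_{i=1}^{N} x_i}{N}$, where $N$ is the population size
Sample Mean: $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$, where $n$ is the sample size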
Sensitivity: The mean is affected by extreme values (outliers).
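A minimal Python sketch of the mean, using small hypothetical data (not from the source) to show how a single outlier pulls the mean. The function name `arithmetic_mean` is illustrative; the same arithmetic gives $\mu$ for a population or $\bar{x}$ for a sample, only the notation differs.

```python
def arithmetic_mean(values):
    """Sum of all values divided by the number of values."""
    if not values:
        raise ValueError("mean requires at least one value")
    return sum(values) / len(values)


# Hypothetical data: five typical values, then the same values plus one outlier.
typical = [10, 12, 11, 13, 14]
with_outlier = typical + [100]

print(arithmetic_mean(typical))       # 12.0
print(arithmetic_mean(with_outlier))  # about 26.67: one extreme value more than doubles the mean
```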

Median

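The median is the middle value in an ordered list (50% of the values lie above it, 50% below). It is resistant to outliers because it depends only on the middle position(s), not on the size of the extreme values.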

Median Position: the $\frac{n+1}{2}$-th value in the ordered data
If n is odd, the median is the middle value; if even, it is the average of the two middle values.
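The odd/even rule can be sketched in a few lines of Python. The function name `median` and the data are hypothetical; Python's standard `statistics.median` follows the same rule. The last call reuses a data set with an outlier to show that the median barely moves.

```python
def median(values):
    """Middle value of the ordered data; average of the two middle values when n is even."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median requires at least one value")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]  # 1-based position (n+1)/2 when n is odd
    # n even: average the values at 1-based positions n/2 and n/2 + 1
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([7, 1, 3]))                  # 3
print(median([4, 1, 3, 2]))               # 2.5
print(median([10, 12, 11, 13, 14, 100]))  # 12.5, barely affected by the outlier
```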

Mode

The mode is the value that occurs most frequently in a dataset. It can be used for both numerical and categorical data and is not affected by outliers.

There may be no mode, one mode (unimodal), or multiple modes (bimodal, multimodal).
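A short Python sketch that returns every mode, so unimodal, multimodal, and categorical cases all work. The function name `modes` and the data are hypothetical, and the "no mode when every value occurs once" convention is an assumption that some textbooks and software handle differently.

```python
from collections import Counter


def modes(values):
    """Return every most-frequent value, in first-seen order.

    Assumed convention: if every value occurs exactly once, there is no mode,
    and an empty list is returned.
    """
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    if top == 1:
        return []  # no value repeats, so no mode
    return [value for value, count in counts.items() if count == top]


print(modes([3, 5, 5, 8]))                     # [5]  (unimodal)
print(modes([1, 1, 2, 2, 3]))                  # [1, 2]  (bimodal)
print(modes([4, 6, 9]))                        # []  (no mode)
print(modes(["card", "cash", "card", "app"]))  # ['card']  (categorical data)
```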

Choosing the Best Measure

Mean: Generally preferred, unless outliers are present.
Median: Preferred when data contain outliers or are skewed.
Mode: Useful for categorical data or when identifying the most common value.

Shape of a Distribution

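The shape describes how the data are spread around the center. Comparing the mean with the median is a quick way to judge it:

Symmetric: Mean = Median = Mode (for a single-peaked distribution)
Positively Skewed (long tail to the right): typically Mean > Median > Mode
Negatively Skewed (long tail to the left): typically Mean < Median < Mode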
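The mean-versus-median comparison can be turned into a rough shape label. This is only a rule of thumb, not a formal skewness measure; the function name `skew_direction`, the tolerance `tol`, and the data are hypothetical.

```python
from statistics import mean, median


def skew_direction(values, tol=1e-9):
    """Rule-of-thumb shape label based on the sign of (mean - median)."""
    diff = mean(values) - median(values)
    if abs(diff) <= tol:
        return "symmetric"
    return "positively skewed" if diff > 0 else "negatively skewed"


print(skew_direction([1, 2, 3, 4, 5]))              # symmetric
print(skew_direction([1, 2, 2, 2, 3, 4, 5, 15]))    # positively skewed (mean 4.25 > median 2.5)
print(skew_direction([1, 11, 12, 13, 13, 13, 14]))  # negatively skewed (mean 11 < median 13)
```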

Measures of Relative Location

Percentiles and Quartiles

Percentiles: Divide ordered data into 100 equal parts. The p-th percentile is the value below which p% of observations fall.
Quartiles: Divide data into four equal segments.
- First quartile (Q1): 25% below
- Second quartile (Q2): 50% below (the median)
- Third quartile (Q3): 75% below
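Quartile Position Formulas (positions in the ordered data):
- Q1: the $\frac{n+1}{4}$-th value
- Q2: the $\frac{n+1}{2}$-th value
- Q3: the $\frac{3(n+1)}{4}$-th value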
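Since Q1, Q2 and Q3 are the 25th, 50th and 75th percentiles, one position rule covers all of them: the p-th percentile sits at position $\frac{p(n+1)}{100}$ in the ordered data. Textbooks and software differ in how they treat fractional positions; this hypothetical `percentile` helper assumes linear interpolation between the two neighbouring values and clamps positions to the range 1..n.

```python
def percentile(values, p):
    """p-th percentile (0 < p < 100) at 1-based ordered position p(n+1)/100.

    Assumptions: fractional positions are linearly interpolated between the two
    neighbouring ordered values; positions outside 1..n are clamped.
    """
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("percentile requires at least one value")
    pos = p * (n + 1) / 100            # 1-based position in the ordered data
    pos = min(max(pos, 1.0), float(n))
    whole = int(pos)                   # integer part of the position
    frac = pos - whole                 # fractional part used for interpolation
    if whole == n:
        return float(ordered[-1])
    lower, upper = ordered[whole - 1], ordered[whole]
    return lower + frac * (upper - lower)


data = [1, 2, 3, 4, 5, 6, 7, 8]  # hypothetical data, n = 8
print(percentile(data, 25), percentile(data, 50), percentile(data, 75))  # 2.25 4.5 6.75
```

For n = 8, Q1 sits at position 2.25, so it lies a quarter of the way from the 2nd value to the 3rd.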

Five-Number Summary and Box-and-Whisker Plots

Five-Number Summary

Minimum
First Quartile (Q1)
Median (Q2)
Third Quartile (Q3)
Maximum

Order: Minimum ≤ Q1 ≤ Median ≤ Q3 ≤ Maximum (neighbouring values can be equal when data values repeat)

Box-and-Whisker Plot

Graphical representation of the five-number summary
Box shows Q1 to Q3 with a line at the median
Whiskers extend to minimum and maximum values
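The five numbers behind a box-and-whisker plot can be computed with Python's standard library. `statistics.quantiles(..., n=4, method="exclusive")` places the quartiles at positions (n+1)/4, (n+1)/2 and 3(n+1)/4 with interpolation; the function name and data are hypothetical, and other quartile conventions can give slightly different Q1 and Q3.

```python
import statistics


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum)."""
    # "exclusive" uses (n+1)-based positions and interpolates between values
    q1, q2, q3 = statistics.quantiles(values, n=4, method="exclusive")
    return min(values), q1, q2, q3, max(values)


data = [6, 7, 9, 10, 11, 12, 13, 14]  # hypothetical sample, n = 8
print(five_number_summary(data))      # (6, 7.5, 10.5, 12.75, 14)
```

In the plot, the box would span 7.5 to 12.75 with a line at 10.5, and the whiskers would run out to 6 and 14.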

Measures of Variability

Range

Definition: Difference between the largest and smallest values, $\text{Range} = X_{\max} - X_{\min}$
Limitation: Sensitive to outliers, and ignores how the data are distributed between the two extremes

Interquartile Range (IQR)

Definition: Spread of the middle 50% of the data, $\text{IQR} = Q_3 - Q_1$
Advantage: Not influenced by extreme values, since it ignores the lowest 25% and the highest 25% of the data
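To contrast the two measures, here is a sketch with hypothetical data in which only the maximum is replaced by an outlier. The helper names `data_range` and `iqr` are illustrative, and the quartiles use `statistics.quantiles` with (n+1)-based positions.

```python
import statistics


def data_range(values):
    """Largest value minus smallest value."""
    return max(values) - min(values)


def iqr(values):
    """Q3 - Q1, with quartiles at positions (n+1)/4 and 3(n+1)/4 (interpolated)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="exclusive")
    return q3 - q1


clean = [1, 2, 3, 4, 5, 6, 7, 8]         # hypothetical data
with_outlier = [1, 2, 3, 4, 5, 6, 7, 80]  # same data, maximum replaced by an outlier

print(data_range(clean), data_range(with_outlier))  # 7 79
print(iqr(clean), iqr(with_outlier))                # 4.5 4.5: the outlier leaves the IQR unchanged
```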

Variance and Standard Deviation

Variance: Average squared deviation from the mean
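Population Variance: $\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$
Sample Variance: $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$
Standard Deviation: Square root of variance; restores original units
- Population: $\sigma = \sqrt{\sigma^2}$
- Sample: $s = \sqrt{s^2}$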
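A direct translation of the variance and standard deviation formulas, with a `sample` flag (a hypothetical parameter name) that switches the divisor between n − 1 and N. The data are hypothetical; the standard library's `statistics.variance`, `statistics.pvariance`, `statistics.stdev` and `statistics.pstdev` compute the same quantities.

```python
import math


def variance(values, sample=True):
    """Sum of squared deviations from the mean, divided by n - 1 (sample) or N (population)."""
    n = len(values)
    if n < (2 if sample else 1):
        raise ValueError("not enough values")
    center = sum(values) / n
    ss = sum((x - center) ** 2 for x in values)  # sum of squared deviations
    return ss / (n - 1) if sample else ss / n


def std_dev(values, sample=True):
    """Square root of the variance, back in the original units."""
    return math.sqrt(variance(values, sample))


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data, mean = 5, squared deviations sum to 32
print(variance(data, sample=False), std_dev(data, sample=False))  # 4.0 2.0
print(variance(data), std_dev(data))  # about 4.571 and 2.138 (divides by n - 1 = 7)
```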

Coefficient of Variation (CV)

Definition: Measures variation relative to the mean (unitless, expressed as a percentage)
Population: $CV = \frac{\sigma}{\mu} \times 100\%$
Sample: $CV = \frac{s}{\bar{x}} \times 100\%$
Use: Compare variability between datasets with different units or means
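A sketch of the CV comparing two hypothetical series measured in different units (dollars and shares). The function name is illustrative; it uses the standard library's `statistics.stdev` (sample) or `statistics.pstdev` (population) and refuses a zero mean, where the ratio is undefined.

```python
import statistics


def coefficient_of_variation(values, sample=True):
    """Standard deviation as a percentage of the mean."""
    center = statistics.mean(values)
    if center == 0:
        raise ValueError("CV is undefined when the mean is zero")
    spread = statistics.stdev(values) if sample else statistics.pstdev(values)
    return spread * 100 / center


# Hypothetical data in different units: share prices (dollars) and daily volumes (shares).
prices = [48, 50, 52]
volumes = [9000, 10000, 11000]
print(coefficient_of_variation(prices))   # 4.0 (%)
print(coefficient_of_variation(volumes))  # 10.0 (%): relatively more variable
```

The raw standard deviations (2 dollars versus 1,000 shares) cannot be compared directly, but the CVs can.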

Empirical Rule and Chebyshev's Theorem

Empirical Rule (for bell-shaped distributions)

About 68% of data within 1 standard deviation of the mean
About 95% within 2 standard deviations
About 99.7% within 3 standard deviations

Chebyshev's Theorem (any distribution)

At least $\left(1 - \frac{1}{k^2}\right) \times 100\%$ of the data fall within k standard deviations of the mean (for k > 1)
For k = 2: at least 75% within 2 standard deviations
For k = 3: at least 88.9% within 3 standard deviations
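A sketch comparing Chebyshev's guaranteed minimum with the share actually observed in a small hypothetical data set. The helper names are illustrative, and `proportion_within` treats the data as a population (it uses `statistics.pstdev`).

```python
import statistics


def chebyshev_bound(k):
    """Minimum proportion of any data set within k standard deviations of the mean (k > 1)."""
    if k <= 1:
        raise ValueError("Chebyshev's theorem is informative only for k > 1")
    return 1 - 1 / k ** 2


def proportion_within(values, k):
    """Observed share of values with |x - mean| <= k * population standard deviation."""
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)
    return sum(abs(x - mu) <= k * sigma for x in values) / len(values)


print(chebyshev_bound(2), round(chebyshev_bound(3), 3))  # 0.75 0.889

data = [2, 4, 4, 4, 5, 5, 7, 9]      # hypothetical data: mean 5, sigma 2
print(proportion_within(data, 1.5))  # 0.875, above the guaranteed 1 - 1/1.5**2 ≈ 0.556
```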

z-Score

A z-score standardizes a value by expressing its distance from the mean in terms of standard deviations.

$z = \frac{x - \mu}{\sigma}$ (population); $z = \frac{x - \bar{x}}{s}$ (sample)
z > 0: value above mean; z < 0: value below mean; z = 0: value equals mean
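A minimal sketch that standardizes every value in a hypothetical data set with the population form of the z-score. The function name is illustrative; for the sample form, `statistics.stdev` would replace `statistics.pstdev`.

```python
import statistics


def z_scores(values):
    """Standardize each value: (x - mean) / population standard deviation."""
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)
    if sigma == 0:
        raise ValueError("z-scores are undefined when all values are equal")
    return [(x - mu) / sigma for x in values]


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data: mean 5, sigma 2
print(z_scores(data))            # [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
```

The value 9 has z = 2.0, so it lies two standard deviations above the mean; the value 2 lies 1.5 standard deviations below it.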

Weighted Mean and Grouped Data

Weighted Mean

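Used when data values differ in importance, each value $x_i$ carrying a weight $w_i$: $\bar{x}_w = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i}$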
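A sketch of the weighted mean using hypothetical share purchases, where the number of shares bought at each price serves as the weight. The function name is illustrative.

```python
def weighted_mean(values, weights):
    """Sum of w_i * x_i divided by the sum of the weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for x, w in zip(values, weights)) / total_weight


# Hypothetical purchases: price per share (values) and number of shares bought (weights).
prices = [10, 12, 15]
shares = [100, 50, 50]
print(weighted_mean(prices, shares))  # 11.75, versus an unweighted mean of about 12.33
```

The weighted result sits closer to 10 because most of the shares were bought at that price.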

Grouped Data Approximations

For data grouped into classes, use class midpoints and frequencies to estimate mean and variance
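Mean: $\bar{x} \approx \frac{\sum_{j=1}^{c} f_j m_j}{n}$, where $f_j$ is the frequency and $m_j$ the midpoint of class $j$, $c$ is the number of classes, and $n = \sum_{j=1}^{c} f_j$
Variance: $s^2 \approx \frac{\sum_{j=1}^{c} f_j (m_j - \bar{x})^2}{n - 1}$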
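A sketch of the grouped-data approximations, treating every observation in a class as if it sat at the class midpoint. The function name and the frequency table are hypothetical, and the variance uses the sample (n − 1) divisor.

```python
def grouped_mean_variance(midpoints, frequencies):
    """Approximate sample mean and variance from class midpoints m_j and frequencies f_j."""
    n = sum(frequencies)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = sum(f * m for m, f in zip(midpoints, frequencies)) / n
    var = sum(f * (m - mean) ** 2 for m, f in zip(midpoints, frequencies)) / (n - 1)
    return mean, var


# Hypothetical frequency table: classes 0-10, 10-20, 20-30 (midpoints 5, 15, 25).
mids = [5, 15, 25]
freqs = [2, 5, 3]
print(grouped_mean_variance(mids, freqs))  # (16.0, 54.44...)
```

Here the mean is 160 / 10 = 16 and the variance is (2·121 + 5·1 + 3·81) / 9 = 490 / 9.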

Measures of Relationship: Covariance and Correlation

Covariance

Measures the direction of the linear relationship between two variables X and Y
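Population: $\sigma_{XY} = \frac{\sum_{i=1}^{N} (x_i - \mu_X)(y_i - \mu_Y)}{N}$
Sample: $s_{XY} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n - 1}$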
Cov(X,Y) > 0: positive relationship; Cov(X,Y) < 0: negative relationship; Cov(X,Y) = 0: no linear relationship
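A sketch of the covariance formulas with hypothetical paired data. The function name and the `sample` flag are illustrative; Python 3.10+ also offers `statistics.covariance`, which computes the sample (n − 1) version.

```python
def covariance(x, y, sample=True):
    """Sum of (x_i - mean_x)(y_i - mean_y), divided by n - 1 (sample) or N (population)."""
    n = len(x)
    if n != len(y) or n < (2 if sample else 1):
        raise ValueError("x and y must be the same length with enough values")
    mx, my = sum(x) / n, sum(y) / n
    cross = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return cross / (n - 1) if sample else cross / n


# Hypothetical paired data, e.g. advertising spend (x) and sales (y).
ad_spend = [1, 2, 3, 4, 5]
sales = [2, 4, 5, 4, 5]
print(covariance(ad_spend, sales))            # 1.5  (> 0: positive relationship)
print(covariance(ad_spend, [5, 4, 3, 2, 1]))  # -2.5 (< 0: negative relationship)
```

The sign gives the direction, but the size depends on the units of X and Y, so 1.5 alone does not say how strong the relationship is.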

Correlation Coefficient (r)

Measures both the strength and direction of a linear relationship
Population: $\rho = \frac{\sigma_{XY}}{\sigma_X \sigma_Y}$
Sample: $r = \frac{s_{XY}}{s_X s_Y}$
Range: -1 ≤ r ≤ 1
r close to 1: strong positive; r close to -1: strong negative; r close to 0: weak or no linear relationship
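A sketch of the sample correlation coefficient. Dividing the covariance by both standard deviations cancels the n − 1 terms, so r reduces to a ratio of sums. The function name and data are hypothetical.

```python
import math


def correlation(x, y):
    """Sample correlation r = s_xy / (s_x * s_y); the n - 1 divisors cancel."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must be the same length with at least two values")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when a variable has no variation")
    return sxy / math.sqrt(sxx * syy)


ad_spend = [1, 2, 3, 4, 5]  # hypothetical data
sales = [2, 4, 5, 4, 5]
print(round(correlation(ad_spend, sales), 3))   # 0.775: fairly strong positive
print(correlation(ad_spend, [10, 8, 6, 4, 2]))  # -1.0: perfect negative straight line
```

Unlike covariance, r is unitless, so rescaling either variable (for example, from dollars to thousands of dollars) leaves it unchanged.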

Tabular Example: Summary Statistics for Four Locations

The following table summarizes key statistics for four locations (from the boxplot example):

Location	Mean	Min	Q1	Median	Q3	Max	IQR	Range
1	10.1	6	8.0	10.5	12.5	14	4.5	8
2	13.6	8	10.75	13.5	16.75	19	6.0	11
3	17.5	11	15.0	17.5	20.5	25	5.5	14
4	12.5	8	10.5	12.0	15.0	18	4.5	10
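As a quick arithmetic check, the IQR and Range columns can be recomputed from the table's own Min, Q1, Q3 and Max values. Doing so also shows that the two measures rank the locations differently: Location 2 has the widest middle 50%, while Location 3 has the widest overall range. The variable names are illustrative.

```python
# (Min, Q1, Q3, Max) for each location, copied from the table above.
summary = {
    1: (6, 8.0, 12.5, 14),
    2: (8, 10.75, 16.75, 19),
    3: (11, 15.0, 20.5, 25),
    4: (8, 10.5, 15.0, 18),
}

iqr = {loc: q3 - q1 for loc, (lo, q1, q3, hi) in summary.items()}
spread = {loc: hi - lo for loc, (lo, q1, q3, hi) in summary.items()}

print(iqr)     # {1: 4.5, 2: 6.0, 3: 5.5, 4: 4.5}
print(spread)  # {1: 8, 2: 11, 3: 14, 4: 10}
print(max(iqr, key=iqr.get), max(spread, key=spread.get))  # 2 3
```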

Key Takeaways

Central tendency and variability are fundamental for summarizing data.
Relative location measures (percentiles, quartiles) help interpret individual values within a dataset.
Boxplots and five-number summaries provide visual and numerical summaries of data distribution.
Covariance and correlation quantify relationships between variables, essential for regression and prediction.