Describing Data: Numerical Measures – Study Notes for Statistics for Business

Chapter Overview

This chapter introduces essential numerical measures used to summarize and describe data in business statistics. Understanding these measures allows for effective data analysis, comparison, and interpretation, which are crucial for informed business decision-making.

  • Central Tendency: Mean, median, and mode

  • Variation: Range, variance, standard deviation, coefficient of variation

  • Relative Location: Percentiles, quartiles

  • Distribution Shape: Symmetry and skewness

  • Graphical Summaries: Five-number summary, box-and-whisker plots

  • Relationships: Covariance and correlation

Measures of Central Tendency

Arithmetic Mean

The arithmetic mean (or simply, mean) is the most common measure of central tendency. It is calculated as the sum of all values divided by the number of values.

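  • Population Mean: $\mu = \frac{\sum_{i=1}^{N} x_i}{N}$, where $N$ is the population size

  • Sample Mean: $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$, where $n$ is the sample size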

  • Sensitivity: The mean is affected by extreme values (outliers).

Median

The median is the middle value in an ordered list (50% of observations above, 50% below). It is resistant to outliers: an extreme value has little or no effect on it.

  • Median Position: the $\frac{n+1}{2}$-th value in the ordered data

  • If n is odd, the median is the middle value; if even, it is the average of the two middle values.
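The contrast between the mean and the median can be sketched with a few lines of Python. The sales figures below are hypothetical, chosen only to show how a single extreme value moves each measure.

```python
from statistics import mean, median

# Hypothetical monthly sales (in thousands); not taken from the notes.
sales = [12, 15, 14, 13, 16]
sales_with_outlier = sales + [95]   # one unusually large month

mean_before = mean(sales)                    # 70 / 5
median_before = median(sales)                # middle of the 5 ordered values

mean_after = mean(sales_with_outlier)        # 165 / 6
median_after = median(sales_with_outlier)    # average of the two middle values

print(mean_before, median_before)   # 14 14
print(mean_after, median_after)     # 27.5 14.5
```

The single outlier nearly doubles the mean, while the median shifts by only half a unit.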

Mode

The mode is the value that occurs most frequently in a dataset. It can be used for both numerical and categorical data and is not affected by outliers.

  • There may be no mode, one mode (unimodal), or multiple modes (bimodal, multimodal).
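A minimal sketch of finding modes, assuming the convention that a dataset in which no value repeats has no mode. The helper name `find_modes` and the sample data are hypothetical.

```python
from collections import Counter

def find_modes(values):
    """Return all most-frequent values, or [] when no value repeats."""
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:
        return []   # assumed convention: no repeated value means no mode
    return [value for value, count in counts.items() if count == top]

shirt_sizes = ["M", "L", "M", "S", "L", "M"]   # categorical, unimodal
ratings = [1, 2, 2, 3, 4, 4, 5]                # numerical, bimodal
order_ids = [101, 102, 103]                    # every value occurs once

print(find_modes(shirt_sizes))   # ['M']
print(find_modes(ratings))       # [2, 4]
print(find_modes(order_ids))     # []
```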

Choosing the Best Measure

  • Mean: Generally preferred, unless outliers are present.

  • Median: Preferred when data contain outliers or are skewed.

  • Mode: Useful for categorical data or when identifying the most common value.

Shape of a Distribution

The shape describes how data are distributed:

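  • Symmetric: Mean = Median (and = Mode when the distribution has a single peak)

  • Positively Skewed (long right tail): typically Mean > Median > Mode

  • Negatively Skewed (long left tail): typically Mean < Median < Mode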

Measures of Relative Location

Percentiles and Quartiles

  • Percentiles: Divide ordered data into 100 equal parts. The p-th percentile is the value below which p% of observations fall.

  • Quartiles: Divide data into four equal segments.

    • First quartile (Q1): 25% below

    • Second quartile (Q2): 50% below (the median)

    • Third quartile (Q3): 75% below

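  • Quartile Position Formulas (positions in the ordered data, with n observations):

    • Q1 position: $0.25(n+1)$

    • Q2 position: $0.50(n+1)$

    • Q3 position: $0.75(n+1)$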
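The position-based approach to quartiles can be sketched as follows. This assumes positions of the form p(n + 1) in the ordered data and linear interpolation when a position falls between two observations; textbooks and software differ on these conventions. The helper names and the data are hypothetical.

```python
def value_at_position(ordered, position):
    """Value at a 1-based position; fractional positions are resolved by
    linear interpolation between the two neighbouring ordered values."""
    if position <= 1:
        return ordered[0]
    if position >= len(ordered):
        return ordered[-1]
    lower = int(position)               # 1-based index of the value just below
    fraction = position - lower
    below, above = ordered[lower - 1], ordered[lower]
    return below + fraction * (above - below)

def quartiles(data):
    """Return (Q1, Q2, Q3) using positions 0.25(n+1), 0.50(n+1), 0.75(n+1)."""
    ordered = sorted(data)
    n = len(ordered)
    return tuple(value_at_position(ordered, p * (n + 1)) for p in (0.25, 0.50, 0.75))

data = [6, 7, 8, 9, 10, 11, 12, 13, 14]   # hypothetical, n = 9
q1, q2, q3 = quartiles(data)               # positions 2.5, 5.0, 7.5
print(q1, q2, q3)                          # 7.5 10.0 12.5
```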

Five-Number Summary and Box-and-Whisker Plots

Five-Number Summary

  • Minimum

  • First Quartile (Q1)

  • Median (Q2)

  • Third Quartile (Q3)

  • Maximum

Order: Minimum ≤ Q1 ≤ Median ≤ Q3 ≤ Maximum

Box-and-Whisker Plot

  • Graphical representation of the five-number summary

  • Box shows Q1 to Q3 with a line at the median

  • Whiskers extend to minimum and maximum values
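A five-number summary can be computed with the Python standard library. This sketch uses `statistics.quantiles` with `method="exclusive"`, which interpolates at position p(n + 1) in the ordered data; other quartile conventions can give slightly different values. The function name and the data are hypothetical.

```python
from statistics import quantiles

def five_number_summary(data):
    """Return the five values a basic box-and-whisker plot is drawn from."""
    q1, q2, q3 = quantiles(data, n=4, method="exclusive")
    return {"min": min(data), "q1": q1, "median": q2, "q3": q3, "max": max(data)}

waiting_times = [3, 5, 7, 8, 9, 11, 15]   # hypothetical, in minutes
summary = five_number_summary(waiting_times)
print(summary)
# {'min': 3, 'q1': 5.0, 'median': 8.0, 'q3': 11.0, 'max': 15}
```

The box would span 5 to 11 with a line at 8, and the whiskers would reach 3 and 15.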

Measures of Variability

Range

  • Definition: Difference between the largest and smallest values, $\text{Range} = x_{\max} - x_{\min}$

  • Limitation: Sensitive to outliers and ignores data distribution

Interquartile Range (IQR)

  • Definition: Spread of the middle 50% of data, $\text{IQR} = Q_3 - Q_1$

  • Advantage: Reduces the effect of outliers
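The difference in outlier sensitivity between the range and the IQR can be illustrated with a small sketch. The data are hypothetical, and the quartiles come from `statistics.quantiles` with `method="exclusive"`, which is one of several quartile conventions.

```python
from statistics import quantiles

def value_range(data):
    """Largest value minus smallest value."""
    return max(data) - min(data)

def interquartile_range(data):
    """Q3 minus Q1, the spread of the middle 50% of the data."""
    q1, _, q3 = quantiles(data, n=4, method="exclusive")
    return q3 - q1

typical = [3, 5, 7, 8, 9, 11, 15]
with_outlier = [3, 5, 7, 8, 9, 11, 150]   # largest value replaced by an outlier

print(value_range(typical), interquartile_range(typical))            # 12 6.0
print(value_range(with_outlier), interquartile_range(with_outlier))  # 147 6.0
```

The range grows from 12 to 147, while the IQR stays at 6.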

Variance and Standard Deviation

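  • Variance: Average squared deviation from the mean (the sample version divides by n - 1 rather than n)

  • Population Variance: $\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$

  • Sample Variance: $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$

  • Standard Deviation: Square root of variance; restores original units

    • Population: $\sigma = \sqrt{\sigma^2}$

    • Sample: $s = \sqrt{s^2}$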
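The population and sample versions differ only in the divisor. The sketch below writes both out by hand and cross-checks them against the standard library's `statistics` functions; the data and helper names are hypothetical.

```python
from math import isclose, sqrt
from statistics import pvariance, variance

def population_variance(values):
    """Sum of squared deviations from the mean, divided by N."""
    mu = sum(values) / len(values)
    return sum((x - mu) ** 2 for x in values) / len(values)

def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    x_bar = sum(values) / len(values)
    return sum((x - x_bar) ** 2 for x in values) / (len(values) - 1)

data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical; mean is 5, squared deviations sum to 32

pop_var = population_variance(data)    # 32 / 8
samp_var = sample_variance(data)       # 32 / 7
pop_sd = sqrt(pop_var)
samp_sd = sqrt(samp_var)

# The hand-written versions agree with the standard library.
assert isclose(pop_var, pvariance(data))
assert isclose(samp_var, variance(data))

print(pop_var, pop_sd)   # 4.0 2.0
```

The sample variance (32 / 7, about 4.57) is slightly larger than the population variance because of the smaller divisor.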

Coefficient of Variation (CV)

  • Definition: Measures variation relative to the mean (unitless, percentage)

  • Population: $CV = \frac{\sigma}{\mu} \times 100\%$

  • Sample: $CV = \frac{s}{\bar{x}} \times 100\%$

  • Use: Compare variability between datasets with different units or means
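A short sketch of the sample CV, using two hypothetical sets of stock prices with very different means. The function name is an assumption for illustration.

```python
from statistics import mean, stdev

def coefficient_of_variation(sample):
    """Sample standard deviation as a percentage of the sample mean."""
    return stdev(sample) / mean(sample) * 100

stock_a = [48, 50, 52]       # hypothetical prices: mean 50, s = 2
stock_b = [196, 200, 204]    # hypothetical prices: mean 200, s = 4

cv_a = coefficient_of_variation(stock_a)   # about 4%
cv_b = coefficient_of_variation(stock_b)   # about 2%
```

Stock B has the larger standard deviation in dollars, yet relative to its mean it varies less than stock A.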

Empirical Rule and Chebyshev's Theorem

Empirical Rule (for bell-shaped distributions)

  • About 68% of data within 1 standard deviation of the mean

  • About 95% within 2 standard deviations

  • About 99.7% within 3 standard deviations

Chebyshev's Theorem (any distribution)

  • At least $\left(1 - \frac{1}{k^2}\right) \times 100\%$ of the data fall within k standard deviations of the mean (for k > 1)

  • For k = 2: at least 75% within 2 standard deviations

  • For k = 3: at least 89% within 3 standard deviations
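The Chebyshev bound can be checked against a deliberately skewed dataset. The data and function names below are hypothetical; the bound itself holds for any dataset.

```python
from statistics import mean, pstdev

def chebyshev_lower_bound(k):
    """Minimum share of data within k standard deviations (requires k > 1)."""
    if k <= 1:
        raise ValueError("Chebyshev's bound is only informative for k > 1")
    return 1 - 1 / k ** 2

def share_within(data, k):
    """Observed share of values within k population standard deviations of the mean."""
    mu, sigma = mean(data), pstdev(data)
    inside = sum(1 for x in data if abs(x - mu) <= k * sigma)
    return inside / len(data)

skewed = [1, 2, 2, 3, 3, 3, 4, 4, 5, 30]   # hypothetical, strongly right-skewed

bound_2 = chebyshev_lower_bound(2)    # 0.75
observed_2 = share_within(skewed, 2)  # 9 of the 10 values
print(bound_2, observed_2)            # 0.75 0.9
```

Even for this far-from-bell-shaped data, the observed share (90%) is at least the guaranteed 75%.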

z-Score

A z-score standardizes a value by expressing its distance from the mean in terms of standard deviations.

  • $z = \frac{x - \mu}{\sigma}$ (population)

  • z > 0: value above mean; z < 0: value below mean; z = 0: value equals mean
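A minimal sketch of the z-score calculation. The mean of 100 and standard deviation of 15 are hypothetical values chosen for easy arithmetic.

```python
def z_score(x, mu, sigma):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mu) / sigma

mu, sigma = 100, 15   # hypothetical population mean and standard deviation

print(z_score(130, mu, sigma))   # 2.0  (two standard deviations above the mean)
print(z_score(85, mu, sigma))    # -1.0 (one standard deviation below the mean)
print(z_score(100, mu, sigma))   # 0.0  (equal to the mean)
```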

Weighted Mean and Grouped Data

Weighted Mean

  • Used when data values have different weights: $\bar{x}_w = \frac{\sum w_i x_i}{\sum w_i}$, where $w_i$ is the weight attached to value $x_i$
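A weighted mean divides the weighted sum of the values by the sum of the weights. The sketch below uses hypothetical unit prices weighted by quantity sold.

```python
def weighted_mean(values, weights):
    """Sum of weight * value, divided by the sum of the weights."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

unit_prices = [10, 20, 30]   # hypothetical prices
units_sold = [5, 3, 2]       # hypothetical weights: quantity sold at each price

average_price_paid = weighted_mean(unit_prices, units_sold)   # (50 + 60 + 60) / 10
simple_average = sum(unit_prices) / len(unit_prices)

print(average_price_paid, simple_average)   # 17.0 20.0
```

Because most units sold at the lowest price, the weighted mean (17) sits below the unweighted mean (20).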

Grouped Data Approximations

  • For data grouped into classes, use class midpoints and frequencies to estimate mean and variance

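  • Mean: $\bar{x} = \frac{\sum_{i=1}^{K} f_i m_i}{n}$, where $f_i$ is the frequency and $m_i$ is the midpoint of class $i$, $K$ is the number of classes, and $n = \sum_{i=1}^{K} f_i$

  • Variance: $s^2 = \frac{\sum_{i=1}^{K} f_i (m_i - \bar{x})^2}{n - 1}$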
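The grouped-data approximation treats every observation in a class as if it sat at the class midpoint. A sketch with three hypothetical classes, using the sample (n - 1) divisor for the variance:

```python
# Hypothetical frequency table: (lower limit, upper limit, frequency)
classes = [(0, 10, 5), (10, 20, 10), (20, 30, 5)]

midpoints = [(lower + upper) / 2 for lower, upper, _ in classes]   # 5, 15, 25
frequencies = [freq for _, _, freq in classes]
n = sum(frequencies)                                               # 20 observations

# Approximate mean: frequency-weighted average of the midpoints.
grouped_mean = sum(f * m for f, m in zip(frequencies, midpoints)) / n

# Approximate sample variance: frequency-weighted squared deviations over n - 1.
grouped_variance = sum(
    f * (m - grouped_mean) ** 2 for f, m in zip(frequencies, midpoints)
) / (n - 1)

print(grouped_mean)   # 15.0
```

Here the approximate variance is 1000 / 19, about 52.6. Both figures are estimates, since the raw values inside each class are unknown.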

Measures of Relationship: Covariance and Correlation

Covariance

  • Measures the direction of the linear relationship between two variables X and Y

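  • Population: $\mathrm{Cov}(X,Y) = \sigma_{xy} = \frac{\sum_{i=1}^{N} (x_i - \mu_x)(y_i - \mu_y)}{N}$

  • Sample: $\mathrm{Cov}(X,Y) = s_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n - 1}$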

  • Cov(X,Y) > 0: positive relationship; Cov(X,Y) < 0: negative relationship; Cov(X,Y) = 0: no linear relationship

Correlation Coefficient (r)

  • Measures both the strength and direction of a linear relationship

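  • Population: $\rho = \frac{\sigma_{xy}}{\sigma_x \sigma_y}$, where $\sigma_{xy}$ is the population covariance and $\sigma_x$, $\sigma_y$ are the population standard deviations

  • Sample: $r = \frac{s_{xy}}{s_x s_y}$, where $s_{xy}$ is the sample covariance and $s_x$, $s_y$ are the sample standard deviations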

  • Range: -1 ≤ r ≤ 1

  • r close to 1: strong positive; r close to -1: strong negative; r close to 0: weak or no linear relationship
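Sample covariance and correlation can be sketched directly from their definitions. The advertising and sales figures are hypothetical, as are the function names.

```python
from math import sqrt

def sample_covariance(xs, ys):
    """Sum of products of paired deviations from the means, divided by n - 1."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)

def sample_correlation(xs, ys):
    """Covariance divided by the product of the two sample standard deviations."""
    s_x = sqrt(sample_covariance(xs, xs))   # covariance of x with itself is its variance
    s_y = sqrt(sample_covariance(ys, ys))
    return sample_covariance(xs, ys) / (s_x * s_y)

ad_spend = [1, 2, 3, 4, 5]   # hypothetical
sales = [2, 4, 5, 4, 5]      # hypothetical

cov_xy = sample_covariance(ad_spend, sales)   # 6 / 4
r_xy = sample_correlation(ad_spend, sales)    # 1.5 / sqrt(2.5 * 1.5)
print(cov_xy, round(r_xy, 3))                 # 1.5 0.775
```

The covariance only signals a positive direction; the correlation of about 0.77 adds that the linear relationship is fairly strong. Unlike the covariance, r does not change when either variable is rescaled to different units.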

Tabular Example: Summary Statistics for Four Locations

The following table summarizes key statistics for four locations (from the boxplot example):

Location   Mean   Min   Q1      Median   Q3      Max   IQR   Range
1          10.1   6     8.0     10.5     12.5    14    4.5   8
2          13.6   8     10.75   13.5     16.75   19    6.0   11
3          17.5   11    15.0    17.5     20.5    25    5.5   14
4          12.5   8     10.5    12.0     15.0    18    4.5   10
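As a quick consistency check, the IQR and Range columns can be recomputed from the other columns. The sketch below copies the values from the table above; the variable names are hypothetical.

```python
# Rows from the table: (location, min, q1, median, q3, max, iqr, range)
rows = [
    (1, 6, 8.0, 10.5, 12.5, 14, 4.5, 8),
    (2, 8, 10.75, 13.5, 16.75, 19, 6.0, 11),
    (3, 11, 15.0, 17.5, 20.5, 25, 5.5, 14),
    (4, 8, 10.5, 12.0, 15.0, 18, 4.5, 10),
]

checks = []
for location, low, q1, med, q3, high, iqr, value_range in rows:
    checks.append({
        "location": location,
        "iqr_ok": q3 - q1 == iqr,                   # IQR = Q3 - Q1
        "range_ok": high - low == value_range,      # Range = Max - Min
        "ordered": low <= q1 <= med <= q3 <= high,  # five-number summary order
    })

all_consistent = all(c["iqr_ok"] and c["range_ok"] and c["ordered"] for c in checks)
print(all_consistent)   # True
```

Location 3 has the widest range (14), while location 2 has the widest middle 50% (IQR of 6.0).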

Key Takeaways

  • Central tendency and variability are fundamental for summarizing data.

  • Relative location measures (percentiles, quartiles) help interpret individual values within a dataset.

  • Boxplots and five-number summaries provide visual and numerical summaries of data distribution.

  • Covariance and correlation quantify relationships between variables, essential for regression and prediction.
