Numerical Descriptive Measures in Business Statistics

Study Guide - Smart Notes

Numerical Descriptive Measures

Introduction

Numerical descriptive measures are essential tools in business statistics for summarizing and understanding data. They help describe the properties of central tendency, variation, and shape in numerical variables, and are foundational for data analysis and interpretation.

Measures of Central Tendency

The Mean

The mean, or arithmetic mean, is the most common measure of central tendency. It represents the average value in a data set and is calculated by dividing the sum of all values by the number of values.

Definition: The mean is the sum of observed values divided by the sample size.
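Formula: $\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$, where $X_i$ is the $i$th value and $n$ is the sample size.

Sensitivity: The mean is affected by extreme values (outliers).
Example: For the data set {11, 12, 13, 14, 20}, the mean is $\bar{X} = \frac{11 + 12 + 13 + 14 + 20}{5} = \frac{70}{5} = 14$.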
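As a quick check of the example above, the following sketch computes the mean with Python's standard `statistics` module. The second data set, in which 20 is replaced by 200, is a hypothetical addition used only to illustrate sensitivity to an outlier.

```python
from statistics import mean

# Data set from the example above
data = [11, 12, 13, 14, 20]
sample_mean = mean(data)
print(sample_mean)  # 14

# Hypothetical variant: one extreme value pulls the mean far upward
with_outlier = [11, 12, 13, 14, 200]
outlier_mean = mean(with_outlier)
print(outlier_mean)  # 50
```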

The Median

The median is the middle value in an ordered array, with 50% of values above and 50% below. It is less sensitive to outliers than the mean.

Definition: The median is the value separating the higher half from the lower half of a data sample.
Locating the Median: the median is the value at position $\frac{n+1}{2}$ in the ordered array.

If the number of values is odd, the median is the middle value.
If even, the median is the average of the two middle values.
Example: For the data set {12, 13, 14, 15, 16}, the median is 14.
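The odd and even cases can be sketched with `statistics.median`. Only the first data set comes from the example above; the six-value set and the set containing 160 are hypothetical illustrations.

```python
from statistics import median

odd_data = [12, 13, 14, 15, 16]        # from the example above
even_data = [12, 13, 14, 15, 16, 17]   # hypothetical: even number of values
outlier_data = [12, 13, 14, 15, 160]   # hypothetical: one extreme value

median_odd = median(odd_data)          # middle value
median_even = median(even_data)        # average of the two middle values
median_outlier = median(outlier_data)  # unchanged by the extreme value

print(median_odd, median_even, median_outlier)  # 14 14.5 14
```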

The Mode

The mode is the value that occurs most frequently in a data set. It can be used for both numerical and categorical data and is not affected by outliers.

Definition: The mode is the most frequently observed value.
There may be no mode or multiple modes in a data set.
Example: In the data set {100, 200, 100, 300, 400}, the mode is 100.
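A minimal sketch of finding the mode, including the "no mode" and "multiple modes" cases. The helper name `modes` and the convention that a data set whose values each occur once has no mode are assumptions made for this illustration; the input is assumed to be non-empty.

```python
from collections import Counter

def modes(values):
    """Return the most frequent value(s); empty list if no value repeats."""
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:
        return []  # every value occurs once: treated here as "no mode"
    return [value for value, count in counts.items() if count == top]

print(modes([100, 200, 100, 300, 400]))    # [100]
print(modes([1, 1, 2, 2, 3]))              # [1, 2]  (two modes)
print(modes(["red", "green", "blue"]))     # []      (categorical, no mode)
```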

Choosing the Appropriate Measure

The mean is generally used unless outliers are present.
The median is preferred when data contain outliers.
Reporting both mean and median can provide a more complete picture.

Geometric Mean and Geometric Mean Rate of Return

The geometric mean is used to measure the rate of change of a variable over time, especially for growth rates and investment returns.

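Formula: $\bar{X}_G = (X_1 \times X_2 \times \cdots \times X_n)^{1/n}$

Geometric Mean Rate of Return: $\bar{R}_G = [(1 + R_1) \times (1 + R_2) \times \cdots \times (1 + R_n)]^{1/n} - 1$, where $R_i$ is the rate of return in period $i$.

Example: An investment of \$100,000 declines to \$50,000 in year one (a return of -50%) and rebounds to \$100,000 in year two (a return of +100%). The overall two-year return is zero, and the geometric mean rate of return agrees: $\bar{R}_G = [(0.50)(2.00)]^{1/2} - 1 = 0$. The arithmetic mean of the two returns, 25%, would overstate the result.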
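The investment example can be checked numerically. This sketch assumes the usual definition of the geometric mean rate of return (the n-th root of the product of the growth factors, minus 1); the function name `geometric_mean_return` is hypothetical.

```python
import math

def geometric_mean_return(returns):
    """Geometric mean rate of return for per-period returns given as decimals."""
    growth = math.prod(1 + r for r in returns)  # product of growth factors
    return growth ** (1 / len(returns)) - 1

# Year one: -50% (100,000 -> 50,000); year two: +100% (50,000 -> 100,000)
returns = [-0.50, 1.00]

arithmetic = sum(returns) / len(returns)
geometric = geometric_mean_return(returns)

print(arithmetic)  # 0.25, which overstates the actual result
print(geometric)   # 0.0, matching the zero overall return
```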

Measures of Variation

The Range

The range is the simplest measure of variation, calculated as the difference between the largest and smallest values.

Formula: $Range = X_{largest} - X_{smallest}$

The range is sensitive to outliers and does not account for data distribution.

Sample Variance

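The sample variance is approximately the average squared deviation of the values from the mean: the sum of squared deviations is divided by $n - 1$ rather than $n$.

Formula: $S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$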

Variance is expressed in squared units of the original data.

Sample Standard Deviation

The standard deviation is the square root of the variance and is the most commonly used measure of variation.

Formula: $S = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}$

Standard deviation has the same units as the original data.
Smaller standard deviation indicates data are more concentrated around the mean.
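The range, sample variance, and sample standard deviation can be computed step by step. This sketch reuses the data set {11, 12, 13, 14, 20} from the mean example and assumes the usual sample definitions with an $n - 1$ divisor; the result is cross-checked against `statistics.variance`.

```python
import math
from statistics import variance

data = [11, 12, 13, 14, 20]
n = len(data)
x_bar = sum(data) / n                     # 14.0

data_range = max(data) - min(data)        # largest minus smallest

# Sum of squared deviations from the mean, divided by n - 1
squared_devs = [(x - x_bar) ** 2 for x in data]
sample_var = sum(squared_devs) / (n - 1)
sample_std = math.sqrt(sample_var)

print(data_range)            # 9
print(sample_var)            # 12.5 (squared units)
print(round(sample_std, 2))  # 3.54 (original units)

# The standard library uses the same n - 1 definition
assert math.isclose(sample_var, variance(data))
```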

Coefficient of Variation (CV)

The coefficient of variation measures relative variation and is always expressed as a percentage. It allows comparison of variability between data sets with different units.

Formula: $CV = \left(\frac{S}{\bar{X}}\right) \times 100\%$

Example: Stock A has a mean price of \$50 and a standard deviation of \$5, so $CV = \frac{5}{50} \times 100\% = 10\%$.
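A short sketch of the coefficient of variation as the standard deviation relative to the mean, in percent. The Stock A figures (mean 50, standard deviation 5) are inferred from the worked fraction 5/50 above; Stock B is a hypothetical comparison added to show that equal standard deviations can mean different relative variation.

```python
def coefficient_of_variation(std_dev, mean_value):
    """Relative variation as a percentage of the mean."""
    return 100 * std_dev / mean_value

cv_a = coefficient_of_variation(5, 50)    # Stock A
cv_b = coefficient_of_variation(5, 100)   # Stock B (hypothetical)

print(cv_a)  # 10.0
print(cv_b)  # 5.0, so B varies less relative to its price
```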

Z-Score

The Z-score indicates how many standard deviations a data value is from the mean. It is used to identify outliers.

Formula: $Z = \frac{X - \bar{X}}{S}$

A Z-score less than -3.0 or greater than +3.0 is considered an extreme outlier.
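Example: If the mean SAT score is 490 and the standard deviation is 100, a score of 620 has $Z = \frac{620 - 490}{100} = 1.30$, which is well inside the $\pm 3.0$ limits and therefore not an outlier.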
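The SAT example can be reproduced in a few lines. The helper names `z_score` and `is_extreme_outlier` are hypothetical; the threshold of 3.0 is the one stated above.

```python
def z_score(value, mean_value, std_dev):
    """Number of standard deviations a value lies from the mean."""
    return (value - mean_value) / std_dev

def is_extreme_outlier(z, threshold=3.0):
    """Flag Z-scores below -threshold or above +threshold."""
    return abs(z) > threshold

z = z_score(620, 490, 100)
print(z)                      # 1.3
print(is_extreme_outlier(z))  # False
```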

Shape of a Distribution

Skewness

Skewness measures the extent to which data values are not symmetrical.

Left-skewed: Mean < Median
Symmetric: Mean = Median
Right-skewed: Median < Mean
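The mean-versus-median comparison above can be sketched as a rough classifier. It is only a heuristic under the stated rule, not a formal skewness statistic; the function name `describe_shape`, the tolerance parameter, and the left-skewed data set are hypothetical.

```python
from statistics import mean, median

def describe_shape(values, tolerance=1e-9):
    """Classify shape by comparing the mean with the median."""
    diff = mean(values) - median(values)
    if abs(diff) <= tolerance:
        return "symmetric"
    return "right-skewed" if diff > 0 else "left-skewed"

print(describe_shape([11, 12, 13, 14, 20]))  # right-skewed (mean 14 > median 13)
print(describe_shape([12, 13, 14, 15, 16]))  # symmetric    (mean 14 = median 14)
print(describe_shape([1, 10, 11, 12, 13]))   # left-skewed  (mean 9.4 < median 11)
```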

Kurtosis

Kurtosis measures the peakedness of the distribution curve.

Leptokurtic: Sharper peak
Mesokurtic: Bell-shaped (normal)
Platykurtic: Flatter peak

Quartiles and the Five-Number Summary

Quartiles

Quartiles divide ranked data into four equal segments. The first quartile (Q1) is the value below which 25% of the data fall, the second quartile (Q2) is the median, and the third quartile (Q3) is the value below which 75% of the data fall.

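Locating Quartiles:

$Q_1$ is at position $\frac{n+1}{4}$, $Q_2$ (the median) at $\frac{n+1}{2}$, and $Q_3$ at $\frac{3(n+1)}{4}$ in the ranked data.
If the position is a whole number, use the value at that rank. If it is fractional (for example 2.5), average the two adjacent ranked values.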

Interquartile Range (IQR)

The interquartile range (IQR) measures the spread of the middle 50% of the data and is resistant to outliers.

Formula: $IQR = Q_3 - Q_1$

Example: If $Q_1$ and $Q_3$ are the first and third quartiles of a data set, then $IQR = Q_3 - Q_1$.
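The quartile-location rule and the IQR can be sketched as follows. This assumes the common textbook positions $(n+1)/4$, $(n+1)/2$, and $3(n+1)/4$ (counted from 1) and the averaging rule for fractional positions; the function name `quartile` and the nine-value data set are hypothetical.

```python
def quartile(values, q):
    """q-th quartile (q = 1, 2, or 3) using positions q(n+1)/4 in ranked data."""
    ranked = sorted(values)
    n = len(ranked)
    if n < 3:
        raise ValueError("need at least 3 values for this position rule")
    position = q * (n + 1) / 4          # 1-based rank, possibly fractional
    lower = int(position)
    if position.is_integer():
        return ranked[lower - 1]        # whole number: use that ranked value
    # fractional: average the two adjacent ranked values
    return (ranked[lower - 1] + ranked[lower]) / 2

data = [11, 12, 13, 16, 16, 17, 18, 21, 22]  # hypothetical, n = 9

q1 = quartile(data, 1)   # position 2.5 -> average of 12 and 13
q3 = quartile(data, 3)   # position 7.5 -> average of 18 and 21
iqr = q3 - q1

print(q1, q3, iqr)  # 12.5 19.5 7.0
```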

Five-Number Summary

The five-number summary consists of:

Minimum ($X_{smallest}$)
First Quartile ($Q_1$)
Median ($Q_2$)
Third Quartile ($Q_3$)
Maximum ($X_{largest}$)
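A five-number summary can be assembled with the standard library. This sketch uses `statistics.quantiles` with `method="exclusive"`, which interpolates linearly at $(n+1)p$ positions; that is an assumption about which quartile convention to use, and other software may give slightly different quartiles. The data set and the function name are hypothetical.

```python
from statistics import quantiles

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum)."""
    q1, q2, q3 = quantiles(values, n=4, method="exclusive")
    return (min(values), q1, q2, q3, max(values))

data = [11, 12, 13, 16, 16, 17, 18, 21, 22]  # hypothetical
summary = five_number_summary(data)
print(summary)  # (11, 12.5, 16.0, 19.5, 22)
```

These five values are the ones a boxplot draws: the box spans Q1 to Q3, the line inside it marks the median, and the whiskers reach toward the minimum and maximum.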

Boxplot

A boxplot is a graphical display based on the five-number summary. It visually shows the center, spread, and shape of the data, and can indicate skewness.

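If the median line is centered in the box and the two whiskers are of about equal length, the data are approximately symmetric; a longer right whisker suggests right skew, and a longer left whisker suggests left skew.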
Boxplots can be vertical or horizontal.

Descriptive Measures for a Population

Population Mean

The population mean is the sum of all values in the population divided by the population size.

Formula: $\mu = \frac{\sum_{i=1}^{N} X_i}{N}$, where $N$ is the population size.

Population Variance and Standard Deviation

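Population Variance: $\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}$

Population Standard Deviation: $\sigma = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}}$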

Sample Statistics vs. Population Parameters

Sample statistics (mean $\bar{X}$, variance $S^2$, standard deviation $S$) describe a sample.
Population parameters (mean $\mu$, variance $\sigma^2$, standard deviation $\sigma$, denoted by Greek letters) describe the entire population.
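The practical difference between the two sets of formulas is the divisor: $n - 1$ for a sample, $N$ for a population. The sketch below treats the same five values first as a sample and then as a complete population, purely for illustration.

```python
from statistics import variance, stdev, pvariance, pstdev

data = [11, 12, 13, 14, 20]   # sum of squared deviations from the mean is 50

sample_var = variance(data)   # 50 / (5 - 1)
pop_var = pvariance(data)     # 50 / 5

print(sample_var, pop_var)           # 12.5 10
print(round(stdev(data), 4))         # 3.5355 (sample standard deviation)
print(round(pstdev(data), 4))        # 3.1623 (population standard deviation)
```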

Empirical Rule and Chebyshev's Rule

Empirical Rule

The empirical rule applies to symmetric, mound-shaped (approximately normal) distributions:

Approximately 68% of values lie within 1 standard deviation of the mean.
Approximately 95% lie within 2 standard deviations.
Approximately 99.7% lie within 3 standard deviations.
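The three percentages come from the normal distribution itself. This sketch uses `statistics.NormalDist` (standard normal by default) to compute the probability of falling within $k$ standard deviations of the mean; the function name `share_within` is hypothetical.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1

def share_within(k):
    """Probability that a normal value lies within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(k, round(share_within(k), 4))
# 1 0.6827
# 2 0.9545
# 3 0.9973
```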

Chebyshev's Rule

Chebyshev's rule applies to any data distribution:

At least $\left(1 - \frac{1}{k^2}\right) \times 100\%$ of values fall within $k$ standard deviations of the mean, for $k > 1$.
Example: For $k = 2$, at least $1 - \frac{1}{2^2} = 75\%$ of values are within 2 standard deviations.
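A minimal sketch of the Chebyshev bound, $1 - 1/k^2$ for $k > 1$, checked against a deliberately skewed data set. The data and function names are hypothetical, and the check uses the data set's own mean and population standard deviation.

```python
from statistics import mean, pstdev

def chebyshev_bound(k):
    """Minimum share of values within k standard deviations of the mean (k > 1)."""
    if k <= 1:
        raise ValueError("Chebyshev's rule requires k > 1")
    return 1 - 1 / k ** 2

def share_within(values, k):
    """Observed share of values within k standard deviations of the mean."""
    mu, sigma = mean(values), pstdev(values)
    inside = [x for x in values if abs(x - mu) <= k * sigma]
    return len(inside) / len(values)

skewed = [1, 1, 1, 2, 2, 3, 4, 5, 8, 40]  # hypothetical, strongly right-skewed

print(chebyshev_bound(2))  # 0.75
for k in (1.5, 2, 3):
    # The observed share is never below the guaranteed minimum
    assert share_within(skewed, k) >= chebyshev_bound(k)
```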

Measures of Relationship Between Two Numerical Variables

Covariance

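Covariance measures how two numerical variables vary together. Its sign gives the direction of the linear relationship, but because its size depends on the units of measurement, it cannot be used to judge relative strength.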

Formula: $cov(X, Y) = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{n - 1}$

If covariance > 0, variables move in the same direction.
If covariance < 0, variables move in opposite directions.
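If covariance = 0, there is no linear relationship between the variables (this does not by itself mean they are independent).
The magnitude of the covariance depends on the units of X and Y, so it does not indicate the strength of the relationship.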
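The sample covariance can be computed directly from its definition, assuming the usual $n - 1$ divisor. The paired data and the function name are hypothetical; rescaling Y by 100 (for example, dollars to cents) changes the covariance even though the relationship is the same, which is why its magnitude is not a measure of strength.

```python
def sample_covariance(xs, ys):
    """Sum of cross-products of deviations from the means, divided by n - 1."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    cross_products = [(x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)]
    return sum(cross_products) / (n - 1)

x = [1, 2, 3, 4, 5]          # hypothetical
y = [2, 4, 5, 4, 5]          # hypothetical
y_rescaled = [100 * value for value in y]

cov_xy = sample_covariance(x, y)
cov_rescaled = sample_covariance(x, y_rescaled)

print(cov_xy)        # 1.5   (positive: the variables move in the same direction)
print(cov_rescaled)  # 150.0 (same relationship, different units)
```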

Coefficient of Correlation

The coefficient of correlation (sample: r, population: ρ) measures the relative strength of the linear relationship between two variables.

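Formula: $r = \frac{cov(X, Y)}{S_X S_Y}$, where $S_X$ and $S_Y$ are the sample standard deviations of X and Y.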

Unit-free and ranges from -1 to 1.
Closer to -1: stronger negative linear relationship.
Closer to 1: stronger positive linear relationship.
Closer to 0: weaker linear relationship.
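A sketch of the sample correlation coefficient as the covariance divided by the product of the two sample standard deviations. The data and function name are hypothetical; the rescaled series shows that $r$, unlike the covariance, does not change with the units.

```python
import math
from statistics import stdev

def sample_correlation(xs, ys):
    """r = cov(X, Y) / (S_X * S_Y), using n - 1 divisors throughout."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    cov = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)
    return cov / (stdev(xs) * stdev(ys))

x = [1, 2, 3, 4, 5]          # hypothetical
y = [2, 4, 5, 4, 5]          # hypothetical
y_rescaled = [100 * value for value in y]

r = sample_correlation(x, y)
r_rescaled = sample_correlation(x, y_rescaled)

print(round(r, 4))                   # 0.7746 (fairly strong positive relationship)
print(math.isclose(r, r_rescaled))   # True: r is unit-free
```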

Ethical Considerations in Data Analysis

Numerical descriptive measures should be reported objectively and fairly. Both positive and negative results must be documented, and inappropriate summary measures should not be used to distort facts.

Summary Table: Measures of Central Tendency

Measure	Definition	Sensitivity to Outliers	Example
Mean	Average value	High	{11, 12, 13, 14, 20} → 14
Median	Middle value	Low	{12, 13, 14, 15, 16} → 14
Mode	Most frequent value	None	{100, 200, 100, 300, 400} → 100

Summary Table: Measures of Variation

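Measure	Definition	Formula
Range	Difference between largest and smallest values	$X_{largest} - X_{smallest}$
Variance	Approximate average squared deviation from the mean (sample)	$S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$
Standard Deviation	Square root of variance	$S = \sqrt{S^2}$
Coefficient of Variation	Relative variation (percentage)	$CV = \left(\frac{S}{\bar{X}}\right) \times 100\%$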
