Numerical Descriptive Measures in Business Statistics
Study Guide - Smart Notes
Numerical Descriptive Measures
Introduction
Numerical descriptive measures are essential tools in business statistics for summarizing and understanding data. They help describe the properties of central tendency, variation, and shape in numerical variables, and are foundational for data analysis and interpretation.
Measures of Central Tendency
The Mean
The mean, or arithmetic mean, is the most common measure of central tendency. It represents the average value in a data set and is calculated by dividing the sum of all values by the number of values.
Definition: The mean is the sum of observed values divided by the sample size.
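Formula: $\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$, where $X_i$ is the $i$-th observed value and $n$ is the sample size.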
Sensitivity: The mean is affected by extreme values (outliers).
Example: For the data set {11, 12, 13, 14, 20}, the mean is $\bar{X} = \frac{11 + 12 + 13 + 14 + 20}{5} = \frac{70}{5} = 14$.
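The calculation above can be checked with a short Python sketch. The second list, with 200 in place of 20, is a hypothetical data set added here only to illustrate how a single extreme value pulls the mean.

```python
from statistics import mean

data = [11, 12, 13, 14, 20]

# Arithmetic mean: sum of the values divided by the number of values.
manual_mean = sum(data) / len(data)
library_mean = mean(data)

# Hypothetical variant: replace the largest value with an extreme outlier.
with_outlier = [11, 12, 13, 14, 200]
outlier_mean = mean(with_outlier)

print(manual_mean, library_mean, outlier_mean)
```

One changed value moves the mean from 14 to 50, even though four of the five values are identical in both lists.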
The Median
The median is the middle value in an ordered array, with 50% of values above and 50% below. It is less sensitive to outliers than the mean.
Definition: The median is the value separating the higher half from the lower half of a data sample.
Locating the Median:
If the number of values is odd, the median is the middle value.
If even, the median is the average of the two middle values.
Example: For the data set {12, 13, 14, 15, 16}, the median is 14.
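As a minimal sketch, Python's standard `statistics.median` handles both the odd and the even case. The six-value list is a hypothetical extension of the example, with one extreme value appended to show how little the median moves.

```python
from statistics import median

odd_data = [16, 12, 14, 13, 15]          # the example data, deliberately unordered
even_data = [12, 13, 14, 15, 16, 40]     # hypothetical: one extreme value appended

# median() sorts internally, so the input does not need to be ordered.
odd_median = median(odd_data)            # middle value of five
even_median = median(even_data)          # average of the two middle values

print(odd_median, even_median)
```

Adding the value 40 shifts the median only from 14 to 14.5, the average of 14 and 15.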
The Mode
The mode is the value that occurs most frequently in a data set. It can be used for both numerical and categorical data and is not affected by outliers.
Definition: The mode is the most frequently observed value.
There may be no mode or multiple modes in a data set.
Example: In the data set {100, 200, 100, 300, 400}, the mode is 100.
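One way to find the mode in Python is to count frequencies with `collections.Counter`. The helper name `modes` and the `colors` list are illustrative assumptions; the function returns an empty list when no value repeats, which matches the "no mode" case described above.

```python
from collections import Counter

def modes(data):
    """Return all most-frequent values, or [] when no value repeats."""
    counts = Counter(data)
    highest = max(counts.values())
    if highest == 1:
        return []  # every value occurs once: treat as "no mode"
    return [value for value, count in counts.items() if count == highest]

prices = [100, 200, 100, 300, 400]
colors = ["red", "blue", "red", "blue", "green"]  # hypothetical categorical data

print(modes(prices))     # single mode
print(modes(colors))     # two modes
print(modes([1, 2, 3]))  # no mode
```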
Choosing the Appropriate Measure
The mean is generally used unless outliers are present.
The median is preferred when data contain outliers.
Reporting both mean and median can provide a more complete picture.
Geometric Mean and Geometric Mean Rate of Return
The geometric mean is used to measure the rate of change of a variable over time, especially for growth rates and investment returns.
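Formula: $\bar{X}_G = (X_1 \times X_2 \times \cdots \times X_n)^{1/n}$
Geometric Mean Rate of Return: $\bar{R}_G = [(1 + R_1) \times (1 + R_2) \times \cdots \times (1 + R_n)]^{1/n} - 1$, where $R_i$ is the rate of return in period $i$.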
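Example: An investment of \$100,000 declines to \$50,000 in year one and rebounds to \$100,000 in year two, so the yearly returns are $R_1 = -0.50$ and $R_2 = 1.00$. The arithmetic mean return is $(-0.50 + 1.00)/2 = 25\%$, which is misleading because the overall two-year return is zero. The geometric mean rate of return is $[(0.50)(2.00)]^{1/2} - 1 = 0\%$, which matches the actual result.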
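The investment example can be reproduced with a small sketch. The function name `geometric_mean_return` is an assumption made for illustration, and returns are passed as decimals (so -50% is `-0.50`).

```python
import math

def geometric_mean_return(returns):
    """Geometric mean rate of return for per-period returns given as decimals."""
    growth = math.prod(1 + r for r in returns)   # compound growth factor
    return growth ** (1 / len(returns)) - 1

returns = [-0.50, 1.00]   # year 1: -50%, year 2: +100%

arithmetic = sum(returns) / len(returns)
geometric = geometric_mean_return(returns)

print(f"arithmetic mean return: {arithmetic:.0%}")
print(f"geometric mean return:  {geometric:.0%}")
```

The arithmetic mean suggests a 25% average gain, while the geometric mean correctly reports that the investment ended where it started.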
Measures of Variation
The Range
The range is the simplest measure of variation, calculated as the difference between the largest and smallest values.
Formula: $\text{Range} = X_{\text{largest}} - X_{\text{smallest}}$
The range is sensitive to outliers and ignores how the values are distributed between the two extremes.
Sample Variance
The sample variance measures the average squared deviation of the values from the sample mean, dividing by $n - 1$ rather than $n$.
Formula: $S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$
Variance is expressed in squared units of the original data.
Sample Standard Deviation
The standard deviation is the square root of the variance and is the most commonly used measure of variation.
Formula: $S = \sqrt{S^2} = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}$
Standard deviation has the same units as the original data.
A smaller standard deviation indicates that the data are more concentrated around the mean.
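A rough sketch of the range, sample variance, and sample standard deviation, reusing the data set {11, 12, 13, 14, 20} from the mean example. The manual calculations use the $n - 1$ denominator and are cross-checked against Python's `statistics.variance` and `statistics.stdev`, which use the same sample definitions.

```python
import math
from statistics import stdev, variance

data = [11, 12, 13, 14, 20]
n = len(data)
x_bar = sum(data) / n

data_range = max(data) - min(data)

# Sample variance: squared deviations from the mean, divided by n - 1.
sample_var = sum((x - x_bar) ** 2 for x in data) / (n - 1)

# Sample standard deviation: square root of the sample variance.
sample_sd = math.sqrt(sample_var)

# Cross-check against the standard library's sample statistics.
assert math.isclose(sample_var, variance(data))
assert math.isclose(sample_sd, stdev(data))

print(data_range, sample_var, round(sample_sd, 2))  # 9 12.5 3.54
```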
Coefficient of Variation (CV)
The coefficient of variation measures relative variation and is always expressed as a percentage. It allows comparison of variability between data sets with different units.
Formula: $CV = \left(\frac{S}{\bar{X}}\right) \times 100\%$
Example: Stock A has a mean price of \$50 and a standard deviation of \$5, so $CV = \frac{5}{50} \times 100\% = 10\%$.
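A minimal sketch of the CV calculation follows. Stock A uses the mean of 50 and standard deviation of 5 from the example; "Stock B" is a hypothetical second stock added here only to show why relative variation is useful when means differ.

```python
def coefficient_of_variation(mean, std_dev):
    """Standard deviation as a percentage of the mean."""
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return 100 * std_dev / mean

cv_a = coefficient_of_variation(mean=50, std_dev=5)    # Stock A from the example
cv_b = coefficient_of_variation(mean=100, std_dev=5)   # hypothetical Stock B

print(f"Stock A: {cv_a:.1f}%  Stock B: {cv_b:.1f}%")
```

Both stocks have the same standard deviation, but Stock A varies twice as much relative to its price.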
Z-Score
The Z-score indicates how many standard deviations a data value is from the mean. It is used to identify outliers.
Formula: $Z = \frac{X - \bar{X}}{S}$
A value whose Z-score is less than -3.0 or greater than +3.0 is considered an extreme outlier.
Example: If the mean SAT score is 490 and the standard deviation is 100, a score of 620 has $Z = \frac{620 - 490}{100} = 1.3$, so it is not an outlier.
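The SAT example can be sketched as follows. The function name `z_score` and the second score of 820 are illustrative assumptions; the ±3.0 threshold is the rule stated above.

```python
def z_score(value, mean, std_dev):
    """Number of standard deviations that value lies from the mean."""
    return (value - mean) / std_dev

z = z_score(620, mean=490, std_dev=100)
is_extreme_outlier = abs(z) > 3.0

# Hypothetical score far from the mean, for contrast.
z_far = z_score(820, mean=490, std_dev=100)

print(round(z, 2), is_extreme_outlier, round(z_far, 2), abs(z_far) > 3.0)
```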
Shape of a Distribution
Skewness
Skewness measures the extent to which data values are not symmetrical.
Left-skewed: Mean < Median
Symmetric: Mean = Median
Right-skewed: Median < Mean
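The mean-versus-median comparison above can be turned into a rough check. This is only a rule-of-thumb sketch: the function name `skew_direction` is an assumption, the third data set is hypothetical, and real data rarely have exactly equal mean and median, so in practice a tolerance or a proper skewness statistic would be used.

```python
from statistics import mean, median

def skew_direction(data):
    """Classify skew by comparing the mean with the median."""
    m, md = mean(data), median(data)
    if m < md:
        return "left-skewed"
    if m > md:
        return "right-skewed"
    return "symmetric"

print(skew_direction([11, 12, 13, 14, 20]))  # mean 14 > median 13
print(skew_direction([12, 13, 14, 15, 16]))  # mean 14 = median 14
print(skew_direction([1, 10, 11, 12, 13]))   # mean 9.4 < median 11
```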
Kurtosis
Kurtosis measures the peakedness of the distribution curve.
Leptokurtic: Sharper peak
Mesokurtic: Bell-shaped (normal)
Platykurtic: Flatter peak
Quartiles and the Five-Number Summary
Quartiles
Quartiles divide ranked data into four equal segments. The first quartile (Q1) is the value below which 25% of the data fall, the second quartile (Q2) is the median, and the third quartile (Q3) is the value below which 75% of the data fall.
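Locating Quartiles: In ranked data with $n$ values, $Q_1$ is the $(n+1)/4$ ranked value, $Q_2$ is the $(n+1)/2$ ranked value, and $Q_3$ is the $3(n+1)/4$ ranked value.
If the position is a whole number, use that ranked value. If it is a fractional half (for example, 2.5), average the two adjacent ranked values. Otherwise, round the position to the nearest whole number and use that ranked value.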
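The following sketch locates quartiles by ranked position. It assumes the common textbook convention that $Q_1$, $Q_2$, and $Q_3$ sit at positions $(n+1)/4$, $(n+1)/2$, and $3(n+1)/4$, and that a position that is neither whole nor a half is rounded to the nearest whole number; both are assumptions, since other conventions exist. The function name and the ten-value sample are illustrative.

```python
def quartile(data, q):
    """Return quartile q (1, 2, or 3) using ranked-position rules."""
    if q not in (1, 2, 3):
        raise ValueError("q must be 1, 2, or 3")
    ranked = sorted(data)
    n = len(ranked)
    if n < 2:
        raise ValueError("need at least two values")
    position = q * (n + 1) / 4            # 1-based position in the ranked data
    whole = int(position)
    if position == whole:                 # whole number: use that ranked value
        return ranked[whole - 1]
    if position - whole == 0.5:           # fractional half: average the neighbours
        return (ranked[whole - 1] + ranked[whole]) / 2
    return ranked[round(position) - 1]    # otherwise: nearest whole position

sample = [29, 31, 35, 39, 39, 40, 43, 44, 44, 52]   # hypothetical data, n = 10
q1, q2, q3 = (quartile(sample, q) for q in (1, 2, 3))
print(q1, q2, q3)
```

With ten values the positions are 2.75, 5.5, and 8.25, so the sketch uses the 3rd value, the average of the 5th and 6th values, and the 8th value.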
Interquartile Range (IQR)
The interquartile range (IQR) measures the spread of the middle 50% of the data and is resistant to outliers.
Formula: $IQR = Q_3 - Q_1$
Example (with illustrative values): If $Q_1 = 30$ and $Q_3 = 57$, then $IQR = 57 - 30 = 27$.
Five-Number Summary
The five-number summary consists of:
Minimum ($X_{\text{smallest}}$)
First Quartile ($Q_1$)
Median ($Q_2$)
Third Quartile ($Q_3$)
Maximum ($X_{\text{largest}}$)
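A minimal sketch of the five-number summary and the IQR, using `statistics.quantiles` from the Python standard library. Note the assumption: with `method="exclusive"` the library interpolates between ranked values, so for some sample sizes its quartiles can differ slightly from hand calculations that round positions. The seven-value sample is hypothetical and chosen so the quartile positions are whole numbers.

```python
from statistics import quantiles

def five_number_summary(data):
    """Return (smallest, Q1, median, Q3, largest)."""
    ranked = sorted(data)
    q1, q2, q3 = quantiles(ranked, n=4, method="exclusive")
    return ranked[0], q1, q2, q3, ranked[-1]

sample = [11, 12, 13, 16, 16, 17, 18]   # hypothetical data, n = 7
smallest, q1, median_value, q3, largest = five_number_summary(sample)

# Interquartile range: spread of the middle 50% of the data.
iqr = q3 - q1

print(smallest, q1, median_value, q3, largest, iqr)
```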
Boxplot
A boxplot is a graphical display based on the five-number summary. It visually shows the center, spread, and shape of the data, and can indicate skewness.
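If the median line is centered in the box and the two whiskers are about the same length, the data are approximately symmetric. A longer right whisker suggests right skew, and a longer left whisker suggests left skew.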
Boxplots can be vertical or horizontal.
Descriptive Measures for a Population
Population Mean
The population mean is the sum of all values in the population divided by the population size.
Formula: $\mu = \frac{\sum_{i=1}^{N} X_i}{N}$, where $N$ is the population size.
Population Variance and Standard Deviation
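Population Variance: $\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}$
Population Standard Deviation: $\sigma = \sqrt{\sigma^2} = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}}$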
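The difference between the population and sample formulas is only the denominator ($N$ versus $n - 1$). The sketch below treats the five values as a complete population, which is an assumption made purely for illustration, and compares the result with the sample variance of the same numbers.

```python
import math
from statistics import pstdev, pvariance, variance

values = [11, 12, 13, 14, 20]   # treated here as an entire population
N = len(values)
mu = sum(values) / N

# Population variance divides by N; population standard deviation is its root.
pop_var = sum((x - mu) ** 2 for x in values) / N
pop_sd = math.sqrt(pop_var)

# Cross-check against the standard library's population statistics.
assert math.isclose(pop_var, pvariance(values))
assert math.isclose(pop_sd, pstdev(values))

# The sample variance of the same numbers divides by N - 1 and is larger.
samp_var = variance(values)

print(pop_var, round(pop_sd, 2), samp_var)
```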
Sample Statistics vs. Population Parameters
Sample statistics (mean, variance, standard deviation) describe a sample.
Population parameters (denoted by Greek letters) describe the entire population.
Empirical Rule and Chebyshev's Rule
Empirical Rule
The empirical rule applies to symmetric, mound-shaped (approximately normal) distributions:
Approximately 68% of the values fall within 1 standard deviation of the mean.
Approximately 95% fall within 2 standard deviations of the mean.
Approximately 99.7% fall within 3 standard deviations of the mean.
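One way to see the empirical rule at work is to simulate approximately normal data and count the share of values within 1, 2, and 3 standard deviations. Everything specific here is an assumption for illustration: the mean of 490 and standard deviation of 100 are arbitrary simulation parameters, the seed is fixed only for repeatability, and the observed shares will be close to, not exactly, 68%, 95%, and 99.7%.

```python
import random
from statistics import mean, stdev

def share_within(data, k):
    """Fraction of values within k sample standard deviations of the sample mean."""
    m, s = mean(data), stdev(data)
    return sum(1 for x in data if abs(x - m) <= k * s) / len(data)

rng = random.Random(0)   # fixed seed so repeated runs give the same data
simulated = [rng.gauss(490, 100) for _ in range(10_000)]

shares = {k: share_within(simulated, k) for k in (1, 2, 3)}
for k, share in shares.items():
    print(f"within {k} standard deviation(s): {share:.1%}")
```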
Chebyshev's Rule
Chebyshev's rule applies to any data distribution:
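At least $\left(1 - \frac{1}{k^2}\right) \times 100\%$ of values fall within $k$ standard deviations of the mean, for $k > 1$.
Example: For $k = 2$, at least $\left(1 - \frac{1}{2^2}\right) \times 100\% = 75\%$ of values are within 2 standard deviations.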
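Chebyshev's bound is simple enough to compute directly. The function name below is an assumption for illustration; it returns the minimum proportion as a decimal rather than a percentage.

```python
def chebyshev_lower_bound(k):
    """Minimum proportion of values within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("Chebyshev's rule requires k > 1")
    return 1 - 1 / k**2

for k in (2, 3, 4):
    print(f"k = {k}: at least {chebyshev_lower_bound(k):.2%}")
```

For comparison, the bounds for k = 2 and k = 3 (75% and about 88.9%) are lower than the empirical-rule figures of 95% and 99.7%, because Chebyshev's rule makes no assumption about the shape of the distribution.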
Measures of Relationship Between Two Numerical Variables
Covariance
Covariance measures the direction of the linear relationship between two numerical variables.
Formula: $\text{cov}(X, Y) = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{n - 1}$
If covariance > 0, variables move in the same direction.
If covariance < 0, variables move in opposite directions.
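If covariance = 0, there is no linear relationship between the variables (they are not necessarily independent).
The magnitude of the covariance depends on the units of the variables, so it does not indicate the relative strength of the relationship.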
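The point about magnitude can be illustrated with a small sketch of the sample covariance (using the $n - 1$ denominator). The variable names and values are hypothetical: the same sales figures are expressed once in thousands and once in single units, so the relationship is identical but the covariance is 1,000 times larger.

```python
def sample_covariance(x, y):
    """Sample covariance of two equal-length sequences (n - 1 denominator)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length, at least 2")
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    return sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / (n - 1)

hours = [1, 2, 3, 4, 5]                               # hypothetical X values
sales_thousands = [2, 4, 6, 8, 10]                    # hypothetical Y, in thousands
sales_units = [v * 1000 for v in sales_thousands]     # same Y, in single units

cov_thousands = sample_covariance(hours, sales_thousands)
cov_units = sample_covariance(hours, sales_units)

print(cov_thousands, cov_units)
```

Both covariances are positive, which gives the direction, but their sizes differ only because of the units.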
Coefficient of Correlation
The coefficient of correlation (sample: r, population: ρ) measures the relative strength of the linear relationship between two variables.
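Formula: $r = \frac{\text{cov}(X, Y)}{S_X S_Y}$, where $S_X$ and $S_Y$ are the sample standard deviations of $X$ and $Y$.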
Unit-free and ranges from -1 to 1.
Closer to -1: stronger negative linear relationship.
Closer to 1: stronger positive linear relationship.
Closer to 0: weaker linear relationship.
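A sketch of the sample correlation, computed as the sample covariance divided by the product of the two sample standard deviations. The function name and all three data pairs are hypothetical and chosen so the results are easy to verify by hand.

```python
import math

def sample_correlation(x, y):
    """Sample coefficient of correlation r = cov(X, Y) / (S_X * S_Y)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length, at least 2")
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    cov = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / (n - 1)
    s_x = math.sqrt(sum((a - x_bar) ** 2 for a in x) / (n - 1))
    s_y = math.sqrt(sum((b - y_bar) ** 2 for b in y) / (n - 1))
    return cov / (s_x * s_y)

x = [1, 2, 3, 4, 5]
r_perfect_pos = sample_correlation(x, [2, 4, 6, 8, 10])   # exact increasing line
r_perfect_neg = sample_correlation(x, [10, 8, 6, 4, 2])   # exact decreasing line
r_moderate = sample_correlation(x, [2, 1, 4, 3, 5])       # noisy increasing trend

print(round(r_perfect_pos, 2), round(r_perfect_neg, 2), round(r_moderate, 2))
```

Unlike the covariance, these values would not change if either variable were rescaled to different units.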
Ethical Considerations in Data Analysis
Numerical descriptive measures should be reported objectively and fairly. Both positive and negative results must be documented, and inappropriate summary measures should not be used to distort facts.
Summary Table: Measures of Central Tendency
| Measure | Definition | Sensitivity to Outliers | Example |
|---|---|---|---|
| Mean | Average value | High | {11, 12, 13, 14, 20} → 14 |
| Median | Middle value | Low | {12, 13, 14, 15, 16} → 14 |
| Mode | Most frequent value | None | {100, 200, 100, 300, 400} → 100 |
Summary Table: Measures of Variation
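| Measure | Definition | Formula |
|---|---|---|
| Range | Difference between largest and smallest values | $X_{\text{largest}} - X_{\text{smallest}}$ |
| Variance | Average squared deviation from mean | $S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$ |
| Standard Deviation | Square root of variance | $S = \sqrt{S^2}$ |
| Coefficient of Variation | Relative variation (percentage) | $CV = \left(\frac{S}{\bar{X}}\right) \times 100\%$ |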