Chapter 3: Calculating Descriptive Statistics – Business Statistics Study Notes

Chapter 3: Calculating Descriptive Statistics

3.1 Measures of Central Tendency

Measures of central tendency are statistical values that describe the center point or typical value of a dataset. The main measures include the mean, median, mode, and weighted mean.

Mean

Definition: The mean (or average) is the sum of all values divided by the number of observations.
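Sample Mean Formula:

The sample mean is written $\bar{x}$, the sum of the sample values is $\sum_{i=1}^{n} x_i$, and the sample size is $n$. The formula for the sample mean is:

$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$

Population Mean Formula:

$$\mu = \frac{\sum_{i=1}^{N} x_i}{N}$$

where $N$ is the population size.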

Example: For the sample values 87.2, 118.9, 76.2, 107.7, 61.5, the sample mean is:

$$\bar{x} = \frac{87.2 + 118.9 + 76.2 + 107.7 + 61.5}{5} = \frac{451.5}{5} = 90.3$$
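
The same arithmetic can be checked in a few lines of Python. This is a minimal sketch using the five sample values from the example; the variable names are illustrative and are not part of the original notes.

```python
from statistics import mean

# Sample values taken from the example above
values = [87.2, 118.9, 76.2, 107.7, 61.5]

# Sum of all values divided by the number of observations
manual_mean = sum(values) / len(values)

# statistics.mean should agree with the manual calculation
library_mean = mean(values)

print(round(manual_mean, 1), round(library_mean, 1))
```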

Weighted Mean

Definition: The weighted mean assigns different weights to values, reflecting their relative importance.
Formula:

$$\bar{x}_w = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i}$$

where $w_i$ is the weight assigned to value $x_i$.

Example: Suppose your statistics grade is based on an exam, project, and homework with different weights.

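[Tables: the score and weight for each component, each score multiplied by its weight, and the sum of the weights]

The weighted mean is the sum of the weighted scores divided by the sum of the weights.

[Figure: weighted mean calculation for the grade example]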
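
As a rough illustration of the weighted mean, the sketch below computes a course grade from three components. The scores and weights are hypothetical, chosen only to make the arithmetic easy to follow; they are not the values from the original example, and `weighted_mean` is an illustrative helper name.

```python
def weighted_mean(values, weights):
    """Sum of weight * value divided by the sum of the weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for x, w in zip(values, weights)) / total_weight

# Hypothetical scores and weights (percent of the final grade)
scores = [80, 90, 100]   # exam, project, homework
weights = [50, 30, 20]

grade = weighted_mean(scores, weights)
print(grade)  # (80*50 + 90*30 + 100*20) / 100 = 87.0
```

Because the result is divided by the sum of the weights, the weights do not need to add up to 100.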

Median

Definition: The median is the middle value when data are arranged in ascending order. If the number of values is even, it is the average of the two middle values.
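Calculation: Use the index point $i = 0.5n$, where $n$ is the number of values. If $i$ is not a whole number, round up to the next position; if it is a whole number, average the values in positions $i$ and $i + 1$.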
Example: For n = 9, the median is the 5th value in the sorted list.

[Figures: median calculation example]
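
The index-point approach can be sketched as follows. The sketch assumes the index point is i = 0.5n with 1-based positions, and that a whole-number index means averaging positions i and i + 1, which matches the even-n rule in the definition. The function name and sample data are hypothetical.

```python
import math

def median_by_index(data):
    """Median via the index point i = 0.5 * n (positions are 1-based)."""
    if not data:
        raise ValueError("data must not be empty")
    ordered = sorted(data)
    n = len(ordered)
    i = 0.5 * n
    if not i.is_integer():
        # Not a whole number: round up to the next position
        return ordered[math.ceil(i) - 1]
    # Whole number: average the values in positions i and i + 1
    position = int(i)
    return (ordered[position - 1] + ordered[position]) / 2

odd_sample = [7, 1, 9, 3, 5, 8, 2, 6, 4]   # hypothetical, n = 9
even_sample = [10, 40, 20, 30]             # hypothetical, n = 4

print(median_by_index(odd_sample))   # 5th sorted value
print(median_by_index(even_sample))  # average of 2nd and 3rd sorted values
```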

Mode

Definition: The mode is the value that appears most frequently in a dataset. There can be more than one mode or none at all.
Example (Numerical Data):

[Figure: mode example with dress sizes]

Example (Categorical Data):

[Figure: mode example with TV brands]
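
A frequency count is enough to find the mode for both numerical and categorical data. The sketch below returns every value tied for the highest count, and an empty list when no value repeats. The data are hypothetical stand-ins for the dress-size and TV-brand examples, and `modes` is an illustrative helper name.

```python
from collections import Counter

def modes(values):
    """Return every value tied for the highest frequency, or [] if nothing repeats."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    if top == 1:
        return []  # no value appears more than once, so there is no mode
    return [value for value, count in counts.items() if count == top]

dress_sizes = [8, 10, 12, 10, 14, 8, 10]                      # hypothetical
tv_brands = ["Brand A", "Brand B", "Brand C", "Brand B", "Brand A"]  # hypothetical

print(modes(dress_sizes))  # one mode
print(modes(tv_brands))    # two modes (bimodal)
print(modes([1, 2, 3]))    # no mode
```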

Shapes of Frequency Distributions

Symmetric: Mean = Median
Right-Skewed: Mean > Median
Left-Skewed: Mean < Median

[Figures: symmetric, right-skewed, and left-skewed distributions]
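
The relationship between mean and median can be checked on small datasets. The sketch below applies the rule of thumb from the list above to three hypothetical samples. It is only a rough indicator: a mean equal to the median is consistent with symmetry but does not prove it.

```python
from statistics import mean, median

def describe_shape(data):
    """Rule-of-thumb label based only on how the mean compares with the median."""
    m, med = mean(data), median(data)
    if m > med:
        return "right-skewed"
    if m < med:
        return "left-skewed"
    return "symmetric"

symmetric = [1, 2, 3, 4, 5]                   # mean 3, median 3
right_tail = [1, 2, 2, 3, 3, 3, 4, 20]        # large value pulls the mean up
left_tail = [1, 17, 18, 18, 19, 19, 19, 20]   # small value pulls the mean down

for sample in (symmetric, right_tail, left_tail):
    print(mean(sample), median(sample), describe_shape(sample))
```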

Using Excel for Central Tendency

Excel functions: AVERAGE, MEDIAN, MODE.SNGL
MODE.SNGL returns a single mode even when the data have more than one, and it works only on numeric values, so it cannot find the mode of categorical data.

[Figures: Excel calculation of mean, median, and mode; Excel Data Analysis tool; Excel Descriptive Statistics output]

Choosing the Appropriate Measure

Mean: Use when data are symmetric and without outliers.
Median: Use when data are skewed or contain outliers.
Mode: Use for categorical data.

[Table: advantages and disadvantages of the mean, median, and mode]

3.2 Measures of Variability

Measures of variability describe the spread or dispersion of data values. Common measures include range, variance, and standard deviation.

Range

Definition: The range is the difference between the highest and lowest values in a dataset.
Formula: Range = Highest value – Lowest value
Advantage: Simple to calculate.
Disadvantage: Only considers two values and is sensitive to outliers.

Variance and Standard Deviation

Variance: Measures the average squared deviation from the mean.
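Sample Variance Formula: $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$
Standard Deviation: The square root of the variance, with the same units as the original data.
Sample Standard Deviation Formula: $s = \sqrt{s^2} = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$

[Figures: sample variance calculation table, sample variance calculation, and sample standard deviation calculation]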
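
The sample variance divides the sum of squared deviations from the mean by n - 1, and the sample standard deviation is its square root. The sketch below works through those steps for the five values used in the mean example in Section 3.1; choosing that dataset here is an assumption made for continuity, not something the original calculation table confirms.

```python
import math

values = [87.2, 118.9, 76.2, 107.7, 61.5]  # sample from the mean example in 3.1

n = len(values)
x_bar = sum(values) / n

# Squared deviation of each value from the sample mean
squared_deviations = [(x - x_bar) ** 2 for x in values]

# Divide by n - 1 for a sample
sample_variance = sum(squared_deviations) / (n - 1)
sample_std = math.sqrt(sample_variance)

print(round(sample_variance, 3), round(sample_std, 2))
```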

Population Variance and Standard Deviation

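Population Variance Formula: $\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$
Population Standard Deviation Formula: $\sigma = \sqrt{\sigma^2} = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$

[Figure: population variance calculation]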

Excel for Variability

Sample: =VAR.S(), =STDEV.S()
Population: =VAR.P(), =STDEV.P()

[Figure: Excel Descriptive Statistics output for variability]
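
For readers working outside Excel, Python's standard `statistics` module makes the same sample-versus-population distinction. The sketch below pairs each function with the Excel function it corresponds to; the data are hypothetical.

```python
from statistics import variance, stdev, pvariance, pstdev

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical values

sample_var = variance(data)   # divides by n - 1, like =VAR.S()
sample_sd = stdev(data)       # like =STDEV.S()
pop_var = pvariance(data)     # divides by N, like =VAR.P()
pop_sd = pstdev(data)         # like =STDEV.P()

print(sample_var, sample_sd)
print(pop_var, pop_sd)
```

For the same data, the sample variance is always larger than the population variance because it divides by n - 1 rather than N.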

3.3 Using the Mean and Standard Deviation Together

The mean and standard deviation are often used together to describe the center and spread of data, especially in quality control and business applications.

Coefficient of Variation (CV)

Definition: The CV expresses the standard deviation as a percentage of the mean, allowing comparison of variability between datasets with different units or means.
Formula (Sample): $CV = \frac{s}{\bar{x}} \times 100\%$
Formula (Population): $CV = \frac{\sigma}{\mu} \times 100\%$
Example: Comparing Microsoft and Amazon stock price variability.

[Figures: stock price table for the CV example; CV calculation for Microsoft and Amazon]
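
The sketch below shows why the CV is useful when two datasets sit at very different price levels. The prices are hypothetical and are not the Microsoft or Amazon figures from the original table; `coefficient_of_variation` is an illustrative helper name.

```python
from statistics import mean, stdev

def coefficient_of_variation(data):
    """Sample standard deviation expressed as a percentage of the sample mean."""
    return stdev(data) / mean(data) * 100

# Hypothetical closing prices for two stocks at different price levels
stock_a = [100, 102, 98, 101, 99]
stock_b = [1000, 1050, 950, 1020, 980]

cv_a = coefficient_of_variation(stock_a)
cv_b = coefficient_of_variation(stock_b)

print(round(cv_a, 2), round(cv_b, 2))
```

Here Stock B has the larger CV, so its prices vary more relative to their mean even after accounting for the higher price level.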

z-Score

Definition: The z-score indicates how many standard deviations a value is from the mean.
Formula (Sample): $z = \frac{x - \bar{x}}{s}$
Formula (Population): $z = \frac{x - \mu}{\sigma}$
Interpretation: z = 0 (at mean), z > 0 (above mean), z < 0 (below mean). Outliers typically have |z| > 3.
Example: Calculating z-score for a hamburger calorie value.

[Figures: hamburger calorie table and z-score calculation]
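
A sample z-score subtracts the sample mean from a value and divides by the sample standard deviation. The sketch below applies that to a hypothetical list of calorie counts; these are not the values from the original hamburger table.

```python
from statistics import mean, stdev

def z_score(x, data):
    """Number of sample standard deviations that x lies from the sample mean."""
    return (x - mean(data)) / stdev(data)

calories = [450, 500, 550, 600, 650]  # hypothetical calorie counts

z_high = z_score(650, calories)  # above the mean, so positive
z_mid = z_score(550, calories)   # equal to the mean, so zero
z_low = z_score(450, calories)   # below the mean, so negative

print(round(z_high, 2), z_mid, round(z_low, 2))
```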

The Empirical Rule

For bell-shaped (normal) distributions:
~68% of values within ±1 standard deviation
~95% within ±2 standard deviations
~99.7% within ±3 standard deviations

[Figures: empirical rule at 68%, 95%, and 99.7%]
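
The three percentages can be reproduced for an exactly normal distribution, where the share of values within k standard deviations of the mean equals erf(k / sqrt(2)). The sketch below is a check on the rounded figures above, not a statement about any particular dataset; `share_within` is an illustrative helper name.

```python
import math

def share_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(k, round(share_within(k) * 100, 2))
```

Real data that are only roughly bell-shaped will match these percentages only approximately.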

3.4 Working with Grouped Data

When data are grouped into frequency distributions, the mean and variance can be estimated using class midpoints and frequencies.

Mean of Grouped Data

Formula (Sample): $\bar{x} \approx \frac{\sum_{i=1}^{k} f_i x_i}{n}$
Where: $f_i$ = frequency of class i, $x_i$ = midpoint of class i, $n = \sum_{i=1}^{k} f_i$ = total observations, $k$ = number of classes
Example: Calculating mean age from grouped survey data.

[Tables: grouped data frequencies, class midpoints, and grouped mean calculation]
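
The grouped mean treats every observation in a class as if it sat at the class midpoint, multiplies each midpoint by its frequency, and divides by the total number of observations. The sketch below uses hypothetical age classes and frequencies, not the survey data from the original example.

```python
def grouped_mean(classes):
    """Estimate the mean from (lower, upper) class limits and their frequencies."""
    n = sum(freq for _, freq in classes)
    if n == 0:
        raise ValueError("total frequency must be positive")
    # Midpoint of each class times its frequency
    weighted_sum = sum(freq * (lower + upper) / 2 for (lower, upper), freq in classes)
    return weighted_sum / n

# Hypothetical age classes: ((lower, upper), frequency)
age_classes = [((20, 30), 5), ((30, 40), 10), ((40, 50), 5)]

estimated_mean = grouped_mean(age_classes)
print(estimated_mean)  # (5*25 + 10*35 + 5*45) / 20 = 35.0
```

The result is an estimate because the individual values inside each class are not known.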

3.5 Measures of Relative Position

These measures compare the position of a value relative to the rest of the data, including percentiles, quartiles, and the interquartile range (IQR).

Percentiles

Definition: The pth percentile is the value below which p% of the data fall.
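Calculation: Sort the data in ascending order, then compute the index point $i = \frac{p}{100} n$, where $n$ is the number of values. If $i$ is not a whole number, round up to the next position; if it is a whole number, average the values in positions $i$ and $i + 1$.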
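
The index-point calculation can be sketched as below. It assumes the index is i = (p/100)n with 1-based positions, rounding up when i is fractional and averaging positions i and i + 1 when i is whole, the same rule used for the median in Section 3.1. The function name and data are hypothetical. Spreadsheet and library percentile functions often interpolate instead, so their answers may differ slightly.

```python
import math

def percentile(data, p):
    """p-th percentile (0 < p < 100) using the index point i = p * n / 100."""
    if not data or not 0 < p < 100:
        raise ValueError("need non-empty data and 0 < p < 100")
    ordered = sorted(data)
    n = len(ordered)
    i = p * n / 100
    if not float(i).is_integer():
        # Fractional index: round up to the next position
        return ordered[math.ceil(i) - 1]
    # Whole index: average the values in positions i and i + 1
    position = int(i)
    return (ordered[position - 1] + ordered[position]) / 2

data = [12, 15, 18, 20, 22, 25, 28, 30, 35, 40]  # hypothetical, n = 10

print(percentile(data, 25), percentile(data, 50), percentile(data, 75))
```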

Quartiles

Q1: 25th percentile
Q2: 50th percentile (median)
Q3: 75th percentile

Interquartile Range (IQR)

Definition: IQR = Q3 – Q1; describes the spread of the middle 50% of data.

Box-and-Whisker Plots

Graphical summary showing quartiles, minimum, maximum, and outliers.

Outliers

Values below Q1 – 1.5(IQR) or above Q3 + 1.5(IQR) are considered outliers.
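
The outlier limits follow directly from the quartiles. The sketch below uses `statistics.quantiles` from Python's standard library to obtain Q1 and Q3; that function interpolates between data points, so its quartiles can differ slightly from those found with a textbook index-point method. The data and the helper name `find_outliers` are hypothetical.

```python
from statistics import quantiles

def find_outliers(data):
    """Return the lower limit, upper limit, and values outside the 1.5 * IQR limits."""
    q1, _, q3 = quantiles(data, n=4)  # Q1, Q2 (median), Q3
    iqr = q3 - q1
    lower = q1 - 1.5 * iqr
    upper = q3 + 1.5 * iqr
    outliers = [x for x in data if x < lower or x > upper]
    return lower, upper, outliers

data = [10, 12, 13, 15, 16, 18, 19, 60]  # hypothetical, with one extreme value

lower, upper, outliers = find_outliers(data)
print(lower, upper, outliers)
```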

3.6 Measures of Association Between Two Variables

These statistics describe the relationship between two variables, including covariance and correlation.

Sample Covariance

Definition: Measures the direction of the linear relationship between two variables.
Formula: $s_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n - 1}$

Sample Correlation Coefficient

Definition: Measures both the strength and direction of the linear relationship between two variables.
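Formula: $r_{xy} = \frac{s_{xy}}{s_x s_y}$, where $s_{xy}$ is the sample covariance and $s_x$ and $s_y$ are the sample standard deviations of the two variables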
Range: -1 (perfect negative) to +1 (perfect positive); 0 means no linear relationship.
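
Both measures can be computed by hand in a few steps: the sample covariance averages the products of paired deviations using n - 1, and the correlation coefficient divides that covariance by the product of the two sample standard deviations. The sketch below uses hypothetical paired data and illustrative function names.

```python
import math

def sample_covariance(xs, ys):
    """Sum of paired deviation products divided by n - 1."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)

def sample_correlation(xs, ys):
    """Sample covariance divided by the product of the sample standard deviations."""
    s_x = math.sqrt(sample_covariance(xs, xs))  # covariance of x with itself is its variance
    s_y = math.sqrt(sample_covariance(ys, ys))
    return sample_covariance(xs, ys) / (s_x * s_y)

x = [1, 2, 3, 4, 5]   # hypothetical
y = [2, 4, 5, 4, 5]   # hypothetical

print(sample_covariance(x, y), round(sample_correlation(x, y), 4))
```

The covariance here is positive, which gives the direction of the relationship; the correlation coefficient rescales it to the range -1 to +1 so that its strength can be judged as well.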
