Sampling Distributions (Chapter 7): Business Statistics Study Notes

Sampling Distributions

Introduction to Sampling Distributions

Sampling distributions are a foundational concept in inferential statistics, especially in business applications. They describe the probability distribution of a given statistic based on a random sample drawn from a population.

Sampling Distribution: The distribution of all possible values of a statistic (such as the mean) for a given sample size selected from a population.
Importance: Understanding sampling distributions allows us to make probabilistic statements about sample statistics and to estimate population parameters.

Developing a Sampling Distribution

Population and Sample Setup

Consider a population of size N = 4 with individuals A, B, C, and D.
The random variable X represents the age of individuals: 18, 20, 22, 24 years.

Population Summary Measures

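Population Mean (μ): $\mu = \frac{\sum X_i}{N} = \frac{18 + 20 + 22 + 24}{4} = 21$
Population Standard Deviation (σ): $\sigma = \sqrt{\frac{\sum (X_i - \mu)^2}{N}} = \sqrt{\frac{9 + 1 + 1 + 9}{4}} = \sqrt{5} \approx 2.236$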
The population distribution is uniform in this example.
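
The population summary measures can be checked with a few lines of Python. This is a minimal sketch using only the standard library; the variable names are illustrative and not part of the original notes.

```python
from statistics import mean, pstdev

# Ages of the four individuals A, B, C, D from the example.
ages = [18, 20, 22, 24]

mu = mean(ages)        # population mean
sigma = pstdev(ages)   # population standard deviation (divides by N, not N - 1)

print(f"mu = {mu}, sigma = {sigma:.3f}")
```

Note that pstdev is used rather than stdev because the four ages are treated as the entire population, not a sample.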

All Possible Samples of Size n = 2 (with Replacement)

There are 4 × 4 = 16 possible ordered samples, since each of the two draws can be any of the 4 individuals.
Each sample mean is calculated, resulting in a distribution of 16 sample means.
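
As a rough illustration, the 16 samples can be enumerated directly. The sketch below assumes ordered samples drawn with replacement, as in the setup above; the names samples and sample_means are chosen for this example only.

```python
from itertools import product

ages = [18, 20, 22, 24]

# Sampling with replacement: every ordered pair (first draw, second draw).
samples = list(product(ages, repeat=2))

# One sample mean per ordered pair.
sample_means = [(first + second) / 2 for first, second in samples]

print(len(samples))          # number of possible samples
print(sorted(sample_means))  # the sample means, smallest to largest
```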

Sampling Distribution of the Sample Means

The distribution of the 16 sample means is not uniform, even though the population is.
This illustrates how the sampling distribution can differ in shape from the population distribution.
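
One way to see the change in shape is to tabulate how often each sample mean occurs. The following sketch (standard library only, names illustrative) prints each possible sample mean with its frequency and probability; middle values such as 21 arise from more samples than extreme values such as 18 or 24.

```python
from collections import Counter
from itertools import product

ages = [18, 20, 22, 24]
means = [(a + b) / 2 for a, b in product(ages, repeat=2)]

# Frequency of each distinct sample mean among the 16 samples.
counts = Counter(means)

for value in sorted(counts):
    probability = counts[value] / len(means)
    print(f"{value:4.0f}  frequency={counts[value]}  probability={probability:.4f}")
```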

Summary Measures for the Sampling Distribution

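Mean of Sample Means ($\mu_{\bar{X}}$): $\mu_{\bar{X}} = \frac{\sum \bar{X}_i}{16} = 21$, which equals the population mean $\mu$.
Standard Deviation of Sample Means ($\sigma_{\bar{X}}$): $\sigma_{\bar{X}} = \sqrt{\frac{\sum (\bar{X}_i - \mu_{\bar{X}})^2}{16}} = \sqrt{2.5} \approx 1.58$, which equals $\sigma / \sqrt{n} = 2.236 / \sqrt{2}$ and is smaller than the population standard deviation.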
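
The summary measures of the sampling distribution can be verified numerically. This sketch recomputes them from the 16 enumerated sample means and compares the standard deviation with sigma divided by the square root of n; it assumes sampling with replacement, and the variable names are illustrative.

```python
from itertools import product
from math import sqrt
from statistics import mean, pstdev

ages = [18, 20, 22, 24]
n = 2  # sample size

# All 16 sample means (ordered samples, with replacement).
means = [sum(sample) / n for sample in product(ages, repeat=n)]

mu_xbar = mean(means)        # mean of the sample means
sigma_xbar = pstdev(means)   # standard deviation of the sample means

print(f"mu_xbar = {mu_xbar}, population mean = {mean(ages)}")
print(f"sigma_xbar = {sigma_xbar:.4f}, sigma / sqrt(n) = {pstdev(ages) / sqrt(n):.4f}")
```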

Sampling Distribution of the Sample Mean

Key Properties

The mean of the sampling distribution of the sample mean is denoted by $\mu_{\bar{X}}$.
The standard deviation of the sampling distribution of the sample mean is called the standard error and is denoted by $\sigma_{\bar{X}}$.

If the Population is Normal

If the population is normal with mean $\mu$ and standard deviation $\sigma$, then the sampling distribution of $\bar{X}$ is also normal with:

$\mu_{\bar{X}} = \mu$ and $\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$

Sampling Distribution Properties

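The standard error decreases as the sample size increases, because $\sigma_{\bar{X}} = \sigma / \sqrt{n}$; quadrupling the sample size halves the standard error.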
Larger sample sizes yield a narrower (less variable) sampling distribution.
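
To make the effect of sample size concrete, the sketch below evaluates the standard error for several sample sizes. The population standard deviation of 10 is a hypothetical value chosen for illustration, and standard_error is a helper defined here, not a library function.

```python
from math import sqrt

def standard_error(sigma: float, n: int) -> float:
    """Standard error of the sample mean: sigma / sqrt(n)."""
    return sigma / sqrt(n)

sigma = 10.0  # hypothetical population standard deviation

# Each step quadruples n, so each standard error is half the previous one.
sample_sizes = (4, 16, 64, 256)
errors = [standard_error(sigma, n) for n in sample_sizes]

for n, se in zip(sample_sizes, errors):
    print(f"n = {n:3d}  standard error = {se}")
```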

If the Population is Not Normal: The Central Limit Theorem (CLT)

The Central Limit Theorem states that, for large enough sample sizes, the sampling distribution of the sample mean will be approximately normal, regardless of the population's shape.
This allows us to use normal probability methods even when the population distribution is unknown or not normal.
The CLT applies specifically to the distribution of the sample mean, not to all statistics (e.g., sample variance).
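
A small simulation can illustrate the theorem. The sketch below assumes an exponential population with mean 1 (a strongly right-skewed distribution chosen only for illustration) and measures the skewness of the simulated sample means for a few sample sizes. The helper names and the seed are arbitrary choices, and the exact numbers depend on the seed.

```python
import random
from statistics import fmean

def skewness(values):
    """Moment-based skewness: third central moment / (second central moment)^(3/2)."""
    m = fmean(values)
    m2 = fmean((v - m) ** 2 for v in values)
    m3 = fmean((v - m) ** 3 for v in values)
    return m3 / m2 ** 1.5

rng = random.Random(0)  # fixed seed so the run is repeatable

def simulate_sample_means(n, reps=5000):
    """Draw `reps` samples of size n from an exponential population (mean 1)
    and return the mean of each sample."""
    return [fmean(rng.expovariate(1.0) for _ in range(n)) for _ in range(reps)]

results = {}
for n in (1, 5, 30):
    means = simulate_sample_means(n)
    results[n] = (fmean(means), skewness(means))
    print(f"n = {n:2d}  mean of sample means = {results[n][0]:.3f}  "
          f"skewness = {results[n][1]:.2f}")
```

The mean of the sample means should stay close to the population mean of 1 for every n, while the skewness should shrink toward 0 (the value for a normal distribution) as n grows.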

Sampling Distribution of the Sample Variance

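The sampling distribution of the sample variance is skewed to the right. For a normal population, $(n - 1)S^2 / \sigma^2$ follows a chi-square distribution with $n - 1$ degrees of freedom; the skew lessens as the sample size grows but remains noticeable at small and moderate sample sizes.
The Central Limit Theorem as stated above concerns the sample mean, so it does not justify applying normal probability methods to the sample variance.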
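
The right skew of the sample variance can be seen in a quick simulation. This sketch assumes a standard normal population and a small sample size of 5, both chosen for illustration; the seed and variable names are arbitrary. In a right-skewed distribution the mean sits above the median, which is what the comparison at the end checks.

```python
import random
from statistics import fmean, median, variance

rng = random.Random(1)  # fixed seed so the run is repeatable
n = 5                   # small sample size, chosen for illustration
reps = 10000            # number of simulated samples

# Sample variance (n - 1 in the denominator) of each simulated sample.
variances = [
    variance([rng.gauss(0.0, 1.0) for _ in range(n)])
    for _ in range(reps)
]

mean_var = fmean(variances)
median_var = median(variances)

print(f"mean of sample variances   = {mean_var:.3f}")
print(f"median of sample variances = {median_var:.3f}")
print("right-skewed (mean > median):", mean_var > median_var)
```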

How Large is Large Enough?

For most distributions, n > 30 is sufficient for the sampling distribution of the mean to be nearly normal.
For fairly symmetric distributions, n > 15 may be sufficient.
For normal populations, the sampling distribution of the mean is always normal, regardless of sample size.

Example: Probability Calculation Using the Sampling Distribution

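Population mean $\mu = 100$, with population standard deviation $\sigma$ and sample size $n$ such that the standard error is $\sigma / \sqrt{n} = 3$ (the individual values of $\sigma$ and $n$ are not given in these notes).
What is $P(\bar{X} > 105)$?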

Step 1: Identify the sampling distribution. The sampling distribution of the sample mean is approximately normal (by the CLT).

Step 2: Calculate the mean and standard error. $\mu_{\bar{X}} = \mu = 100$ and $\sigma_{\bar{X}} = \sigma / \sqrt{n} = 3$.

Step 3: Find the probability.
Probability of observing a sample mean less than 105: $P(\bar{X} < 105)$ = NORMDIST(105, 100, 3, 1) = 0.95221
Probability of observing a sample mean greater than 105: $P(\bar{X} > 105) = 1 - 0.95221 = 0.04779$

Interpretation: There is a 4.8% chance that the sample mean will be greater than 105.
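
The same calculation can be reproduced outside a spreadsheet. This sketch uses NormalDist from Python's standard library in place of NORMDIST, with the mean of 100 and standard error of 3 taken from the worked example; the variable names are illustrative.

```python
from statistics import NormalDist

mu_xbar = 100.0        # mean of the sampling distribution
standard_error = 3.0   # standard deviation of the sampling distribution

# Normal model for the sample mean.
xbar_dist = NormalDist(mu=mu_xbar, sigma=standard_error)

p_below = xbar_dist.cdf(105)   # P(sample mean < 105), the cumulative probability
p_above = 1 - p_below          # P(sample mean > 105), the complement

print(f"P(mean < 105) = {p_below:.5f}")
print(f"P(mean > 105) = {p_above:.5f}")
```

The cumulative value corresponds to the NORMDIST call with its last argument set to 1, and the complement gives the roughly 4.8% chance quoted in the interpretation.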

Summary Table: Key Formulas

Statistic	Population	Sampling Distribution
Mean	$\mu$	$\mu_{\bar{X}} = \mu$
Standard Deviation	$\sigma$	$\sigma_{\bar{X}} = \sigma / \sqrt{n}$

Additional info: The Central Limit Theorem is one of the most important results in statistics, as it justifies the use of normal probability models for inference about means, even when the population distribution is unknown or non-normal, provided the sample size is sufficiently large.