Lecture 24: Estimating Parameters, Sampling Distributions, and the Central Limit Theorem
Parameter Estimation and Sampling Distributions
Introduction to Parameters and Statistics
In statistics, we often seek to understand properties of a population by estimating unknown values called parameters. Since it is usually impractical to collect data from an entire population, we use statistics calculated from samples as estimates.
Parameter: A numerical value that describes a characteristic of a population (e.g., population mean μ, population proportion p). Parameters are typically unknown constants.
Statistic: A numerical value calculated from a sample, used to estimate a population parameter. Statistics are known but variable, as they change from sample to sample.
Example: The mean height of all adult males in the US (μ) is a parameter. The mean height from a sample of 100 men (\bar{x}) is a statistic.
Point Estimation and Margin of Error
Sample statistics are used as point estimates for unknown population parameters. However, because samples vary, so do their statistics. Quantifying this variability allows us to estimate the margin of error for our point estimates.
Margin of Error: The maximum expected distance between the point estimate and the true population parameter at a given level of confidence (e.g., 95%). It is the half-width of the confidence interval.
Confidence Interval: An interval estimate that combines the point estimate and margin of error (point estimate ± margin of error).
Example: In a survey, 41% ± 2.9% means we are 95% confident the true proportion is between 38.1% and 43.9%.
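The interval arithmetic in the survey example can be sketched in a few lines. The helper names below are illustrative, and the sample size of 1100 is a hypothetical value chosen only to show that the usual large-sample formula for a proportion, z · \sqrt{p(1 - p) / n}, gives a margin of error close to 2.9%; the notes do not state the survey's actual sample size.

```python
import math

def confidence_interval(point_estimate, margin):
    """Return (lower, upper) for point estimate ± margin of error."""
    return (point_estimate - margin, point_estimate + margin)

def proportion_margin_of_error(p_hat, n, z=1.96):
    """Large-sample margin of error for a sample proportion.

    z = 1.96 corresponds to 95% confidence under the normal approximation.
    """
    return z * math.sqrt(p_hat * (1 - p_hat) / n)

# Survey example from the notes: 41% ± 2.9%
low, high = confidence_interval(0.41, 0.029)
print(f"{low:.3f} to {high:.3f}")

# Hypothetical sample size (an assumption, not given in the notes)
moe = proportion_margin_of_error(0.41, 1100)
print(f"margin of error with n = 1100: {moe:.3f}")
```

Because the margin of error shrinks in proportion to 1 / \sqrt{n}, halving it takes roughly four times as many respondents.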
Sampling Distributions
Definition and Construction
A sampling distribution is the probability distribution of a statistic (such as the sample mean) computed from all possible samples of a given size from a population.
The sampling distribution of the sample mean \bar{X} is the distribution of all possible values of \bar{X} from samples of size n from a population with mean μ and standard deviation σ.
The shape, center, and spread of the sampling distribution depend on sample size and sampling design.
Procedure for Constructing a Sampling Distribution (Small N and n)
Specify the sample size n and sampling design (e.g., simple random sampling with or without replacement).
List all possible samples of size n and their probabilities.
Compute the statistic (e.g., sample mean) for each sample.
Determine the probability for each value of the statistic.
Example: Sampling Distribution of the Sample Mean
Suppose a population consists of three values: 0, 6, and 9, each with probability 1/3. For samples of size 2 (with replacement), the possible sample means and their probabilities are:
| Sample Mean (\bar{X}) | Probability |
|---|---|
| 0 | 1/9 |
| 3 | 2/9 |
| 4.5 | 2/9 |
| 6 | 1/9 |
| 7.5 | 2/9 |
| 9 | 1/9 |
Additional info: This table illustrates how the sampling distribution of the mean is constructed from all possible samples.
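The construction procedure can be checked by brute-force enumeration. The following is a minimal sketch, assuming equally likely population values and sampling with replacement as in the example; the function name is illustrative.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

def sampling_distribution_of_mean(population, n):
    """Exact distribution of the sample mean.

    Assumes each population value is equally likely and that samples of
    size n are drawn with replacement, so every ordered sample has the
    same probability.
    """
    prob_per_sample = Fraction(1, len(population) ** n)
    dist = defaultdict(Fraction)
    # Steps 2-4 of the procedure: list the samples, compute each mean,
    # and accumulate the probability of each distinct mean.
    for sample in product(population, repeat=n):
        mean = Fraction(sum(sample), n)
        dist[mean] += prob_per_sample
    return dict(sorted(dist.items()))

dist = sampling_distribution_of_mean([0, 6, 9], 2)
for mean, prob in dist.items():
    print(float(mean), prob)
```

Exact fractions are used instead of floats so the probabilities can be compared directly with the table. Enumeration is only practical for small populations and sample sizes, since the number of ordered samples grows as N^n.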
Sampling Variability and Sample Size
Effect of Sample Size on Variability
The variability of sample statistics decreases as the sample size increases. Larger samples tend to produce sample means that are closer to the population mean.
Sampling Variability: The extent to which a statistic varies from sample to sample.
As n increases, the standard deviation of the sampling distribution of the mean decreases.
Example: Five samples of size 6 from a population may have means 70.3, 67.5, 71.5, 70.3, 68.8. Five samples of size 100 may have means 70.0, 69.4, 68.9, 69.1, 68.9, showing less variability.
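A small simulation illustrates the same pattern. This is a sketch under stated assumptions: the population is taken to be normal with mean 69 and standard deviation 3, values that are hypothetical and chosen only to resemble the example; they are not given in the notes.

```python
import random
import statistics

def spread_of_sample_means(n, replicates=2000, mu=69.0, sigma=3.0, seed=0):
    """Standard deviation of many simulated sample means of size n.

    mu and sigma describe a hypothetical normal population (an assumption).
    """
    rng = random.Random(seed)
    means = [
        statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
        for _ in range(replicates)
    ]
    return statistics.stdev(means)

spread_small = spread_of_sample_means(6)
spread_large = spread_of_sample_means(100)
print(f"n = 6:   SD of sample means = {spread_small:.2f}")
print(f"n = 100: SD of sample means = {spread_large:.2f}")
```

The simulated spreads should land near σ / \sqrt{n}, which is about 1.22 for n = 6 and 0.30 for n = 100 under these assumed values.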
Properties of the Sample Mean
Unbiasedness and Standard Error
Unbiased Estimator: The expected value of the sample mean equals the population mean: E(\bar{X}) = μ
Standard Error (SE): The standard deviation of the sampling distribution of the mean: SE(\bar{X}) = σ / \sqrt{n}
Additional info: The standard error quantifies the precision of the sample mean as an estimate of the population mean.
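Both properties can be verified exactly on the earlier three-value population (0, 6, 9 with samples of size 2, drawn with replacement). The sketch below hard-codes the sampling distribution from that example's table; the variable names are illustrative.

```python
import math
from fractions import Fraction

population = [0, 6, 9]
n = 2

# Population mean and variance (each value has probability 1/3)
mu = Fraction(sum(population), len(population))
sigma_sq = sum((x - mu) ** 2 for x in population) / len(population)

# Sampling distribution of the mean, copied from the example table
sampling_dist = {
    Fraction(0): Fraction(1, 9),
    Fraction(3): Fraction(2, 9),
    Fraction(9, 2): Fraction(2, 9),
    Fraction(6): Fraction(1, 9),
    Fraction(15, 2): Fraction(2, 9),
    Fraction(9): Fraction(1, 9),
}

# Mean and variance of the sampling distribution
mean_of_means = sum(m * p for m, p in sampling_dist.items())
var_of_means = sum((m - mean_of_means) ** 2 * p for m, p in sampling_dist.items())
standard_error = math.sqrt(var_of_means)

print(mu, mean_of_means)            # unbiasedness: both equal the population mean
print(sigma_sq / n, var_of_means)   # variance of the mean equals sigma^2 / n
print(standard_error, math.sqrt(sigma_sq) / math.sqrt(n))
```

Here the population variance is 14, so the variance of the sample mean is 14 / 2 = 7 and the standard error is \sqrt{7}, which matches σ / \sqrt{n}.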
Distribution of the Sample Mean
Normal Population
If the population is normally distributed, the sampling distribution of the sample mean is also normal, regardless of sample size.
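Mean of sampling distribution: E(\bar{X}) = μ
Standard deviation (standard error): SE(\bar{X}) = σ / \sqrt{n}
Example: Weights of pennies with mean μ grams and standard deviation σ grams. For a sample of n pennies, the standard error of the mean weight is σ / \sqrt{n} grams.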
Non-Normal Population and the Central Limit Theorem
When the population is not normal, the Central Limit Theorem (CLT) states that the sampling distribution of the sample mean becomes approximately normal as the sample size increases (typically n ≥ 30).
Regardless of population shape, for large n, \bar{X} is approximately N(μ, σ / \sqrt{n}), that is, normal with mean μ and standard deviation σ / \sqrt{n}.
For small n, the sampling distribution may retain the skewness of the population.
Example: Number of people in US households (skewed distribution): For small n, the sampling distribution is skewed; for large n, it is approximately normal.
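The effect of sample size on shape can be seen in simulation. The sketch below uses an exponential population as a stand-in for a right-skewed distribution; it is an assumption made for illustration and is not the household-size data from the example. Skewness near 0 indicates a roughly symmetric, bell-like shape.

```python
import random
import statistics

def skewness(values):
    """Sample skewness: the average cubed z-score (0 for a symmetric shape)."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean(((v - m) / s) ** 3 for v in values)

def sample_mean_skewness(n, replicates=5000, seed=1):
    """Skewness of simulated sample means of size n.

    The population is exponential with rate 1 (a hypothetical skewed
    population chosen for illustration).
    """
    rng = random.Random(seed)
    means = [
        statistics.fmean(rng.expovariate(1.0) for _ in range(n))
        for _ in range(replicates)
    ]
    return skewness(means)

skew_small = sample_mean_skewness(2)
skew_large = sample_mean_skewness(30)
print(f"n = 2:  skewness of sample means = {skew_small:.2f}")
print(f"n = 30: skewness of sample means = {skew_large:.2f}")
```

For this population the skewness of the sample mean is 2 / \sqrt{n} in theory, so it falls from about 1.4 at n = 2 to about 0.37 at n = 30: much closer to normal, though not perfectly symmetric.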
Summary Table: Shape, Center, and Spread of Sampling Distribution
| Population Type | Shape of Sampling Distribution | Center | Spread |
|---|---|---|---|
| Normal (mean μ, SD σ) | Normal (any n) | μ | σ / \sqrt{n} |
| Not normal (mean μ, SD σ) | Approximately normal (for large n) | μ | σ / \sqrt{n} |
Z-Transform of the Sample Mean
Standardization
To calculate probabilities for the sample mean, we use the Z-transform to standardize: Z = (\bar{X} - μ) / (σ / \sqrt{n})
Example: For a sample mean \bar{x}, population mean μ, and standard error σ / \sqrt{n}, the Z-score z = (\bar{x} - μ) / (σ / \sqrt{n}) allows us to use the standard normal table to find probabilities.
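The standardization step can be sketched with Python's standard library, using `statistics.NormalDist` in place of a printed standard normal table. The numbers (population mean 70, standard deviation 3, sample size 36, observed sample mean 71) are hypothetical and chosen only to make the arithmetic easy to follow; the function names are illustrative.

```python
import math
from statistics import NormalDist

def z_score(x_bar, mu, sigma, n):
    """Standardize a sample mean: (x_bar - mu) / (sigma / sqrt(n))."""
    return (x_bar - mu) / (sigma / math.sqrt(n))

def prob_sample_mean_at_most(x_bar, mu, sigma, n):
    """P(sample mean <= x_bar), assuming the sample mean is (approximately) normal."""
    return NormalDist().cdf(z_score(x_bar, mu, sigma, n))

# Hypothetical values: mu = 70, sigma = 3, n = 36, so the standard error is 0.5
z = z_score(71.0, 70.0, 3.0, 36)
p = prob_sample_mean_at_most(71.0, 70.0, 3.0, 36)
print(f"z = {z:.2f}, P(sample mean <= 71) = {p:.4f}")

# Equivalent route without standardizing: use the sampling distribution directly
p_direct = NormalDist(mu=70.0, sigma=0.5).cdf(71.0)
```

The two routes give the same probability; standardizing is what makes a single standard normal table usable for any μ, σ, and n.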
Worked Examples
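Sample Mean from a Normal Population
Given: population mean μ, population standard deviation σ, sample size n
Mean of sampling distribution: E(\bar{X}) = μ
Standard error: SE(\bar{X}) = σ / \sqrt{n}
Probability calculation: Standardize with Z = (\bar{X} - μ) / (σ / \sqrt{n}), then use the standard normal table to find the probability.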
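Sample Mean from an Unknown Population
Given: population mean μ, population standard deviation σ, large sample size n
Mean of sampling distribution: E(\bar{X}) = μ
Standard error: SE(\bar{X}) = σ / \sqrt{n}
By CLT, \bar{X} is approximately normal.
Probability calculation: Standardize with Z = (\bar{X} - μ) / (σ / \sqrt{n}), then use the standard normal table to find the probability.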
Central Limit Theorem (CLT)
Statement and Application
The Central Limit Theorem is a fundamental result in statistics. It states that, regardless of the shape of the population's distribution (provided its variance is finite), the sampling distribution of the sample mean approaches a normal distribution as the sample size increases.
For large n (typically n ≥ 30), \bar{X} is approximately N(μ, σ / \sqrt{n}), that is, normal with mean μ and standard deviation σ / \sqrt{n}.
Allows use of normal probability methods for inference about means, even when the population is not normal.
Example: Sample means from a skewed population become approximately normal for large n.