Sampling Distributions: Concepts, Properties, and Applications

Sampling Distributions

8.1 Data, Probability, and Sampling Distributions

Sampling distributions are fundamental in statistics for making inferences about populations based on sample data. This section introduces the concept of sampling distributions, distinguishes them from data distributions, and explains their role in statistical inference.

Data Distribution: The distribution of the observed values in a single sample.
Probability Distribution: A theoretical model that assigns probabilities to the possible values of a variable in the population.
Sampling Distribution: The probability distribution of a statistic (such as p̂ or 𝑥̄) over all possible samples of a fixed size n drawn from the population.
Statistics versus Parameters:
- Parameter: A summary number describing a population (e.g., population mean μ, population proportion p).
- Statistic: A summary number describing a sample (e.g., sample mean 𝑥̄, sample proportion p̂).

Example: The proportion p of US adults with a certain characteristic is a parameter. The proportion p̂ observed in a random sample of adults is a statistic used to estimate it.
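
To make the phrase "all possible samples of a fixed size" concrete, the sketch below lists every sample of size 2 from a tiny population and computes the sample mean of each. The population values are hypothetical and chosen only for illustration, and sampling is assumed to be without replacement.

```python
from itertools import combinations
from statistics import mean

# Hypothetical tiny population of 5 measurements (illustrative values only).
population = [2, 4, 6, 8, 10]
n = 2  # fixed sample size

# Parameter: the population mean.
mu = mean(population)

# Statistic: the sample mean, computed for every possible sample of size n
# (simple random sampling without replacement).
sample_means = [mean(sample) for sample in combinations(population, n)]

# The sampling distribution of the sample mean is the distribution of these values.
mean_of_sample_means = mean(sample_means)

print("population mean (parameter):", mu)
print("number of possible samples:", len(sample_means))
print("sample means:", sorted(sample_means))
print("mean of the sampling distribution:", mean_of_sample_means)
```

Any single sample gives just one of the listed sample means. The sampling distribution describes how those values are spread around the parameter.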

8.2 Sampling Distribution of the Sample Proportion

When outcomes are categorical, the sample proportion is a key statistic. The sampling distribution of the sample proportion describes how this statistic varies across repeated samples.

Expected Value: The mean of the sampling distribution of p̂ equals the population proportion p.
Standard Deviation: For a sample size n, the standard deviation of p̂ is √(p(1 − p)/n).
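Normal Approximation: For large n (typically when np and n(1 − p) are both large enough; a common rule of thumb is at least 10 each, though the threshold varies by textbook), the sampling distribution of p̂ is approximately normal.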

Example: Estimating the prevalence of blood type O in a population from the sample proportion p̂, and describing the sampling distribution of p̂.
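
The properties above can be checked with a small simulation. This is a minimal sketch, not the method used in the course materials: the values p = 0.45, n = 100, and the number of repetitions are assumptions chosen for illustration, and `sample_proportion` is a hypothetical helper name.

```python
import math
import random
from statistics import mean, pstdev

random.seed(1)  # fixed seed so the run is reproducible

p = 0.45     # assumed population proportion (illustrative, not from the notes)
n = 100      # sample size
reps = 5000  # number of simulated samples

def sample_proportion(p, n):
    """Draw one random sample of size n and return the sample proportion."""
    successes = sum(random.random() < p for _ in range(n))
    return successes / n

# Approximate the sampling distribution of p-hat by repeated sampling.
p_hats = [sample_proportion(p, n) for _ in range(reps)]

# Standard deviation predicted by the formula sqrt(p(1 - p)/n).
theoretical_sd = math.sqrt(p * (1 - p) / n)

print("mean of simulated p-hat:", round(mean(p_hats), 4), "vs p =", p)
print("SD of simulated p-hat:  ", round(pstdev(p_hats), 4),
      "vs formula", round(theoretical_sd, 4))
```

With these settings np and n(1 − p) are both well above 10, so a histogram of `p_hats` would be expected to look roughly bell-shaped.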

Keyword	Definition
Population proportion	The true fraction of individuals in the population with a certain characteristic.
Sample proportion	The fraction of sampled individuals with the characteristic, an estimator of p.
Expected value of p̂	The mean of the sampling distribution of p̂, equal to p.
Standard deviation	The standard deviation of the sampling distribution of p̂, √(p(1 − p)/n).
Normal approximation	For large n, the sampling distribution of p̂ is approximately normal.

8.3 Sampling Distribution of the Sample Mean

The sample mean is a key statistic for quantitative variables. Its sampling distribution describes how the sample mean varies across repeated samples.

Mean: The mean of the sampling distribution of 𝑥̄ is equal to the population mean μ.
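Standard Deviation: The standard deviation of 𝑥̄ is σ/√n, where σ is the population standard deviation and n is the sample size.
Central Limit Theorem (CLT): For sufficiently large n, the sampling distribution of 𝑥̄ is approximately normal, regardless of the shape of the population distribution (provided the population standard deviation σ is finite).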

Example: Simulated enzyme activities, showing how the distribution of the sample mean becomes more normal as sample size increases.
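
The effect of sample size can be sketched with a simulation. The notes do not specify the population used for the enzyme example, so this sketch assumes a right-skewed exponential population with mean 1 and standard deviation 1. The helper names (`avg`, `sd`, `skewness`, `simulate_sample_means`) are hypothetical.

```python
import math
import random

random.seed(2)  # fixed seed so the run is reproducible

def avg(values):
    """Arithmetic mean of a list."""
    return sum(values) / len(values)

def sd(values):
    """Standard deviation (dividing by the number of values)."""
    m = avg(values)
    return math.sqrt(avg([(v - m) ** 2 for v in values]))

def skewness(values):
    """Average cubed z-score; close to 0 for a symmetric distribution."""
    m, s = avg(values), sd(values)
    return avg([((v - m) / s) ** 3 for v in values])

def simulate_sample_means(n, reps=4000):
    """Means of `reps` samples of size n from an exponential population
    (assumed population: mean 1, standard deviation 1, right-skewed)."""
    return [avg([random.expovariate(1.0) for _ in range(n)])
            for _ in range(reps)]

results = {}
for n in (2, 10, 50):
    means = simulate_sample_means(n)
    results[n] = {"mean": avg(means), "sd": sd(means), "skew": skewness(means)}
    print(f"n={n:2d}  mean={results[n]['mean']:.3f}  "
          f"sd={results[n]['sd']:.3f}  sigma/sqrt(n)={1 / math.sqrt(n):.3f}  "
          f"skewness={results[n]['skew']:.2f}")
```

As n grows, the simulated standard deviation should track σ/√n and the skewness should shrink toward 0, which is the behavior the CLT describes.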

Keyword	Definition
Standard error	The standard deviation of the sampling distribution of 𝑥̄, σ/√n (estimated by s/√n when σ is unknown).
Central Limit Theorem (CLT)	States that for large n, the sampling distribution of the sample mean is approximately normal with mean μ and standard deviation σ/√n.

8.4 Bootstrap Sampling Distribution

Bootstrapping is a resampling method used to approximate the sampling distribution of a statistic by repeatedly sampling with replacement from the observed data.

Steps in Bootstrapping:
1. Draw a bootstrap sample of size n with replacement from the original data.
2. Compute the statistic of interest for the bootstrap sample.
3. Repeat steps 1 and 2 many times to form the bootstrap sampling distribution.
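Applications: Useful for complex statistics (such as the median) that lack a simple standard-error formula, for small data sets, and when standard assumptions do not hold. The original sample must still be representative of the population, because bootstrapping reuses only the observed data.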

Example: Estimating the sampling distribution of the median tumor size using bootstrap samples.
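
The three steps above can be sketched as follows. The tumor sizes are made-up values for illustration only (the original data are not given in these notes), and the number of bootstrap samples B is an arbitrary choice.

```python
import random
from statistics import median, stdev

random.seed(3)  # fixed seed so the run is reproducible

# Hypothetical tumor sizes in cm (made-up values, not the course data).
data = [1.2, 1.5, 1.9, 2.1, 2.4, 2.8, 3.0, 3.6, 4.1, 5.3, 6.0, 7.4]
n = len(data)
B = 2000  # number of bootstrap samples (arbitrary choice)

boot_medians = []
for _ in range(B):
    # Step 1: draw a bootstrap sample of size n with replacement.
    boot_sample = random.choices(data, k=n)
    # Step 2: compute the statistic of interest for the bootstrap sample.
    boot_medians.append(median(boot_sample))
# Step 3: the B bootstrap medians form the bootstrap distribution.

print("observed median:", median(data))
print("bootstrap standard error of the median:", round(stdev(boot_medians), 3))
```

The spread of `boot_medians` serves as an estimate of how much the sample median would vary from sample to sample. A histogram of these values would show the bootstrap distribution.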

Keyword	Definition
Bootstrapping	A resampling method that uses the observed data to approximate a sampling distribution.
Bootstrap sample	A sample of size n drawn with replacement from the original sample.
Bootstrap statistic	The value of the statistic computed in a bootstrap sample.
Bootstrap distribution	The distribution of many bootstrap statistics, an empirical approximation to the sampling distribution.

Additional info:

Examples and illustrations use simulated data and statistical software (JMP Pro 17) to demonstrate concepts.
Practical considerations include the adequacy of sample size for normal approximation and the use of bootstrapping when standard conditions are not met.