Sampling and Sampling Distributions: Core Concepts and Applications

Sampling and Sampling Distributions

7.1 Why Sample?

Sampling is a fundamental process in statistics, allowing researchers to draw conclusions about a population by examining a subset of its members. This approach is essential when it is impractical or impossible to measure the entire population.

Population: The entire group of subjects of interest.
Sample: A subset of the population, selected for analysis.
Purpose of Sampling: To make accurate inferences about the population while saving time and resources.

7.2 Types of Sampling and Biases

There are several methods for selecting samples, each with its own advantages and limitations. Proper sampling methods reduce bias and improve the reliability of statistical inferences.

Probability Sampling

Probability Sample: Each member of the population has a known, nonzero chance of being selected. Enables inferential statistics.

Simple Random Sampling

Every member has an equal chance of selection.
Often implemented using random number generators or software tools like Excel.

[Figures: Excel random sampling dialog; dialog with input range; Excel output of a simple random sample]
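As an alternative to the Excel tool, a simple random sample can be sketched in Python. This is a minimal illustration only: the population of ID numbers 1 to 100 and the sample size of 10 are assumptions, not values from these notes.

```python
import random

# Hypothetical population: ID numbers 1..100 (not from the notes).
population = list(range(1, 101))

# A seeded generator makes the draw reproducible.
rng = random.Random(42)

# random.sample draws without replacement, so every member has the
# same chance of ending up in the sample.
sample = rng.sample(population, k=10)

print(sorted(sample))
```

Sampling without replacement means no member can appear twice in the sample.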

Systematic Sampling

Every kth member is chosen after a random start among the first k members, where k = N/n (N = population size, n = sample size).
Easy to implement but may be affected by periodicity in the population.

[Figures: Excel systematic sampling dialog; Excel output of a systematic sample]
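The every-kth-member rule can be sketched as follows. The function name `systematic_sample`, the population of 100 IDs, and the use of integer division for k are illustrative assumptions.

```python
import random

def systematic_sample(population, n, rng):
    """Pick every k-th member after a random start, with k = N // n."""
    N = len(population)
    k = N // n
    if k < 1:
        raise ValueError("sample size n cannot exceed population size N")
    start = rng.randrange(k)  # random start among the first k members
    return [population[start + i * k] for i in range(n)]

# Hypothetical population of 100 ID numbers (not from the notes).
population = list(range(1, 101))
sample = systematic_sample(population, n=10, rng=random.Random(0))
print(sample)
```

If the population list has a repeating pattern whose period matches k, every selected member falls at the same point in the cycle, which is the periodicity problem noted above.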

Stratified Sampling

Population is divided into strata (homogeneous groups), and random samples are drawn from each stratum.
Ensures representation of key subgroups.
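A minimal sketch of stratified sampling, assuming a hypothetical list of students tagged by class year. The data, the function name `stratified_sample`, and the equal allocation per stratum are all illustrative choices.

```python
import random
from collections import defaultdict

def stratified_sample(members, stratum_of, per_stratum, rng):
    """Group members into strata, then draw a random sample from each."""
    strata = defaultdict(list)
    for member in members:
        strata[stratum_of(member)].append(member)
    return {
        name: rng.sample(group, per_stratum)
        for name, group in strata.items()
    }

# Hypothetical population: 60 first-year students and 40 seniors.
members = [
    {"id": i, "year": "first-year" if i < 60 else "senior"}
    for i in range(100)
]
sample = stratified_sample(
    members, lambda m: m["year"], per_stratum=5, rng=random.Random(3)
)
for year, chosen in sample.items():
    print(year, [m["id"] for m in chosen])
```

Drawing a fixed number from each stratum guarantees that both groups appear in the sample. Proportional allocation (sample sizes matching each stratum's share of the population) is another common choice.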

Cluster Sampling

Population is divided into clusters (often geographically), and entire clusters are randomly selected.
Clusters are heterogeneous mini-populations.
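Cluster sampling can be sketched in the same style. The region names and member IDs below are hypothetical. The key difference from stratified sampling is that whole clusters are selected and every member of a chosen cluster is included.

```python
import random

def cluster_sample(clusters, num_clusters, rng):
    """Randomly choose whole clusters and keep every member of each."""
    chosen = rng.sample(sorted(clusters), num_clusters)
    return {name: clusters[name] for name in chosen}

# Hypothetical clusters: four regions with five members each.
clusters = {
    "North": list(range(0, 5)),
    "South": list(range(5, 10)),
    "East": list(range(10, 15)),
    "West": list(range(15, 20)),
}
selected = cluster_sample(clusters, num_clusters=2, rng=random.Random(1))
print(selected)
```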

Resampling

Statistical technique involving repeated sampling from the available data (e.g., bootstrap method).
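A short sketch of the bootstrap idea: resample the observed data with replacement many times and look at how the statistic varies across resamples. The ten observations and the choice of 1,000 resamples are assumptions for illustration.

```python
import random
import statistics

def bootstrap_means(data, num_resamples, rng):
    """Return the mean of each resample drawn with replacement."""
    n = len(data)
    return [
        statistics.mean(rng.choices(data, k=n))
        for _ in range(num_resamples)
    ]

# Hypothetical observed sample (not from the notes).
data = [12, 14, 16, 18, 14, 16, 12, 18, 16, 14]
means = bootstrap_means(data, num_resamples=1000, rng=random.Random(1))

# The spread of the resampled means estimates the standard error.
boot_se = statistics.stdev(means)
print(round(boot_se, 3))
```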

Nonprobability Sampling

Probability of selection is unknown (e.g., convenience sampling).
Quick and easy but may not be representative.

Biases in Sampling

Sampling Bias: Sample is not representative of the population.
Nonresponse Bias: Differences between respondents and nonrespondents.
Response Bias: Inaccurate answers due to question wording or respondent behavior.
Undercoverage Bias: Some population segments are inadequately represented.
Voluntary Response Bias: Those who volunteer differ systematically from those who do not.
Cognitive Biases: Logical errors in reasoning (anchoring, availability heuristic, confirmation, recency).

7.3 Sampling and Nonsampling Errors

Errors can arise from both the sampling process and other aspects of data collection.

Parameter: Value describing a population characteristic (e.g., mean, median).
Statistic: Value calculated from a sample.
Sampling Error: Difference between a sample statistic and the corresponding population parameter.

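Formula for Sampling Error of the Sample Mean:

$$\text{Sampling error} = \bar{x} - \mu$$

where $\bar{x}$ is the sample mean and $\mu$ is the population mean.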

Nonsampling Errors: Errors not related to sampling variability (e.g., ambiguous questions, data collection mistakes).

7.4 The Central Limit Theorem (CLT)

The Central Limit Theorem is a cornerstone of inferential statistics. It states that the sampling distribution of the sample mean approaches a normal distribution as the sample size increases, regardless of the shape of the population's distribution (provided the population has a finite variance).

For large samples (n ≥ 30), the sampling distribution of the mean is approximately normal.
If the population is normal, the sampling distribution is normal for any sample size.
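The theorem can be checked informally by simulation. The sketch below assumes a strongly skewed population (exponential with mean 1 and standard deviation 1), which is an illustrative choice and not an example from these notes. It draws 5,000 samples of size 30 and compares the spread of the sample means with σ/√n.

```python
import math
import random
import statistics

rng = random.Random(7)
n = 30              # sample size
num_samples = 5000  # number of repeated samples

# Each entry is the mean of one sample of size n from the skewed population.
sample_means = [
    statistics.fmean(rng.expovariate(1.0) for _ in range(n))
    for _ in range(num_samples)
]

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.pstdev(sample_means)
theoretical_se = 1.0 / math.sqrt(n)  # sigma / sqrt(n) with sigma = 1

print(round(mean_of_means, 3), round(sd_of_means, 3), round(theoretical_se, 3))
```

A histogram of `sample_means` would look roughly bell-shaped even though the population itself is heavily right-skewed.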

Population Example: Entrée Prices

Entrée Type	Price ($)	Probability
Vegetarian (V)	12	0.25
Chicken (C)	14	0.25
Fish (F)	16	0.25
Steak (S)	18	0.25

[Figure: Probability distribution for entrée choices]

Population Mean Calculation:

$$\mu = \sum x \, P(x) = 12(0.25) + 14(0.25) + 16(0.25) + 18(0.25) = 15$$

Population Standard Deviation Calculation:

$$\sigma = \sqrt{\sum (x - \mu)^2 \, P(x)} = \sqrt{(9 + 1 + 1 + 9)(0.25)} = \sqrt{5} \approx 2.236$$
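The population mean and standard deviation can be computed directly from the entrée table. This is a small sketch using the four prices and the probability 0.25 listed above. The variable names are illustrative.

```python
import math

# Prices from the entrée table; each entrée has probability 0.25.
prices = {"V": 12, "C": 14, "F": 16, "S": 18}
prob = 0.25

mu = sum(price * prob for price in prices.values())
variance = sum((price - mu) ** 2 * prob for price in prices.values())
sigma = math.sqrt(variance)

print(mu, round(sigma, 3))  # 15.0 2.236
```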

Sampling Distribution Example

All possible pairs of entrée choices (n = 2) are listed, and their averages are calculated to form the sampling distribution of the mean.

[Figures: table of all possible entrée combinations; sample mean calculation example; table of sample means for entrée pairs; bar chart of the sampling distribution of the mean]
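The enumeration can be reproduced in a few lines. The sketch assumes that the pairs are ordered and that the same entrée may be chosen twice (16 equally likely pairs). That reading is an assumption, since the original table is not preserved in these notes.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

prices = [12, 14, 16, 18]

# All ordered pairs of choices, repeats allowed (assumed: 4 x 4 = 16 pairs).
pairs = list(product(prices, repeat=2))

# Count how often each sample mean occurs across the pairs.
counts = Counter((a + b) / 2 for a, b in pairs)

# Sampling distribution: each sample mean with its probability.
dist = {
    mean: Fraction(count, len(pairs))
    for mean, count in sorted(counts.items())
}

for mean, probability in dist.items():
    print(mean, probability)
```

The middle value, a sample mean of 15, is the most likely outcome, and the probabilities fall off symmetrically toward 12 and 18.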

Key Properties of the Sampling Distribution of the Mean

The mean of the sampling distribution equals the population mean: $\mu_{\bar{x}} = \mu$
The standard deviation of the sampling distribution (standard error): $\sigma_{\bar{x}} = \sigma / \sqrt{n}$

For the entrée example with $\sigma = \sqrt{5} \approx 2.236$ and n = 2: $\sigma_{\bar{x}} = \sqrt{5} / \sqrt{2} \approx 1.581$
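Both properties can be verified numerically for the entrée prices by enumerating every possible sample. This sketch assumes equally likely entrées and sampling with replacement. The helper name `sampling_mean_and_se` is illustrative.

```python
import math
from itertools import product

prices = [12, 14, 16, 18]
mu = sum(prices) / len(prices)
sigma = math.sqrt(sum((x - mu) ** 2 for x in prices) / len(prices))

def sampling_mean_and_se(n):
    """Mean and standard deviation of all sample means for samples of size n."""
    means = [sum(sample) / n for sample in product(prices, repeat=n)]
    center = sum(means) / len(means)
    spread = math.sqrt(sum((m - center) ** 2 for m in means) / len(means))
    return center, spread

for n in (2, 3, 4):
    center, spread = sampling_mean_and_se(n)
    # Columns: n, mean of sample means, enumerated SE, sigma / sqrt(n)
    print(n, round(center, 3), round(spread, 3), round(sigma / math.sqrt(n), 3))
```

The last two columns agree for every n, and both shrink as n grows.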

Application: Testing Claims Using the CLT

To test a claim about a population mean, calculate the probability of observing a sample mean at least as extreme as the one obtained, assuming the claim (the null hypothesis) is true.

Calculate the standard error of the mean: $\sigma_{\bar{x}} = \sigma / \sqrt{n}$.
Compute the z-score for the observed sample mean: $z = (\bar{x} - \mu) / \sigma_{\bar{x}}$.
Find the probability using the standard normal distribution.
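The three steps can be sketched as a small function. The numbers in the usage lines are hypothetical and are not the values of the drive time example, which are not preserved in these notes. The function returns the upper-tail probability, so a claim tested in the other direction would use the lower tail instead.

```python
import math
from statistics import NormalDist

def upper_tail_probability(sample_mean, claimed_mean, sigma, n):
    """Return (standard error, z-score, P(sample mean >= observed))."""
    se = sigma / math.sqrt(n)              # step 1: standard error
    z = (sample_mean - claimed_mean) / se  # step 2: z-score
    prob = 1 - NormalDist().cdf(z)         # step 3: upper-tail probability
    return se, z, prob

# Hypothetical inputs: claimed mean 25, sigma 6, n = 36, observed mean 27.
se, z, prob = upper_tail_probability(
    sample_mean=27, claimed_mean=25, sigma=6, n=36
)
print(se, z, round(prob, 4))
```

A small probability suggests that the observed sample mean would be unusual if the claim were true.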

[Figures: drive time example showing the standard error calculation, the z-score calculation, the probability calculation, and a normal curve with the probability region shaded]

The Effect of Sample Size

As sample size increases, the standard error decreases, making the sampling distribution narrower and reducing sampling error.

[Figures: effect of sample size on the sampling distribution; effect of population shape and sample size on the sampling distribution]

Finite Population Correction

When the sample size is more than 5% of the population (n/N > 0.05), adjust the standard error using the finite population correction factor:

$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N - n}{N - 1}}$$

[Figures: finite population correction calculation; z-score calculation with finite population correction; probability calculation with finite population correction]
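A minimal sketch of the correction, applying the factor only when n/N exceeds 5%. The function name `standard_error` and the input values are hypothetical.

```python
import math

def standard_error(sigma, n, N=None):
    """Standard error of the mean, with the finite population
    correction applied when the sample exceeds 5% of the population."""
    se = sigma / math.sqrt(n)
    if N is not None and n / N > 0.05:
        se *= math.sqrt((N - n) / (N - 1))
    return se

# Hypothetical inputs: sigma = 10, n = 100, population N = 500 (n/N = 20%).
uncorrected = standard_error(10, 100)
corrected = standard_error(10, 100, N=500)
print(round(uncorrected, 4), round(corrected, 4))
```

The corrected value is always smaller, because a sample that covers a large share of the population leaves less room for sampling error.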

7.5 The Sampling Distribution of the Proportion

When dealing with proportions, the sampling distribution describes the pattern of sample proportions from repeated random samples.

Underlying distribution is binomial.
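Conditions: np and nq must both be large enough for the normal approximation to the binomial to hold (where q = 1 − p); a common rule of thumb is np ≥ 5 and nq ≥ 5.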

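Sample Proportion Formula:

$$\hat{p} = \frac{x}{n}$$

where x is the number of successes in the sample and n is the sample size.

Standard Error of the Proportion:

$$\sigma_{\hat{p}} = \sqrt{\frac{p(1 - p)}{n}}$$

Z-score for the Sample Proportion:

$$z = \frac{\hat{p} - p}{\sigma_{\hat{p}}}$$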

Example: Testing a Proportion Claim

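Population: N = 770 students, with a claimed population proportion p.
Sample: n = 120 students, x = 97 successes ($\hat{p} = 97/120 \approx 0.808$).
Check conditions: np and n(1 − p) must both be large enough for the normal approximation.
Calculate the standard error and z-score. Because n/N = 120/770 ≈ 15.6% exceeds 5%, the finite population correction applies to the standard error.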
Find probability and draw conclusion.

[Figures: checking binomial conditions for the sample proportion; sample proportion calculation; standard error calculation; z-score calculation; probability calculation]
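The procedure can be sketched in Python using the counts given above (N = 770, n = 120, x = 97). The claimed proportion is not preserved in these notes, so the value `p_claimed = 0.75` below is a purely hypothetical placeholder. The threshold of 5 for the conditions and the use of the finite population correction for a proportion are likewise assumptions.

```python
import math
from statistics import NormalDist

def proportion_test(x, n, p, N=None, min_count=5):
    """Return (p_hat, conditions_ok, standard error, z, upper-tail probability)."""
    p_hat = x / n
    # Rule-of-thumb check for the normal approximation (threshold assumed).
    conditions_ok = n * p >= min_count and n * (1 - p) >= min_count
    se = math.sqrt(p * (1 - p) / n)
    # Finite population correction when the sample exceeds 5% of N.
    if N is not None and n / N > 0.05:
        se *= math.sqrt((N - n) / (N - 1))
    z = (p_hat - p) / se
    prob = 1 - NormalDist().cdf(z)
    return p_hat, conditions_ok, se, z, prob

p_claimed = 0.75  # HYPOTHETICAL placeholder, not the value from the notes

p_hat, ok, se, z, prob = proportion_test(x=97, n=120, p=p_claimed, N=770)
print(round(p_hat, 3), ok, round(se, 4), round(z, 2), round(prob, 4))
```

Substituting the actual claimed proportion for `p_claimed` reproduces the steps listed above. Only the sample proportion, 97/120 ≈ 0.808, is independent of that choice.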

Conclusion

If the probability of observing the sample proportion (or more extreme) is very small under the null hypothesis, we have evidence to suggest the population proportion is different from the claimed value.
