Sampling and Sampling Distributions – Study Notes
Sampling and Sampling Distributions
7.1 Why Sample?
Sampling is a fundamental concept in statistics, especially in business contexts where measuring an entire population is often impractical. Instead, a subset (sample) is studied to make inferences about the whole group (population).
Population: All possible subjects of interest in a study.
Sample: A subset of the population, selected for analysis.
Why Sample?
Measuring an entire population can be expensive or impossible.
A properly selected sample allows for accurate assessment of the population.
7.2 Types of Sampling and Biases
Sampling methods determine how representative and reliable the results are. There are two main categories: probability and nonprobability sampling.
Probability Sampling
Probability Sample: Each member of the population has a known, nonzero chance of being selected.
Advantage: Enables inferential statistical tests for reliable conclusions about the population.
Simple Random Sampling
Every member of the population has an equal chance of being chosen.
Example: Selecting 10 students at random from a list of 1,800 using Excel's Data Analysis tool.
Without Replacement: Once selected, a member cannot be chosen again.
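The 10-from-1,800 example can also be sketched in Python rather than Excel. This is only an illustration: the student IDs 1 to 1,800 and the fixed seed are assumptions made here so the sketch is reproducible, not part of the original example.

```python
import random

# Hypothetical roster: IDs 1..1800 stand in for the list of 1,800 students.
population = list(range(1, 1801))

# A fixed seed is used only so the sketch gives the same result each run.
rng = random.Random(42)

# random.sample draws without replacement, so no student can appear twice.
sample = rng.sample(population, k=10)
print(sorted(sample))
```

Because `random.sample` never repeats an element, it matches the "without replacement" rule above; sampling with replacement would use `random.choices` instead.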
Systematic Sampling
Every kth member is chosen, where k is the population size divided by the sample size.
Formula for Systematic Sampling Constant:
\[ k = \frac{N}{n} \]
Example: For a population of 1,800 and a sample of 10, \(k = 1{,}800 / 10 = 180\) (choose every 180th student).
Advantages: Easy to implement, reduces judgment bias.
Disadvantage: Risk of periodicity bias if there is a pattern in the population matching k.
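A minimal Python sketch of systematic sampling, assuming the population is held in an ordered list and that the starting point is chosen at random within the first k members (a common convention, not something stated in these notes). The function name `systematic_sample` is hypothetical.

```python
import random

def systematic_sample(population, n, seed=None):
    """Pick every k-th member after a random start, with k = N // n.

    Assumes 0 < n <= len(population).
    """
    k = len(population) // n
    rng = random.Random(seed)
    start = rng.randrange(k)            # random starting index in 0..k-1
    return population[start::k][:n]     # trim extras when N is not a multiple of n

# Hypothetical roster of 1,800 student IDs, as in the example above.
students = list(range(1, 1801))
picked = systematic_sample(students, n=10, seed=1)
print(picked)  # ten IDs spaced 180 apart
```

If the list itself repeats with a period that matches k (for example, every 180th entry is a class president), this procedure inherits that pattern, which is the periodicity risk noted above.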
Stratified Sampling
Population is divided into mutually exclusive groups (strata), and random samples are taken from each.
Homogeneity within strata, heterogeneity between strata.
Strata are based on important variables (e.g., age, income).
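One way to sketch stratified sampling in Python is shown below. It assumes proportional allocation (the same sampling fraction in every stratum), which is one common choice and not the only one; the function name, the `age_group` labels, and the data are all hypothetical.

```python
import random
from collections import defaultdict

def stratified_sample(members, stratum_of, fraction, seed=None):
    """Draw a simple random sample of the same fraction from each stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for m in members:
        strata[stratum_of(m)].append(m)   # group members by stratum label

    sample = []
    for name in sorted(strata):           # fixed order keeps runs reproducible
        group = strata[name]
        size = max(1, round(len(group) * fraction))
        sample.extend(rng.sample(group, size))
    return sample

# Hypothetical population: 100 people under 30 and 200 people aged 30 or more.
people = [(i, "under_30" if i % 3 == 0 else "30_plus") for i in range(300)]
chosen = stratified_sample(people, stratum_of=lambda p: p[1], fraction=0.10, seed=7)

counts = {label: sum(1 for _, g in chosen if g == label)
          for label in ("under_30", "30_plus")}
print(counts)  # {'under_30': 10, '30_plus': 20}
```

Each stratum is guaranteed to appear in the sample in proportion to its size, which a simple random sample of 30 would only achieve on average.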
Cluster Sampling
Randomly select clusters (often based on geography), then sample all or some members within clusters.
Clusters are mini-populations, often heterogeneous within but similar to the overall population.
Examples: Classrooms, test-market cities.
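The classroom example can be sketched as one-stage cluster sampling, where whole clusters are chosen at random and every member of a chosen cluster is kept. The room names, class size of 25, and the function name `cluster_sample` are assumptions for illustration; a two-stage design would add a random draw inside each chosen cluster.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly pick whole clusters and return all of their members."""
    rng = random.Random(seed)
    chosen_names = rng.sample(sorted(clusters), n_clusters)
    members = [m for name in chosen_names for m in clusters[name]]
    return chosen_names, members

# Hypothetical school: 12 classrooms with 25 students each.
classrooms = {
    f"room_{r}": [f"room_{r}_student_{s}" for s in range(25)]
    for r in range(12)
}

rooms, students = cluster_sample(classrooms, n_clusters=3, seed=3)
print(rooms, len(students))  # three room names and 75 students
```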
Resampling
Statistical technique in which many samples are repeatedly drawn from the data already available (typically the original sample), rather than by collecting new data.
Bootstrap Method: Uses computer software to extract many samples with replacement to estimate parameters (mean, proportion).
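A minimal sketch of the bootstrap for a mean, using only Python's standard library. The observed values, the number of resamples (2,000), and the function name `bootstrap_means` are assumptions chosen for illustration.

```python
import random
import statistics

def bootstrap_means(data, n_resamples=2000, seed=None):
    """Resample the data with replacement and record each resample's mean."""
    rng = random.Random(seed)
    n = len(data)
    # rng.choices samples WITH replacement, so values can repeat in a resample.
    return [statistics.fmean(rng.choices(data, k=n)) for _ in range(n_resamples)]

# Hypothetical observed sample of 10 measurements.
observed = [12.1, 9.8, 11.4, 10.9, 13.2, 10.1, 12.7, 11.0, 9.5, 12.3]

means = bootstrap_means(observed, seed=0)
estimate = statistics.fmean(means)     # bootstrap estimate of the mean
std_error = statistics.stdev(means)    # spread of the resampled means

print(round(estimate, 2), round(std_error, 2))
```

The spread of the resampled means serves as an estimate of the standard error without assuming a particular population distribution.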
Nonprobability Sampling
Probability of selection is unknown.
Convenience Sample: Members are chosen because they are easily accessible.
Advantages: Quick, easy, provides general information.
Disadvantages: May not be representative.
Biases in Sampling
Biases are systematic errors that can affect the validity of results.
| Type | Description |
|---|---|
| Sampling Bias | Sample is not representative of the population. |
| Nonresponse Bias | Individuals who do not respond differ from those who do. |
| Response Bias | Respondents provide inaccurate answers (e.g., due to leading questions). |
| Undercoverage Bias | Certain portions of the population are insufficiently represented. |
| Voluntary Response Bias | Those who volunteer differ systematically from those who do not. |
| Cognitive Biases | Logical errors in reasoning (e.g., anchoring, availability heuristic, confirmation, recency). |
7.3 Sampling and Nonsampling Errors
Errors can arise from both the sampling process and other aspects of data collection.
Parameter: Value describing a population characteristic (e.g., mean, median).
Statistic: Value calculated from a sample.
Sampling Error: Difference between a sample statistic and the population parameter.
Formula for Sampling Error of the Sample Mean:
\[ \text{Sampling Error} = \overline{x} - \mu \]
Larger sample sizes reduce average sampling error.
Nonsampling Errors: Arise from ambiguous questions, leading questions, or data collection mistakes. Not related to sampling variability.
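The claim that larger samples reduce average sampling error can be checked with a small simulation. The sketch below assumes a made-up, right-skewed population (exponential with mean about 20) and compares the average absolute value of \(\overline{x} - \mu\) for samples of size 10 and 100; all names and numbers are illustrative.

```python
import random
import statistics

rng = random.Random(11)  # fixed seed so the sketch is reproducible

# Hypothetical skewed population with mean about 20.
population = [rng.expovariate(1 / 20) for _ in range(50_000)]
mu = statistics.fmean(population)  # population parameter

def mean_abs_sampling_error(n, trials=500):
    """Average |x_bar - mu| over many simple random samples of size n."""
    errors = []
    for _ in range(trials):
        sample = rng.sample(population, n)              # without replacement
        errors.append(abs(statistics.fmean(sample) - mu))
    return statistics.fmean(errors)

small = mean_abs_sampling_error(10)
large = mean_abs_sampling_error(100)
print(round(small, 2), round(large, 2))  # the n = 100 figure is clearly smaller
```

Nonsampling errors would not shrink in this way: a leading question biases every response, no matter how many people are asked.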
7.4 The Central Limit Theorem (CLT)
The Central Limit Theorem is a cornerstone of inferential statistics. It states that the distribution of sample means approaches a normal distribution as the sample size increases, regardless of the shape of the population's distribution (provided the population variance is finite).
For large samples (usually \(n \geq 30\)), the sampling distribution of the mean is approximately normal.
If the population is normal, the sampling distribution is normal for any sample size.
The mean of the sampling distribution equals the population mean (\(\mu_{\overline{x}} = \mu\)).
The standard deviation of the sampling distribution (standard error) is:
\[ \sigma_{\overline{x}} = \frac{\sigma}{\sqrt{n}} \]
As sample size increases, the standard error decreases, making estimates more precise.
If sampling from a finite population (where \(n/N > 0.05\)), use the finite population correction:
\[ \sigma_{\overline{x}} = \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N-n}{N-1}} \]
Application Example: Testing claims about means (e.g., average drive time) using the CLT and calculating probabilities with z-scores.
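A drive-time style calculation can be sketched as follows. The numbers (claimed mean of 25 minutes, \(\sigma = 6\), \(n = 36\), observed sample mean of 27) are hypothetical and not taken from the notes, and the 5% rule for applying the finite population correction is a common rule of thumb assumed here. The function name `standard_error_mean` is also hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def standard_error_mean(sigma, n, N=None):
    """Standard error of the mean, with the finite population correction
    applied when a population size N is given and n/N exceeds 5%."""
    se = sigma / sqrt(n)
    if N is not None and n / N > 0.05:
        se *= sqrt((N - n) / (N - 1))
    return se

# Hypothetical claim: mean drive time is 25 minutes with sigma = 6.
mu, sigma, n, x_bar = 25.0, 6.0, 36, 27.0

se = standard_error_mean(sigma, n)      # 6 / sqrt(36) = 1.0
z = (x_bar - mu) / se                   # (27 - 25) / 1.0 = 2.0
p_upper = 1 - NormalDist().cdf(z)       # P(sample mean >= 27) if the claim holds

print(se, z, round(p_upper, 4))         # 1.0 2.0 0.0228

# Same sample drawn from a finite population of 200: the correction shrinks the SE.
print(round(standard_error_mean(sigma, n, N=200), 4))
```

A probability this small (about 2%) would suggest the observed sample mean is unlikely if the claimed mean were true.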
7.5 The Sampling Distribution of the Proportion
When dealing with proportions (e.g., percentage of successes), the sampling distribution describes the pattern of sample proportions from repeated samples.
Underlying distribution is binomial.
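Conditions for the normal approximation: \(np \geq 5\) and \(nq \geq 5\) (where \(q = 1 - p\)). Some texts use 10 rather than 5 as the threshold.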
Sample Proportion Formula:
\[ \hat{p} = \frac{x}{n} \]
Standard Error of the Proportion:
\[ \sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}} \]
Z-score for the Sample Proportion:
\[ z = \frac{\hat{p} - p}{\sigma_{\hat{p}}} \]
Example: Testing a college's claim that 70% of graduates have jobs related to their majors using a sample of 120 students.
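The college example can be worked through with the three formulas above. The claimed proportion (0.70) and sample size (120) come from the example; the observed count of 78 graduates is a hypothetical value added here so the calculation has something to test.

```python
from math import sqrt
from statistics import NormalDist

p = 0.70   # claimed population proportion (from the example)
n = 120    # sample size (from the example)
x = 78     # hypothetical number of sampled graduates with related jobs

# Normal approximation check: n*p = 84 and n*(1-p) = 36 are both well above 5.
p_hat = x / n                       # sample proportion, 0.65
se = sqrt(p * (1 - p) / n)          # standard error, about 0.0418
z = (p_hat - p) / se                # about -1.20
prob = NormalDist().cdf(z)          # P(p_hat <= 0.65) if the claim is true

print(round(p_hat, 2), round(se, 4), round(z, 2), round(prob, 3))
```

With these assumed numbers the probability is roughly 0.12, so a sample proportion of 65% would not be especially surprising if the 70% claim were true.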
Summary Table: Types of Sampling Methods
| Sampling Method | Description | Key Feature |
|---|---|---|
| Simple Random | Every member has equal chance | Random selection |
| Systematic | Every kth member selected | Uses interval k |
| Stratified | Population divided into strata, sample from each | Homogeneity within strata |
| Cluster | Randomly select clusters, sample within | Clusters are mini-populations |
| Convenience | Sample easily accessible members | Nonprobability |
Key Takeaways:
Sampling allows for efficient and practical data collection.
Probability sampling methods support valid statistical inference.
Biases and errors must be minimized for reliable results.
The Central Limit Theorem justifies the use of normal probability models for sample means and proportions.