Sampling and Sampling Distributions – Study Notes

7.1 Why Sample?

Sampling is a fundamental concept in statistics, especially in business contexts where measuring an entire population is often impractical. Instead, a subset (sample) is studied to make inferences about the whole group (population).

  • Population: All possible subjects of interest in a study.

  • Sample: A subset of the population, selected for analysis.

  • Why Sample?

    • Measuring an entire population can be expensive or impossible.

    • A properly selected sample allows for accurate assessment of the population.

7.2 Types of Sampling and Biases

Sampling methods determine how representative and reliable the results are. There are two main categories: probability and nonprobability sampling.

Probability Sampling

  • Probability Sample: Each member of the population has a known, nonzero chance of being selected.

  • Advantage: Enables inferential statistical tests for reliable conclusions about the population.

Simple Random Sampling

  • Every member of the population has an equal chance of being chosen.

  • Example: Selecting 10 students at random from a list of 1,800 using Excel's Data Analysis tool.

  • Without Replacement: Once selected, a member cannot be chosen again.
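The example above uses Excel, but the same draw can be sketched in Python. This is a minimal illustration, assuming a hypothetical roster of student IDs numbered 1 to 1,800; the seed is fixed only so the result is repeatable.

```python
import random

# Hypothetical roster: IDs 1..1800 stand in for the list of 1,800 students.
population = list(range(1, 1801))

rng = random.Random(42)  # fixed seed, used only to make the sketch repeatable

# random.sample draws without replacement, so no ID can be picked twice,
# and every ID has the same chance of being selected.
sample = rng.sample(population, k=10)

print(sorted(sample))
```

Sampling with replacement would instead use `rng.choices(population, k=10)`, which can return the same ID more than once.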

Systematic Sampling

  • Every kth member is chosen, where k is the population size (N) divided by the sample size (n).

Formula for Systematic Sampling Constant:

\[ k = \frac{N}{n} \]

  • Example: For a population of 1,800 and a sample of 10, \( k = 1800 / 10 = 180 \) (choose every 180th student).

  • Advantages: Easy to implement, reduces judgment bias.

  • Disadvantage: Risk of periodicity bias if there is a pattern in the population matching k.
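The interval rule above (population size divided by sample size) can be sketched as follows. The function name `systematic_sample`, the random starting point, and the ID roster are illustrative assumptions, not part of the original notes.

```python
import random

def systematic_sample(population, n, rng):
    """Pick every k-th member after a random start, with k = N // n."""
    N = len(population)
    k = N // n                # sampling interval (180 for N = 1800, n = 10)
    start = rng.randrange(k)  # random starting index within the first interval
    return [population[start + i * k] for i in range(n)]

# Hypothetical roster of 1,800 student IDs.
population = list(range(1, 1801))
rng = random.Random(7)
sample = systematic_sample(population, 10, rng)

print(sample)
```

Choosing the start at random keeps the method a probability sample. If the roster has a repeating pattern with the same period as k, every selected member falls at the same point in that pattern, which is the periodicity risk noted above.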

Stratified Sampling

  • Population is divided into mutually exclusive groups (strata), and random samples are taken from each.

  • Homogeneity within strata, heterogeneity between strata.

  • Strata are based on important variables (e.g., age, income).
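One common variant is proportional stratified sampling, in which the same fraction is drawn at random from each stratum. The sketch below assumes a hypothetical population of 1,800 people tagged with made-up age groups; the helper name `stratified_sample` is also an assumption.

```python
import random
from collections import Counter, defaultdict

def stratified_sample(records, stratum_of, fraction, rng):
    """Draw the same fraction, at random and without replacement, from every stratum."""
    strata = defaultdict(list)
    for record in records:
        strata[stratum_of(record)].append(record)
    sample = []
    for members in strata.values():
        size = max(1, round(len(members) * fraction))  # at least one per stratum
        sample.extend(rng.sample(members, size))
    return sample

# Hypothetical population: (id, age group) pairs with invented group sizes.
records = (
    [(i, "under 30") for i in range(600)]
    + [(i, "30 to 49") for i in range(600, 1500)]
    + [(i, "50 and over") for i in range(1500, 1800)]
)

rng = random.Random(11)
sample = stratified_sample(records, lambda r: r[1], 0.01, rng)
counts = Counter(group for _, group in sample)
print(counts)
```

Because each stratum contributes in proportion to its size, the sample mirrors the population's mix on the stratifying variable.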

Cluster Sampling

  • Randomly select clusters (often based on geography), then sample all or some members within clusters.

  • Clusters are mini-populations, often heterogeneous within but similar to the overall population.

  • Examples: Classrooms, test-market cities.
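A one-stage version of cluster sampling (select clusters at random, then take everyone in them) might look like the sketch below. The classroom names and sizes are hypothetical.

```python
import random

# Hypothetical clusters: 60 classrooms with 30 students each.
clusters = {
    f"room_{c}": [f"room_{c}_student_{s}" for s in range(30)]
    for c in range(60)
}

rng = random.Random(3)

# Step 1: randomly select whole clusters, not individual students.
chosen_rooms = rng.sample(sorted(clusters), k=3)

# Step 2: include every member of each chosen cluster.
sample = [student for room in chosen_rooms for student in clusters[room]]

print(chosen_rooms, len(sample))
```

To sample only some members within each cluster, step 2 could be replaced by a call to `rng.sample` on each chosen classroom.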

Resampling

  • Statistical technique in which many samples are repeatedly drawn from the data already available (typically the original sample).

  • Bootstrap Method: Uses computer software to extract many samples with replacement to estimate parameters (mean, proportion).
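As a rough sketch of the bootstrap idea, the code below resamples a small data set with replacement many times and uses the spread of the resampled means to estimate the standard error of the mean. The ten data values and the choice of 2,000 resamples are hypothetical.

```python
import random
import statistics

def bootstrap_means(data, n_resamples, rng):
    """Return the mean of each resample drawn with replacement from data."""
    n = len(data)
    return [statistics.fmean(rng.choices(data, k=n)) for _ in range(n_resamples)]

# Hypothetical observed sample of 10 values.
data = [12, 15, 9, 20, 14, 17, 11, 13, 16, 18]

rng = random.Random(0)
means = bootstrap_means(data, 2000, rng)

boot_estimate = statistics.fmean(means)  # centre of the bootstrap distribution
boot_se = statistics.stdev(means)        # bootstrap estimate of the standard error

print(round(boot_estimate, 2), round(boot_se, 2))
```

Each resample has the same size as the original data, and drawing with replacement is what lets the resamples differ from one another.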

Nonprobability Sampling

  • Probability of selection is unknown.

  • Convenience Sample: Members are chosen because they are easily accessible.

  • Advantages: Quick, easy, provides general information.

  • Disadvantages: May not be representative.

Biases in Sampling

Biases are systematic errors that can affect the validity of results.

| Type | Description |
| --- | --- |
| Sampling Bias | Sample is not representative of the population. |
| Nonresponse Bias | Individuals who do not respond differ from those who do. |
| Response Bias | Respondents provide inaccurate answers (e.g., due to leading questions). |
| Undercoverage Bias | Certain portions of the population are insufficiently represented. |
| Voluntary Response Bias | Those who volunteer differ systematically from those who do not. |
| Cognitive Biases | Logical errors in reasoning (e.g., anchoring, availability heuristic, confirmation, recency). |

7.3 Sampling and Nonsampling Errors

Errors can arise from both the sampling process and other aspects of data collection.

  • Parameter: Value describing a population characteristic (e.g., mean, median).

  • Statistic: Value calculated from a sample.

  • Sampling Error: Difference between a sample statistic and the population parameter.

Formula for Sampling Error of the Sample Mean:

\[ \text{Sampling Error} = \overline{x} - \mu \]

  • Larger sample sizes reduce average sampling error.

  • Nonsampling Errors: Arise from ambiguous questions, leading questions, or data collection mistakes. Not related to sampling variability.
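The statement that larger samples reduce average sampling error can be checked with a small simulation. This sketch assumes a hypothetical population of 10,000 uniformly distributed values and compares the average absolute sampling error of the mean for two sample sizes.

```python
import random
import statistics

rng = random.Random(1)

# Hypothetical population of 10,000 values between 0 and 100.
population = [rng.uniform(0, 100) for _ in range(10_000)]
mu = statistics.fmean(population)  # population parameter

def average_abs_error(n, trials=500):
    """Average |x_bar - mu| over many simple random samples of size n."""
    errors = []
    for _ in range(trials):
        x_bar = statistics.fmean(rng.sample(population, n))  # sample statistic
        errors.append(abs(x_bar - mu))                       # size of sampling error
    return statistics.fmean(errors)

small_n_error = average_abs_error(10)
large_n_error = average_abs_error(400)

print(round(small_n_error, 2), round(large_n_error, 2))
```

Any single sample's error can be positive or negative, so the sketch averages absolute errors to show the typical size of the gap.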

7.4 The Central Limit Theorem (CLT)

The Central Limit Theorem is a cornerstone of inferential statistics, stating that the distribution of sample means approaches normality as sample size increases, regardless of the population's distribution.

  • For large samples (usually \( n \geq 30 \)), the sampling distribution of the mean is approximately normal.

  • If the population is normal, the sampling distribution is normal for any sample size.

  • The mean of the sampling distribution equals the population mean (\( \mu_{\overline{x}} = \mu \)).

  • The standard deviation of the sampling distribution (standard error) is:

\[ \sigma_{\overline{x}} = \frac{\sigma}{\sqrt{n}} \]

  • As sample size increases, the standard error decreases, making estimates more precise.

  • If sampling from a finite population (where \( n/N > 0.05 \)), use the finite population correction:

\[ \sigma_{\overline{x}} = \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N-n}{N-1}} \]

  • Application Example: Testing claims about means (e.g., average drive time) using the CLT and calculating probabilities with z-scores.
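The drive-time application can be sketched numerically. All figures below (claimed mean of 25 minutes, population standard deviation of 8, sample of 64, observed mean of 27) are hypothetical and chosen only to illustrate the standard error and z-score steps; the helper `standard_error` is likewise an assumption.

```python
import math
from statistics import NormalDist

def standard_error(sigma, n, N=None):
    """Standard error of the mean; applies the finite population correction if N is given."""
    se = sigma / math.sqrt(n)
    if N is not None:
        se *= math.sqrt((N - n) / (N - 1))
    return se

# Hypothetical claim and sample results.
mu, sigma, n, x_bar = 25.0, 8.0, 64, 27.0

se = standard_error(sigma, n)             # 8 / sqrt(64) = 1.0
z = (x_bar - mu) / se                     # (27 - 25) / 1.0 = 2.0
p_at_least = 1 - NormalDist().cdf(z)      # P(sample mean >= 27), about 0.0228

# With a hypothetical finite population of N = 640 the standard error shrinks.
se_finite = standard_error(sigma, n, N=640)

print(se, z, round(p_at_least, 4), round(se_finite, 4))
```

A small probability such as this one suggests that a sample mean of 27 would be unusual if the claimed mean of 25 were true.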

7.5 The Sampling Distribution of the Proportion

When dealing with proportions (e.g., percentage of successes), the sampling distribution describes the pattern of sample proportions from repeated samples.

  • Underlying distribution is binomial.

  • Conditions for the normal approximation: \( np \geq 5 \) and \( nq \geq 5 \) (where \( q = 1 - p \)).

  • Sample Proportion Formula:

\[ \hat{p} = \frac{x}{n} \]

  • Standard Error of the Proportion:

\[ \sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}} \]

  • Z-score for the Sample Proportion:

\[ z = \frac{\hat{p} - p}{\sigma_{\hat{p}}} \]

  • Example: Testing a college's claim that 70% of graduates have jobs related to their majors using a sample of 120 students.
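The college example can be worked through with the formulas above. The claimed proportion (0.70) and sample size (120) come from the example; the count of 78 graduates with related jobs is a hypothetical value added for illustration.

```python
import math
from statistics import NormalDist

p = 0.70   # claimed population proportion (from the example)
n = 120    # sample size (from the example)
x = 78     # hypothetical number of sampled graduates with related jobs

# Expected successes and failures, used to judge the normal approximation.
expected_successes = n * p          # about 84
expected_failures = n * (1 - p)     # about 36

p_hat = x / n                             # sample proportion = 0.65
se = math.sqrt(p * (1 - p) / n)           # standard error, about 0.0418
z = (p_hat - p) / se                      # about -1.195
prob_at_most = NormalDist().cdf(z)        # P(p_hat <= 0.65), about 0.116

print(round(p_hat, 3), round(se, 4), round(z, 3), round(prob_at_most, 3))
```

Under these assumed numbers, a sample proportion of 0.65 or lower would occur in roughly 12% of samples if the 70% claim were true, so it would not be strong evidence against the claim.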

Summary Table: Types of Sampling Methods

| Sampling Method | Description | Key Feature |
| --- | --- | --- |
| Simple Random | Every member has equal chance | Random selection |
| Systematic | Every kth member selected | Uses interval k |
| Stratified | Population divided into strata, sample from each | Homogeneity within strata |
| Cluster | Randomly select clusters, sample within | Clusters are mini-populations |
| Convenience | Sample easily accessible members | Nonprobability |

Key Takeaways:

  • Sampling allows for efficient and practical data collection.

  • Probability sampling methods support valid statistical inference.

  • Biases and errors must be minimized for reliable results.

  • The Central Limit Theorem justifies the use of normal probability models for sample means and proportions.
