
Collecting Data: Sampling Methods, Experimental Design, and Sources of Bias


Collecting Data

Sampling Methods

Sampling methods are essential for selecting representative subsets from a population, which allows researchers to make valid inferences about the whole group. The choice of sampling method impacts the accuracy and credibility of statistical conclusions.

  • Representative sample: A sample that accurately reflects key characteristics of the population.

  • Probability sampling: Sampling technique where each member of the population has a known chance of inclusion.

Probability vs. Non-Probability Sampling

  • Probability sampling: Uses random mechanisms so every member of the population has a known, nonzero chance of selection. Examples include simple random sampling, stratified sampling, and cluster sampling.

  • Non-probability sampling: Relies on researcher choice or convenience, not randomization. Examples include convenience sampling and voluntary response sampling.

Simple Random Sampling (SRS)

Every unit in the population has an equal chance of being selected. SRS can be implemented using random number generators or drawing lots.

  • Example: Drawing 100 names from a hat containing all students in a school.
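This can be sketched in a few lines of Python; the 500-student roster and sample size below are made up for illustration:

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw an SRS of size n: every unit has an equal chance of
    selection, without replacement."""
    rng = random.Random(seed)
    return rng.sample(population, n)

# Hypothetical school roster of 500 students; draw 100 names "from the hat".
students = [f"student_{i}" for i in range(500)]
sample = simple_random_sample(students, 100, seed=42)
```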

Systematic Sampling

Selecting every k-th unit from an ordered list after a random start. Useful for large populations with ordered lists.

  • Example: Choosing every 10th customer entering a store after a random starting point.
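A minimal sketch of the every-k-th rule, assuming a numbered list of 1,000 customers (hypothetical):

```python
import random

def systematic_sample(ordered_units, k, seed=None):
    """Select every k-th unit after a random start in [0, k)."""
    rng = random.Random(seed)
    start = rng.randrange(k)
    return ordered_units[start::k]

customers = list(range(1000))          # hypothetical ordered list
chosen = systematic_sample(customers, 10, seed=1)
```

Note that the random start matters: without it, the choice of which units can ever be selected would be fixed in advance.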

Stratified Sampling

Dividing the population into subgroups (strata) based on important characteristics, then randomly sampling within each stratum. Ensures representation of all key subgroups.

  • Example: Sampling equal numbers of students from each grade level in a school.
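A sketch of the equal-allocation version used in the example above (the grade rosters are hypothetical):

```python
import random

def stratified_sample(strata, n_per_stratum, seed=None):
    """Randomly sample n_per_stratum units within each stratum."""
    rng = random.Random(seed)
    return {name: random.Random(seed).sample(units, n_per_stratum)
            if False else rng.sample(units, n_per_stratum)
            for name, units in strata.items()}

grades = {
    "grade_9":  [f"9-{i}" for i in range(120)],
    "grade_10": [f"10-{i}" for i in range(110)],
    "grade_11": [f"11-{i}" for i in range(100)],
    "grade_12": [f"12-{i}" for i in range(90)],
}
picked = stratified_sample(grades, 25, seed=7)
```

Proportional allocation (sampling each stratum in proportion to its size) is a common alternative when strata differ greatly in size.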

Cluster Sampling

Dividing the population into groups (clusters), then randomly selecting entire clusters for study. Useful when populations are naturally grouped.

  • Example: Randomly selecting several classrooms and surveying all students within those rooms.
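The classroom example can be sketched as follows; the twelve 30-student rooms are hypothetical:

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly select whole clusters, then survey every unit in them."""
    rng = random.Random(seed)
    chosen = rng.sample(list(clusters), n_clusters)
    return {name: clusters[name] for name in chosen}

classrooms = {f"room_{r}": [f"room_{r}_student_{s}" for s in range(30)]
              for r in range(12)}
surveyed = cluster_sample(classrooms, 3, seed=3)
```

The key contrast with stratified sampling: stratification samples *within every* subgroup, while cluster sampling takes *all units* from a random subset of subgroups.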

Non-Probability Methods

  • Convenience sampling: Selecting the most accessible units; prone to selection bias because easy-to-reach units rarely represent the whole population.

  • Voluntary response sampling: Participants self-select into the sample, often leading to bias.

  • Purposive sampling: Selecting cases based on researcher judgment of what is most informative.

  • Snowball sampling: Existing participants recruit further participants from their networks; useful for reaching hidden or hard-to-reach populations.

Experimental Design

Experimental design refers to the structured process of assigning treatments to subjects to isolate the effects of variables. Good design increases the credibility and validity of results.

Principles of Good Experimental Design

  • Randomization: Assigning treatments by chance to avoid bias.

  • Control and placebo: Comparing treatment groups to control groups, sometimes using placebos to account for expectations.

  • Replication: Repeating the experiment on multiple subjects to estimate variability.

  • Blocking: Grouping similar units and randomizing within each group to control for nuisance factors.
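Randomization and blocking can be combined in one procedure, sketched below; the blocks, unit labels, and treatment names are hypothetical:

```python
import random

def randomize_within_blocks(blocks, treatments, seed=None):
    """Blocking + randomization: within each block, shuffle the units
    and assign treatments round-robin so groups stay balanced."""
    rng = random.Random(seed)
    assignment = {}
    for block_units in blocks.values():
        units = list(block_units)
        rng.shuffle(units)                      # randomize within the block
        for i, unit in enumerate(units):
            assignment[unit] = treatments[i % len(treatments)]
    return assignment

blocks = {"male":   [f"m{i}" for i in range(10)],
          "female": [f"f{i}" for i in range(10)]}
assign = randomize_within_blocks(blocks, ["treatment", "placebo"], seed=0)
```

Because assignment is balanced inside each block, a block variable (here, sex) cannot become confounded with the treatment.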

Designing Unbiased Survey Questions

  • Use neutral language.

  • Be specific and clear.

  • Avoid double-barreled questions (asking about two things at once).

  • Balance response options.

  • Avoid ambiguous wording.

Observational Studies vs. Experiments

Observational studies involve recording outcomes without assigning exposures or treatments, while experiments involve deliberate intervention and random assignment.

  • Observational study: Researcher does not assign treatments, only observes outcomes.

  • Experiment: Researcher assigns treatments to study their effects.

  • Case-control study: Compares people with a condition (cases) to those without (controls).

  • Confounding: When a third factor influences both the exposure and the outcome, potentially creating a spurious association.

Sources of Bias

Bias refers to systematic errors that can distort study results. Understanding and minimizing bias is crucial for credible data collection.

How Bias Differs from Sampling Error

  • Sampling error: Natural chance variability in a statistic from sample to sample; it shrinks as the sample size grows.

  • Bias: Systematic error built into the design or measurement; it does not shrink as the sample size grows.
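The contrast can be shown with a small simulation, a rough sketch with made-up numbers: one estimate uses an SRS, the other draws from a frame missing the lower half of the population (a crude stand-in for coverage bias).

```python
import random
import statistics

def mean_estimate(population, n, biased, seed):
    """Estimate the population mean from a sample of size n.
    If biased, sample only from the top half of the sorted values
    (mimicking coverage bias); otherwise draw an SRS."""
    rng = random.Random(seed)
    frame = sorted(population)[len(population) // 2:] if biased else population
    return statistics.mean(rng.sample(frame, n))

population = list(range(10_000))        # true mean is 4999.5
srs_err = abs(mean_estimate(population, 2000, False, 0) - 4999.5)
bias_err = abs(mean_estimate(population, 2000, True, 0) - 4999.5)
# srs_err is small, while bias_err stays near 2500 no matter how large n gets
```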

Common Sources of Bias

  • Coverage bias: Systematic error when part of the population is missing from the sampling frame.

  • Nonresponse bias: Bias introduced when individuals who do not respond differ meaningfully from those who do.

  • Voluntary response bias: Bias caused by allowing people to self-select into a survey; respondents with strong opinions dominate the sample.

  • Convenience sampling bias: Bias from sampling the most accessible units, which are not representative of the population.

  • Recall bias: Systematic error due to inaccurate memory of past events.

  • Interviewer bias: Systematic error introduced by the interviewer’s expectations or behavior.

  • Healthy user bias: Bias when people who participate in certain programs are healthier than the general population.

  • Attrition bias: Bias when participants drop out of a study, leaving a non-representative sample.

  • Survivorship bias: Focusing only on observed "survivors" and ignoring those that failed, leading to overly optimistic conclusions.

Mitigating Bias

  • Use probability sampling whenever possible.

  • Stratify or cluster samples to ensure representation.

  • Design surveys and experiments to minimize nonresponse and measurement bias.

  • Be aware of and adjust for confounding variables in analysis.

Key Formulas

  • Probability of selection in SRS: P(selected) = n/N, where n is the sample size and N is the population size.
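A quick numeric check of the formula (the 100-of-2,000 figures are hypothetical):

```python
def srs_selection_probability(n, N):
    """In an SRS of size n from a population of N, each unit's
    probability of selection is n / N."""
    return n / N

p = srs_selection_probability(100, 2000)   # 0.05, i.e. a 5% chance per unit
```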

Recap Table: Key Terms

  • Random assignment: Assigning sampled units to treatment conditions by chance to create comparable groups.

  • Treatment group: Group receiving the experimental intervention.

  • Control group: Group not receiving the treatment (sometimes given a placebo), used as a baseline for comparison and to control for expectations.

  • Replication: Repeating the same treatment on multiple experimental units to estimate variability.

  • Blocking: Grouping similar units and randomizing within each group to control a nuisance factor.

  • Between-subjects design: Each unit experiences only one condition; comparisons are made across subjects.

  • Within-subjects design: Each unit experiences all conditions, typically in random order; comparisons are made within subjects.

  • Leading question: A survey question whose wording suggests a particular answer.

  • Loaded question: A survey question containing an unjustified assumption or implication.

  • Double-barreled question: A single question that asks about two things at once.

  • Ambiguous wording: Vague terms that different respondents may interpret differently.

Example Applications

  • Designing a survey to estimate average GPA: Use SRS or stratified sampling for representativeness.

  • Testing a new medication: Use random assignment, control group, and replication for valid inference.

  • Studying the effect of diet on heart disease: Use an experiment to control confounding, or a cohort study for observational data.

