Chapter 1: Introduction to Statistics – Data Collection and Experimental Design

Study Guide - Smart Notes

Section 1.3: Data Collection and Experimental Design

Overview

This section introduces the foundational concepts of designing statistical studies, distinguishing between observational studies and experiments, and understanding various data collection and sampling techniques. Mastery of these concepts is essential for conducting valid and reliable statistical analyses.

Designing a Statistical Study

Identify Variables and Population: Clearly define the variable(s) of interest and the population to be studied.
Develop a Data Collection Plan: Ensure the sample is representative of the population if sampling is used.
Collect Data: Gather data according to the plan.
Describe Data: Use descriptive statistics to summarize the data.
Interpret Data: Apply inferential statistics to make decisions about the population.
Identify Errors: Recognize and address possible errors in the study.

Types of Statistical Studies

Observational Study: The researcher observes and measures characteristics without influencing the subjects. Example: Measuring time spent on activities by individuals.
Experiment: The researcher applies a treatment to part of the population (treatment group) and observes responses, often comparing to a control group (which may receive a placebo). Example: Testing the effect of sucralose on glycemic response.

Examples

Experiment: Patients receive vitamin supplementation or placebo to test effects on health outcomes.
Observational Study: Surveying adults about their confidence in the economy without influencing their responses.

Data Collection Methods

Simulation: Uses mathematical or physical models (often computer-based) to replicate real-world processes. Useful for impractical or dangerous scenarios (e.g., crash tests with dummies).
Survey: Collects data by asking questions to a sample of the population. Surveys can be conducted via interviews, phone, mail, or online. Question wording must avoid bias.

Experimental Design

Three key elements of a well-designed experiment are control, randomization, and replication.

Confounding Variables: Occur when the effects of different factors on the response cannot be told apart. Example: A business remodels at the same time a new mall opens nearby, so an increase in customers cannot be attributed to either change alone.
Placebo Effect: Subjects respond to a fake treatment as if it were real. Controlled by blinding (subjects do not know whether they receive the treatment or a placebo) or by a double-blind design (neither subjects nor experimenters know group assignments until the data are collected).
Randomization: Assigns subjects to groups by chance. Completely randomized design assigns all subjects randomly; randomized block design divides subjects into blocks by characteristics, then randomly assigns within blocks.
Matched-Pairs Design: Pairs subjects by similarity; one receives treatment, the other receives control.
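Sample Size: The number of subjects in the study. Larger samples reduce sampling error and make results more reliable, but they do not correct bias caused by a poor design.
Replication: Repeating the experiment under the same or similar conditions to confirm findings.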
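The two randomization schemes described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the subject list, the age-group blocks, and the function names completely_randomized and randomized_block are hypothetical and do not come from the course materials.

```python
import random

def completely_randomized(subjects, seed=None):
    """Shuffle all subjects, then split them into two groups by chance."""
    rng = random.Random(seed)
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"treatment": shuffled[:half], "control": shuffled[half:]}

def randomized_block(subjects, block_of, seed=None):
    """Divide subjects into blocks, then randomize within each block."""
    rng = random.Random(seed)
    blocks = {}
    for s in subjects:
        blocks.setdefault(block_of(s), []).append(s)
    assignment = {"treatment": [], "control": []}
    for members in blocks.values():
        rng.shuffle(members)
        half = len(members) // 2
        assignment["treatment"].extend(members[:half])
        assignment["control"].extend(members[half:])
    return assignment

# Hypothetical subjects: (id, age group). Six fall in each age group.
subjects = [(i, "under 40" if i % 2 == 0 else "40 and over") for i in range(1, 13)]

crd = completely_randomized(subjects, seed=1)
rbd = randomized_block(subjects, block_of=lambda s: s[1], seed=1)
print("Completely randomized:", crd)
print("Randomized block:", rbd)
```

In the block design each age group contributes equally to the treatment and control groups, so age cannot be confounded with the treatment. The completely randomized design only balances age on average.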

Examples of Experimental Design Issues

Small Sample Size: Results may not be valid; increase sample size and replicate.
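Non-random Assignment: If subjects choose their own group or are hand-picked, the groups may differ in ways that affect the response. Assign subjects at random (within blocks when a characteristic such as age matters) so the groups are comparable.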

Sampling Techniques

Census: Measures the entire population.
Sample: Measures part of the population; more practical but subject to sampling error (difference between sample and population results).
Random Sample: Every member has an equal chance of selection.
Simple Random Sample: Every possible sample of the same size has an equal chance of selection.

Example: Simple Random Sample

Assign numbers to all population members.
Use a random number table or generator to select sample members.
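A random number generator can stand in for the random number table. The sketch below assumes a hypothetical population of 731 numbered members and a sample size of 8; both figures are illustrative only.

```python
import random

# Step 1: assign numbers to all population members (hypothetical: 1 through 731).
population = list(range(1, 732))

# Step 2: let a generator pick the sample. The seed is fixed only so the
# sketch is repeatable; omit it for a fresh random draw.
rng = random.Random(42)
sample = rng.sample(population, k=8)  # draws without replacement

print(sorted(sample))
```

Because random.sample draws without replacement and treats every subset of size 8 as equally likely, the result is a simple random sample and not merely a random one.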

Other Sampling Techniques

Stratified Sample: Divide population into groups (strata) and randomly sample from each group. Example: Sampling students by major.
Cluster Sample: Divide population into naturally occurring groups (clusters), randomly select one or more clusters, then include all members of the selected clusters. Example: Sampling every household in a few randomly chosen zip codes.
Systematic Sample: Select every kth member after a random start. Example: Every 100th household.
Convenience Sample: Select members who are easiest to reach; often leads to bias. Example: Sampling only students in your class.
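The stratified, cluster, and systematic techniques above differ mainly in where the randomness enters. The following Python sketch makes that concrete. The student records, the majors and dorms, and the three function names are hypothetical, chosen only for illustration.

```python
import random

def stratified_sample(population, stratum_of, per_stratum, rng):
    """Randomly sample the same number of members from every stratum."""
    strata = {}
    for member in population:
        strata.setdefault(stratum_of(member), []).append(member)
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, min(per_stratum, len(members))))
    return sample

def cluster_sample(population, cluster_of, n_clusters, rng):
    """Randomly choose whole clusters and keep every member of each."""
    clusters = {}
    for member in population:
        clusters.setdefault(cluster_of(member), []).append(member)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [m for c in chosen for m in clusters[c]]

def systematic_sample(population, k, rng):
    """Pick a random start among the first k members, then every kth."""
    start = rng.randrange(k)
    return population[start::k]

# Hypothetical population: 120 students as (id, major, dorm).
majors = ["biology", "history", "math"]
dorms = ["North", "South", "East", "West"]
population = [(i, majors[i % 3], dorms[i % 4]) for i in range(120)]

rng = random.Random(7)  # fixed seed only for repeatability
strat = stratified_sample(population, lambda s: s[1], 5, rng)
clus = cluster_sample(population, lambda s: s[2], 2, rng)
syst = systematic_sample(population, 10, rng)
print(len(strat), len(clus), len(syst))
```

A convenience sample has no counterpart here because it involves no random step at all, which is why it is the most prone to bias.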

Table: Comparison of Sampling Techniques

Technique	Description	Example	Potential Bias
Simple Random	Every sample of the same size equally likely	Randomly select students by ID	Low
Stratified	Divide into strata, sample from each	Sample students by major	Low
Cluster	Divide into clusters, sample all in some clusters	Sample all households in selected zip codes	Moderate
Systematic	Select every kth member	Every 100th household	Moderate
Convenience	Sample easiest to reach	Sample students in your class	High

Key Formulas and Concepts

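Sampling Error: The difference between a sample result and the corresponding population result. For a mean, sampling error $= \bar{x} - \mu$, where $\bar{x}$ is the sample mean and $\mu$ is the population mean.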
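Sampling error, the difference between a sample result and the population result, can be illustrated numerically. The sketch below assumes a made-up population of 10,000 values and uses the mean as the result of interest. The helper name sampling_error is hypothetical.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed only for repeatability

# Hypothetical population: 10,000 simulated measurements.
population = [rng.gauss(30, 8) for _ in range(10_000)]
population_mean = statistics.mean(population)

def sampling_error(n):
    """Sample mean minus population mean for a simple random sample of size n."""
    sample = rng.sample(population, n)
    return statistics.mean(sample) - population_mean

for n in (10, 100, 1000):
    print(n, round(sampling_error(n), 3))

# A census measures the whole population, so its sampling error is zero.
census_error = sampling_error(len(population))
```

Individual draws vary, but the error tends to shrink as the sample size grows, and it disappears for a census.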

Summary

Proper design and sampling are crucial for valid statistical inference.
Understanding the differences between study types, data collection methods, and sampling techniques helps avoid bias and errors.
