Experiments and Observational Studies: Principles of Experimental Design

Experiments and Observational Studies

Introduction

Experiments and observational studies are two fundamental approaches in statistics for investigating relationships between variables. Experiments involve the deliberate manipulation of variables to observe their effect on a response, while observational studies involve observing subjects without intervention. Understanding the principles of experimental design is crucial for drawing valid conclusions from data.

The Four Principles of Experimental Design

1. Control

Control refers to the process of managing variables in an experiment to isolate the effect of the factor(s) under study. By controlling extraneous variables, researchers can reduce the impact of confounding factors and lurking variables, ensuring that observed effects are due to the treatments applied.

Control of Factors: Decide on the levels of factors under study and assign subjects randomly to these levels.
Control of Other Variables: Make all conditions as similar as possible for all treatment groups to prevent other sources of variation from affecting the response.
Purpose: Isolate the variable of interest and avoid confounding.

Example: In a pet food safety experiment, control is achieved by standardizing food portions, housing, exercise, and using one breed of dog.

2. Randomization

Randomization is the process of randomly assigning subjects to treatment groups. On average, it balances the effects of variables that are unknown or cannot be controlled across the treatment levels, which reduces bias.

Purpose: Makes comparisons among treatments fair and reduces the influence of lurking variables.
Implementation: Assign subjects to groups using random methods (e.g., random number generators).

Example: In the pet food experiment, dogs are randomly assigned to different feed treatments.
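
As a rough illustration of random assignment, the sketch below shuffles a list of subjects and deals them out to treatments in turn. The function name `randomly_assign`, the dog labels, and the three feed names are hypothetical; the notes do not say how many dogs or feeds the experiment used.

```python
import random

def randomly_assign(subjects, treatments, seed=None):
    """Shuffle subjects, then deal them to treatments in turn.

    Group sizes differ by at most one. A seed is accepted only so the
    assignment can be reproduced later.
    """
    rng = random.Random(seed)
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    groups = {treatment: [] for treatment in treatments}
    for i, subject in enumerate(shuffled):
        groups[treatments[i % len(treatments)]].append(subject)
    return groups

# Hypothetical experimental units and treatments (invented for illustration).
dogs = [f"dog_{i}" for i in range(1, 13)]
feeds = ["feed_A", "feed_B", "feed_C"]

assignment = randomly_assign(dogs, feeds, seed=1)
for feed, group in assignment.items():
    print(feed, group)
```

Shuffling first and dealing afterwards keeps the groups the same size, which a coin flip per subject would not guarantee.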

Figure: Random assignment to groups and therapies

3. Replication

Replication involves applying each treatment to multiple subjects and, if possible, repeating the entire experiment on different populations. Replication allows for the estimation of variability in responses and increases the reliability of results.

Within Experiment: Use many subjects per treatment group.
Across Experiments: Repeat the experiment with different populations or under different conditions.

Example: Assigning more than one dog to each treatment and repeating the experiment with a different breed.

4. Blocking

Blocking is the grouping of similar individuals together and randomizing within these blocks. A blocking variable is an explanatory variable that is not randomly assigned but may affect the response variable. Blocking helps account for identifiable variability due to differences between blocks.

Purpose: Reduce unwanted variability by accounting for known differences among experimental units.
Implementation: Group subjects by a characteristic (e.g., smoking status) and randomize treatments within each group.

Figure: Randomized block design by gender

Example: In a study, subjects may be blocked by gender before random assignment to treatments.
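
One way to picture a randomized block design is to split the subjects by the blocking variable first and then shuffle within each block only. The sketch below assumes eight invented subjects, gender as the blocking variable, and two hypothetical arms named "treatment" and "control"; `block_randomize` is an illustrative name, not a standard function.

```python
import random
from collections import Counter, defaultdict

def block_randomize(subjects, block_of, treatments, seed=None):
    """Group subjects into blocks, then randomize treatments within each block."""
    rng = random.Random(seed)
    blocks = defaultdict(list)
    for subject in subjects:
        blocks[block_of(subject)].append(subject)

    assignment = {}
    for members in blocks.values():
        rng.shuffle(members)  # random order within this block only
        for i, subject in enumerate(members):
            assignment[subject] = treatments[i % len(treatments)]
    return assignment

# Hypothetical subjects as (id, gender) pairs; the data are invented.
subjects = [(f"s{i}", "F" if i <= 4 else "M") for i in range(1, 9)]

plan = block_randomize(
    subjects,
    block_of=lambda s: s[1],  # block on gender
    treatments=["treatment", "control"],
    seed=7,
)

# Count how many subjects of each gender landed in each arm.
counts = Counter((subject[1], arm) for subject, arm in plan.items())
print(counts)
```

Because the shuffle happens inside each block, every arm receives the same number of subjects from each gender, so gender cannot be confounded with the treatment.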

Completely Randomized Comparative Experiment Design

A completely randomized design assigns all experimental units to treatments purely by chance. This design is ideal when there are no obvious sources of variability among subjects that need to be controlled by blocking.

Structure: Subjects are randomly assigned to treatment groups, each group receives a different treatment, and outcomes are compared.

Figure: Random allocation to groups and treatments

Example: Nail Polish Experiment

To test whether two nail polish colors resist chipping differently in chlorine water, a completely randomized experiment can be designed as follows:

Plan: Investigate if chlorine affects two nail polish colors differently.
Response Variable: Percent of polish chipped away.
Factor: Nail polish color (Red vs. Nude).
Treatments: Red and Nude polish.
Experimental Units: 30 acrylic nails glued to chopsticks to control for nail variability.
Control: Same brand of nails and polish, same painting and soaking time.
Replication: 15 nails per treatment.
Randomization: Randomly assign nails to treatments using technology.

Figure: Random allocation to nail polish color groups

After treatment, the response is measured by scanning each nail and comparing the fraction of polish chipped before and after the experiment.

Statistical Significance

A difference between groups is statistically significant when it is too large to plausibly have occurred by chance alone. One way to judge this is to express the difference in standard deviations: the more standard deviations it lies from zero, the less plausible chance becomes as an explanation.

Interpretation: A statistically significant result suggests a real treatment effect rather than random variation.
Assessment: As an informal check, use side-by-side boxplots and compare the centers and spreads of the treatment groups.

Figure: Side-by-side boxplots for comparing treatments

Example: If the boxes (the middle 50% of each group) in two side-by-side boxplots do not overlap, the difference is likely statistically significant.
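
To make "large enough in terms of standard deviations" concrete, the sketch below divides the difference in group means by its standard error. This particular statistic is an assumption chosen for illustration; the notes do not specify one. The percent-chipped values are invented, not results from the nail polish experiment.

```python
import math
import statistics

def difference_in_standard_errors(group_a, group_b):
    """Return (mean_a - mean_b) divided by the standard error of that difference."""
    mean_diff = statistics.mean(group_a) - statistics.mean(group_b)
    standard_error = math.sqrt(
        statistics.variance(group_a) / len(group_a)
        + statistics.variance(group_b) / len(group_b)
    )
    return mean_diff / standard_error

# Invented percent-chipped measurements, for illustration only.
red = [12, 15, 11, 14, 13]
nude = [20, 22, 19, 23, 21]

z = difference_in_standard_errors(red, nude)
print(f"difference is {z:.1f} standard errors from zero")
```

A value far from zero (a common rule of thumb is beyond about 2 in either direction) suggests the difference is unlikely to be chance alone. With groups this small, a formal test would be needed before drawing a conclusion.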

Randomization in Data Collection: Experiments vs. Surveys

Surveys

Random Selection: Surveys use random selection from a population to obtain representative samples.
Purpose: Ensure the sample reflects the diversity and variability of the population, guarding against sampling bias.
Generalization: Results can be generalized to the population if the sample is random.

Experiments

Random Assignment: Experiments use random assignment to create similar groups at the start of the experiment.
Purpose: Assess the effects of treatments and attribute differences in response to the treatments rather than lurking variables.
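Causality: Because random assignment makes the groups similar at the start, differences in response can be attributed to the treatments rather than lurking variables, so causation can be inferred.
Generalization: Results generalize to a population only if the subjects were randomly selected from it; otherwise, the experiment must be repeated under different circumstances before its conclusions can be extended.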

Summary Table: Randomization in Data Collection

Was the sample randomly selected?	Was the explanatory variable randomly assigned?	Possible to generalize to the population?	Possible to make conclusions about causality?
Yes	Yes	Yes	Yes
Yes	No	Yes	No
No	Yes	No	Yes
No	No	No	No
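
The table reduces to two independent rules: random selection governs generalization, and random assignment governs causal conclusions. A minimal sketch of that lookup follows; the function and key names are hypothetical.

```python
def allowed_conclusions(randomly_selected: bool, randomly_assigned: bool) -> dict:
    """Map the two kinds of randomization to the conclusions they support."""
    return {
        "generalize_to_population": randomly_selected,  # needs random selection
        "infer_causality": randomly_assigned,           # needs random assignment
    }

# Reproduce the four rows of the summary table.
for selected in (True, False):
    for assigned in (True, False):
        print(selected, assigned, allowed_conclusions(selected, assigned))
```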

Key Formulas and Concepts

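Statistical Significance (z-score): Measures how many standard deviations an observed difference lies from the value expected by chance alone.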
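
One common way to write such a score, assuming two treatment groups compared by their means, is shown below. The symbols are introduced here for illustration: \bar{y}_i, s_i, and n_i are the mean, standard deviation, and size of group i.

```latex
z = \frac{\bar{y}_1 - \bar{y}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}
```

The numerator is the observed difference between the groups, and the denominator estimates how much that difference would vary from chance alone.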

Random Assignment: Ensures treatment groups are comparable.
Replication: Makes it possible to estimate variability in responses and increases the reliability of results.
Blocking: Accounts for known sources of variability.

Additional info: These principles are foundational for designing valid experiments and interpreting results in statistics. Proper application allows for causal inference and generalization when appropriate.