Observational Studies and Causality in Statistics
Observational Studies
Definition and Characteristics
Observational studies are a fundamental type of research design in statistics where researchers do not assign treatments or actively control variables. Instead, they passively observe participants in their natural environments, recording data as it naturally occurs.
Definition: An observational study is one in which the researcher does not actively control the value of any variable, but simply observes the values as they naturally exist.
Purpose: Useful for discovering relationships or associations between variables.
Limitation: Cannot by itself establish cause-and-effect relationships, because confounding and lurking variables may explain the observed association.
Challenge: It is difficult to measure, let alone control for, every external influence (lurking variable).

Example: Music and Academic Performance
Consider a study comparing the GPAs of music students and non-music students at a high school:
Music Students: Average GPA = 3.59; 16% earned all A's
Non-Music Students: Average GPA = 2.91; 5% earned all A's
Key Question: Should we conclude that music education causes better grades?
There may be other factors (confounding variables) such as work habits, parental support, socioeconomic status, or innate ability that influence both music participation and academic performance.
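The point can be illustrated with a small simulation. The following is a minimal sketch under invented assumptions, not a model of the actual study: a single hypothetical confounder (`habits`, standing in for work habits) raises both the chance of enrolling in music and the student's GPA, while music itself is given no effect at all. All numeric settings are made up for illustration.

```python
import random
from statistics import mean

random.seed(0)  # fixed seed so the sketch is reproducible

def simulate_student():
    # Hypothetical confounder: work habits on a 0-1 scale (an assumption).
    habits = random.random()
    # Students with stronger habits are more likely to enroll in music.
    in_music = random.random() < habits
    # GPA depends on habits only; music has NO effect in this simulation.
    gpa = min(4.0, max(0.0, 2.0 + 1.5 * habits + random.gauss(0, 0.3)))
    return in_music, gpa

students = [simulate_student() for _ in range(10_000)]
music_gpa = mean(g for m, g in students if m)
other_gpa = mean(g for m, g in students if not m)
gap = music_gpa - other_gpa

print(f"music: {music_gpa:.2f}  non-music: {other_gpa:.2f}  gap: {gap:.2f}")
```

Even though music has no causal effect in this simulated world, the music group ends up with a clearly higher average GPA. An observed gap like the one in the example therefore cannot, on its own, tell us whether music education improves grades.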
Types of Observational Studies
Retrospective Studies
Retrospective studies collect data on events that have already occurred. They are commonly used in public health and marketing, especially for studying rare outcomes.
Advantages: Useful for studying rare diseases or outcomes; can utilize existing records.
Disadvantages: May suffer from unreliable memories or incomplete historical records.
Example: Asking people with and without lung cancer about their past smoking habits (case-control study).
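One common way to summarize a case-control study is the odds ratio, which compares the odds of past exposure among cases with the odds among controls. These notes do not name the odds ratio, so treat this as a supplementary sketch. The counts below are hypothetical, chosen only to make the arithmetic easy to follow.

```python
def odds_ratio(exposed_cases, unexposed_cases, exposed_controls, unexposed_controls):
    """Odds of exposure among cases divided by odds of exposure among controls."""
    odds_cases = exposed_cases / unexposed_cases
    odds_controls = exposed_controls / unexposed_controls
    return odds_cases / odds_controls

# Hypothetical counts (not real data): 100 people with lung cancer, 100 without.
ratio = odds_ratio(
    exposed_cases=80,        # cases who reported smoking
    unexposed_cases=20,      # cases who did not
    exposed_controls=40,     # controls who reported smoking
    unexposed_controls=60,   # controls who did not
)
print(ratio)  # (80/20) / (40/60) = 6.0
```

An odds ratio well above 1 indicates an association between the exposure and the outcome. Because the study is observational, it still does not show that the exposure caused the outcome.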
Prospective Studies
In prospective studies, subjects are identified in advance and data is collected as events unfold in the future.
Advantages: The researcher decides in advance which variables to record and how to measure them, so the data are more complete and consistent than records or recollections.
Disadvantages: Can be expensive and time-consuming; rare outcomes require large samples.
Example: Selecting groups based on exercise habits and recording their longevity over time.
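The cost of studying rare outcomes prospectively follows from simple arithmetic: the expected number of cases is the cohort size times the outcome rate. Here is a rough sketch of that calculation. The function name `subjects_needed` and the example rates are illustrative assumptions, not figures from any real study.

```python
from fractions import Fraction
from math import ceil

def subjects_needed(target_cases, outcome_rate):
    """Smallest cohort size whose expected number of cases reaches target_cases.

    outcome_rate is passed as a Fraction so the division is exact.
    """
    return ceil(target_cases / outcome_rate)

# Hypothetical rates: a common outcome (1 in 10) versus a rare one (1 in 1000).
common = subjects_needed(50, Fraction(1, 10))
rare = subjects_needed(50, Fraction(1, 1000))

print(common)  # 500
print(rare)    # 50000
```

Expecting the same 50 cases takes 100 times as many subjects when the outcome is 100 times rarer, which is why rare outcomes are usually studied retrospectively.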
Cross-Sectional Studies
Cross-sectional studies select subjects without regard to the values of their explanatory or response variables and collect data, at a single point in time, about their current or past behaviors.
Advantages: Easiest type of study for gathering large samples.
Disadvantages: Can be unreliable if the outcome of interest is rare in the population.
Example: Surveying people about their phone use before sleep and their typical sleep duration.
Limitations of Observational Studies
Establishing Causality
Observational studies, whether retrospective, prospective, or cross-sectional, cannot by themselves demonstrate a causal relationship, because the researcher cannot control for all confounding and lurking variables.
Confounding Variables
Confounding occurs when two variables are associated in such a way that their individual effects on a response variable cannot be distinguished from each other.
Example: Half of a group of tomato plants are fertilized and grow better, but all of the unfertilized plants were in the shade. Both sunlight and fertilizer could influence growth, so their separate effects cannot be distinguished.
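The tomato example can be reduced to a few lines of arithmetic. The sketch below assumes a simple additive model with made-up effect sizes. It shows that very different "true" effects all produce the same observed difference when every fertilized plant is also in the sun.

```python
def observed_gap(fertilizer_effect, sunlight_effect, base=20.0):
    # Design from the example: every fertilized plant is in the sun,
    # every unfertilized plant is in the shade.
    fertilized_sunny = base + fertilizer_effect + sunlight_effect
    unfertilized_shaded = base
    return fertilized_sunny - unfertilized_shaded

# Three hypothetical worlds: (fertilizer effect, sunlight effect).
worlds = {
    "fertilizer only": (6.0, 0.0),
    "sunlight only": (0.0, 6.0),
    "half and half": (3.0, 3.0),
}
gaps = {name: observed_gap(f, s) for name, (f, s) in worlds.items()}

for name, gap in gaps.items():
    print(f"{name}: observed gap = {gap}")
```

All three worlds give the same gap, so the data cannot say which one is true. That is what it means for the two effects to be confounded.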

Lurking Variables and Common Response
A lurking variable is an unobserved variable that influences both the explanatory and response variables, creating a spurious association.
Example: There is a strong positive association between the number of firefighters at a fire and the amount of damage caused. The size of the fire (the lurking variable) increases both the number of firefighters and the damage.
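A simulation makes the firefighter example concrete. This is a hedged sketch with invented numbers: fire size drives both crew size and damage, and firefighters have no effect on damage. Comparing the overall correlation with the correlation among fires of similar size shows how the lurking variable creates the association.

```python
import random
from math import sqrt

random.seed(1)  # fixed seed so the sketch is reproducible

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)

fires = []
for _ in range(5000):
    size = random.uniform(1, 10)              # hypothetical fire-size scale
    crew = 2 * size + random.gauss(0, 1)      # bigger fire -> more firefighters
    damage = 10 * size + random.gauss(0, 5)   # bigger fire -> more damage
    fires.append((size, crew, damage))

# Correlation across all fires, ignoring fire size.
overall = pearson([c for _, c, _ in fires], [d for _, _, d in fires])

# Correlation among fires of nearly the same size.
similar = [(c, d) for s, c, d in fires if 5.0 <= s < 5.5]
within = pearson([c for c, _ in similar], [d for _, d in similar])

print(f"overall: {overall:.2f}  within similar-sized fires: {within:.2f}")
```

Across all fires the correlation is strong, but among fires of about the same size it nearly disappears. The association comes from fire size, not from the firefighters.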

Designing Observational Studies
Choosing the Study Type
When designing an observational study, the choice between retrospective and prospective approaches depends on the timing and rarity of the event:
Retrospective Study: Appropriate when the event has already occurred, especially if it is rare (e.g., investigating causes of pet deaths from kidney failure).
Prospective Study: Useful for verifying possible causes identified in retrospective studies, provided it is ethical and feasible.
Summary Table: Types of Observational Studies
| Type | When Data Is Collected | Advantages | Disadvantages | Example |
|---|---|---|---|---|
| Retrospective | Past | Good for rare outcomes; uses existing data | Unreliable memories; incomplete records | Case-control study of smoking and lung cancer |
| Prospective | Future | Can design study; isolate variables | Expensive; time-consuming | Tracking exercise habits and longevity |
| Cross-Sectional | Present | Easy to gather large samples | Unreliable for rare outcomes | Survey on phone use and sleep |
Additional info: In all observational studies, careful consideration must be given to potential confounding and lurking variables, as these can obscure true relationships and prevent causal inference.