
Observational Studies and Causality in Statistics

Study Guide - Smart Notes

Tailored notes based on your materials, expanded with key definitions, examples, and context.

Observational Studies

Definition and Characteristics

Observational studies are a fundamental type of research design in statistics where researchers do not assign treatments or actively control variables. Instead, they passively observe participants in their natural environments, recording data as it naturally occurs.

  • Definition: An observational study is one in which the researcher does not actively control the value of any variable, but simply observes the values as they naturally exist.

  • Purpose: Useful for discovering relationships or associations between variables.

  • Limitation: Cannot establish cause-and-effect relationships due to the presence of potential confounding and lurking variables.

  • Challenge: Difficult to control for all possible external influences (lurking variables).


Example: Music and Academic Performance

Consider a study comparing the GPAs of music students and non-music students at a high school:

  • Music Students: Average GPA = 3.59; 16% earned all A's

  • Non-Music Students: Average GPA = 2.91; 5% earned all A's

Key Question: Should we conclude that music education causes better grades?

  • There may be other factors (confounding variables) such as work habits, parental support, socioeconomic status, or innate ability that influence both music participation and academic performance.
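A small simulation can make the key question concrete. The sketch below is a hypothetical model, not data from the study. An unobserved "work habits" score raises both the chance that a student joins music and the student's GPA, while music itself has no effect on grades. The function name `simulate_school` and every number in it are illustrative assumptions.

```python
import random
import statistics


def simulate_school(n_students: int = 10_000, seed: int = 1) -> tuple[float, float]:
    """Return (mean GPA of music students, mean GPA of non-music students).

    Hypothetical model: music has NO causal effect on GPA. A hidden
    'work habits' score raises both the chance of joining music and the GPA.
    """
    rng = random.Random(seed)
    music_gpas: list[float] = []
    other_gpas: list[float] = []
    for _ in range(n_students):
        work_habits = rng.gauss(0.0, 1.0)  # unobserved common cause
        # Probability of joining music rises with work habits (clamped to 0.1..0.9).
        p_music = 0.5 + 0.2 * max(-2.0, min(2.0, work_habits))
        in_music = rng.random() < p_music
        # GPA depends only on work habits plus noise; music adds nothing.
        gpa = 3.0 + 0.4 * work_habits + rng.gauss(0.0, 0.3)
        gpa = max(0.0, min(4.0, gpa))  # keep GPA on a 0-4 scale
        (music_gpas if in_music else other_gpas).append(gpa)
    return statistics.mean(music_gpas), statistics.mean(other_gpas)


music_mean, other_mean = simulate_school()
print(f"music: {music_mean:.2f}   non-music: {other_mean:.2f}")
```

Even though music contributes nothing to GPA in this model, music students still end up with a noticeably higher average GPA. That is exactly the pattern an observational comparison cannot distinguish from a real causal effect.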

Types of Observational Studies

Retrospective Studies

Retrospective studies collect data on events that have already occurred. They are commonly used in public health and marketing, especially for studying rare outcomes.

  • Advantages: Useful for studying rare diseases or outcomes; can utilize existing records.

  • Disadvantages: May suffer from unreliable memories or incomplete historical records.

  • Example: Asking people with and without lung cancer about their past smoking habits (case-control study).
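A case-control comparison like this is usually summarized by asking how common the exposure (past smoking) is among cases versus controls. The sketch below uses made-up counts purely for illustration and computes those proportions along with the odds ratio, a summary commonly reported for case-control data. The function name `case_control_summary` and the counts are hypothetical.

```python
def case_control_summary(
    case_exposed: int, case_total: int, control_exposed: int, control_total: int
) -> dict[str, float]:
    """Summarize a 2x2 case-control table (exposure = past smoking)."""
    case_unexposed = case_total - case_exposed
    control_unexposed = control_total - control_exposed
    return {
        # Share of people WITH the outcome who were exposed.
        "p_exposed_cases": case_exposed / case_total,
        # Share of people WITHOUT the outcome who were exposed.
        "p_exposed_controls": control_exposed / control_total,
        # Odds of exposure among cases divided by odds of exposure among controls.
        "odds_ratio": (case_exposed * control_unexposed)
        / (case_unexposed * control_exposed),
    }


# Hypothetical counts: 80 of 100 lung-cancer patients and 40 of 100
# controls report having smoked.
print(case_control_summary(80, 100, 40, 100))
# prints {'p_exposed_cases': 0.8, 'p_exposed_controls': 0.4, 'odds_ratio': 6.0}
```

Because the researcher chooses how many cases and controls to recruit, these numbers describe smoking given the outcome. They do not by themselves give the lung-cancer rate among smokers, and even a large odds ratio is still an association rather than proof of causation.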

Prospective Studies

In prospective studies, subjects are identified in advance and data is collected as events unfold in the future.

  • Advantages: Researchers can decide in advance which variables to measure, making it easier to isolate the variables of interest and tailor the study to specific requirements.

  • Disadvantages: Can be expensive and time-consuming; rare outcomes require large samples.

  • Example: Selecting groups based on exercise habits and recording their longevity over time.
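The point that rare outcomes require large samples comes down to simple arithmetic: the expected number of cases in a cohort is the number of subjects times the incidence of the outcome over the follow-up period. The sketch below assumes a hypothetical incidence of 1 in 1,000 and uses exact fractions to avoid floating-point rounding; the function names are illustrative.

```python
import math
from fractions import Fraction


def expected_cases(n_subjects: int, incidence: Fraction) -> Fraction:
    """Expected number of subjects who develop the outcome during follow-up."""
    return n_subjects * incidence


def subjects_needed(target_cases: int, incidence: Fraction) -> int:
    """Smallest cohort whose expected case count reaches target_cases."""
    return math.ceil(target_cases / incidence)


rare = Fraction(1, 1000)  # hypothetical incidence: 1 in 1,000
print(expected_cases(500, rare))  # 1/2, i.e. fewer than one expected case
print(subjects_needed(20, rare))  # 20000
```

Following 500 people would be expected to yield less than one case, while observing about 20 cases would require following roughly 20,000 people, which is why prospective studies of rare outcomes become expensive.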

Cross-Sectional Studies

Cross-sectional studies select subjects without regard to their values of the explanatory or response variables and collect data, at a single point in time, about the subjects' current or past behaviors.

  • Advantages: Easiest type of study for gathering large samples.

  • Disadvantages: Can be unreliable if the outcome of interest is rare in the population, because the sample may contain very few people with that outcome.

  • Example: Surveying people about their phone use before sleep and their typical sleep duration.

Limitations of Observational Studies

Establishing Causality

Observational studies, whether retrospective, prospective, or cross-sectional, cannot by themselves demonstrate a causal relationship, because researchers cannot control for all confounding and lurking variables.

Confounding Variables

Confounding occurs when two variables are associated in such a way that their individual effects on a response variable cannot be distinguished from each other.

  • Example: A gardener fertilizes half of the tomato plants and observes better growth in that half, but the unfertilized plants were all in the shade. Both sunlight and fertilizer affect growth, so their individual effects cannot be separated.

Diagram illustrating confounding: x and z both affect y, making it unclear if x causes y
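The tomato example can be checked numerically. In the sketch below, every fertilized plant is in the sun and every unfertilized plant is in the shade, and the difference in mean growth is computed under three different hypothetical "true" models. The growth model, the effect sizes, and the function names are assumptions for illustration.

```python
from statistics import mean

# Each plant is (fertilized, sunny). In this design the two always coincide.
plants = [(True, True)] * 5 + [(False, False)] * 5


def growth(fertilized: bool, sunny: bool, fert_effect: float, sun_effect: float) -> float:
    """Hypothetical growth (cm): baseline plus an effect for each factor present."""
    return 20.0 + (fert_effect if fertilized else 0.0) + (sun_effect if sunny else 0.0)


def mean_difference(fert_effect: float, sun_effect: float) -> float:
    """Mean growth of fertilized plants minus mean growth of unfertilized plants."""
    fed = [growth(f, s, fert_effect, sun_effect) for f, s in plants if f]
    unfed = [growth(f, s, fert_effect, sun_effect) for f, s in plants if not f]
    return mean(fed) - mean(unfed)


# Three different explanations of the same observed data.
for fert, sun in [(10.0, 0.0), (0.0, 10.0), (5.0, 5.0)]:
    print(f"fertilizer={fert}, sun={sun} -> difference {mean_difference(fert, sun)}")
```

All three models produce the same 10 cm difference, so the observed data cannot tell whether fertilizer, sunlight, or both are responsible.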

Lurking Variables and Common Response

A lurking variable is an unobserved variable that influences both the explanatory and response variables, creating a spurious association.

  • Example: A strong positive association between the number of firefighters at a fire and the amount of damage caused. The size of the fire (lurking variable) increases both the number of firefighters and the damage.

Diagram illustrating common response: z affects both x and y, creating a spurious association

Designing Observational Studies

Choosing the Study Type

When designing an observational study, the choice between retrospective and prospective approaches depends on the timing and rarity of the event:

  • Retrospective Study: Appropriate when the event has already occurred, especially if it is rare (e.g., investigating causes of pet deaths from kidney failure).

  • Prospective Study: Useful for verifying possible causes identified in retrospective studies, provided it is ethical and feasible.

Summary Table: Types of Observational Studies

| Type | When Data Is Collected | Advantages | Disadvantages | Example |
|---|---|---|---|---|
| Retrospective | Past | Good for rare outcomes; uses existing data | Unreliable memories; incomplete records | Case-control study of smoking and lung cancer |
| Prospective | Future | Can design study; isolate variables | Expensive; time-consuming | Tracking exercise habits and longevity |
| Cross-Sectional | Present | Easy to gather large samples | Unreliable for rare outcomes | Survey on phone use and sleep |

Additional info: In all observational studies, careful consideration must be given to potential confounding and lurking variables, as these can obscure true relationships and prevent causal inference.

Pearson Study Prep