Introduction to Statistics: Data Collection and Experimental Design

Study Guide - Smart Notes

Introduction to Statistics

Statistical and Critical Thinking

Statistics is the science of collecting, analyzing, interpreting, and presenting data. Critical thinking in statistics involves questioning the validity of data sources, methods, and conclusions.

Statistics helps us make informed decisions based on data.
Critical thinking is essential to avoid misinterpretation and misuse of statistical results.

Types of Data

Qualitative vs. Quantitative Data

Data can be classified into two main types: qualitative (categorical) and quantitative (numerical).

Qualitative data describes categories or qualities (e.g., gender, color).
Quantitative data represents numerical values (e.g., height, weight).

Collecting Sample Data

Importance of Proper Sampling

Using appropriate methods to collect sample data is crucial for valid statistical analysis. The simple random sample is particularly important.

Simple random sample: Every possible sample of size n has an equal chance of being selected.
If sample data are not collected properly, results may be invalid regardless of analysis.

The Gold Standard in Experiments

Random assignment to placebo/treatment groups is considered the "gold standard" for experimental design.

Placebo: A harmless, inactive substance or procedure used for psychological benefit or as a control in experiments.
Placebos help researchers compare the effects of treatments objectively.

Basics of Collecting Data

Observational Studies vs. Experiments

Data can be obtained through observational studies or experiments.

Observational study: Researchers observe and measure characteristics without modifying subjects.
Experiment: Researchers apply a treatment and observe its effects on the individuals studied (called experimental units or subjects).

Example: Ice Cream and Drownings

This example illustrates the difference between observational studies and experiments.

Observational study: Past data may show a correlation between ice cream sales and drownings, but the association is explained by a lurking variable (temperature): hot weather raises both.
Experiment: If groups were randomly assigned to eat ice cream or not, no effect on drownings would appear, which illustrates why experiments are better suited than observational studies to establishing causality.
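The lurking-variable effect can be sketched with a small simulation. Every number below is invented for illustration, and `pearson_r` is a helper written here, not part of the original notes. Temperature drives both series, so they correlate strongly overall, yet the correlation nearly vanishes among days with similar temperature. This illustrates holding the lurking variable fixed; it is not itself an experiment.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(0)  # fixed seed so the run is repeatable

# Made-up daily data: temperature influences both quantities,
# but ice cream sales never influence drownings.
temps = [rng.uniform(10, 35) for _ in range(2000)]
ice_cream = [5 * t + rng.gauss(0, 20) for t in temps]
drownings = [5 + 0.3 * t + rng.gauss(0, 2) for t in temps]

r_all = pearson_r(ice_cream, drownings)

# Hold the lurking variable roughly constant: keep only days near 25 degrees.
similar_days = [(i, d) for t, i, d in zip(temps, ice_cream, drownings)
                if 24 <= t < 26]
r_similar = pearson_r([i for i, _ in similar_days],
                      [d for _, d in similar_days])

print(f"all days: r = {r_all:.2f}")
print(f"similar-temperature days: r = {r_similar:.2f}")
```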

Design of Experiments

Replication

Replication involves repeating an experiment on multiple subjects to ensure reliability.

Large sample sizes help detect treatment effects and reduce random error.

Blinding and Double-Blind Designs

Blinding prevents subjects from knowing whether they receive treatment or placebo, reducing bias.

Single-blind: Subjects do not know their group assignment.
Double-blind: Both subjects and experimenters are unaware of group assignments.

Randomization

Random assignment of subjects to groups makes the groups comparable on average and reduces selection bias.

Chance is used to create similar groups for valid comparisons.
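A minimal sketch of random assignment, assuming two equal-sized groups labeled "treatment" and "placebo". The function name `randomly_assign` and the subject IDs are hypothetical.

```python
import random

def randomly_assign(subjects, seed=None):
    """Shuffle the subjects, then split them into two groups by position."""
    rng = random.Random(seed)
    shuffled = list(subjects)   # copy so the caller's list is left untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"treatment": shuffled[:half], "placebo": shuffled[half:]}

subjects = [f"subject_{i}" for i in range(1, 21)]  # 20 hypothetical subjects
groups = randomly_assign(subjects, seed=11)
print(len(groups["treatment"]), len(groups["placebo"]))
```

Because chance alone decides who lands in which group, neither the researcher nor the subject can steer the assignment.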

Sampling Methods

Simple Random Sampling

Each possible sample of size n has an equal probability of selection.

Avoids systematic selection bias, although any single sample can still differ from the population by chance.
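As a sketch, Python's standard `random.sample` draws without replacement so that every subset of the requested size is equally likely. The roster and the wrapper name `simple_random_sample` are hypothetical.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct items; every size-n subset is equally likely."""
    rng = random.Random(seed)
    return rng.sample(population, n)

roster = [f"student_{i}" for i in range(1, 201)]  # hypothetical list of 200
chosen = simple_random_sample(roster, 50, seed=1)
print(chosen[:5])
```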

Systematic Sampling

Select every kth element of the population, beginning from a chosen starting point.

Example: Select every 3rd or 6th person from a list.
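A short sketch of systematic sampling, assuming the starting point is chosen at random among the first k positions (a common convention that these notes do not specify). The function name is hypothetical.

```python
import random

def systematic_sample(population, k, seed=None):
    """Pick a random start among the first k items, then take every kth item."""
    rng = random.Random(seed)
    start = rng.randrange(k)    # index 0 .. k-1
    return population[start::k]

population = list(range(1, 31))  # people numbered 1 to 30
picked = systematic_sample(population, 5, seed=2)
print(picked)
```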

Convenience Sampling

Use data that are easy to obtain, but this method may introduce bias.

Example: Surveying people nearby rather than a random sample.

Stratified Sampling

Divide the population into subgroups (strata) with shared characteristics, then sample from each stratum.

Example: Sampling men and women separately to ensure representation.
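One way to sketch stratified sampling is to group units by stratum and then draw a simple random sample inside each group. Taking the same number from every stratum is an assumption made here for simplicity; proportional allocation is another option. The names and data are hypothetical.

```python
import random

def stratified_sample(population, stratum_of, n_per_stratum, seed=None):
    """Group units by stratum, then draw a simple random sample in each."""
    rng = random.Random(seed)
    strata = {}
    for unit in population:
        strata.setdefault(stratum_of(unit), []).append(unit)
    return {label: rng.sample(members, n_per_stratum)
            for label, members in strata.items()}

# Hypothetical population: 60 people labeled "F" and 40 labeled "M".
people = [(f"p{i}", "F" if i < 60 else "M") for i in range(100)]
by_sex = stratified_sample(people, lambda p: p[1], 10, seed=4)
print({label: len(sample) for label, sample in by_sex.items()})
```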

Cluster Sampling

Divide the population into clusters, randomly select clusters, and use all members from selected clusters.

Example: Randomly select schools and survey all students in those schools.
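The contrast with stratified sampling shows up clearly in code: here chance picks whole clusters, and every member of a picked cluster is kept. The school names and the function `cluster_sample` are hypothetical.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly choose whole clusters and keep every member of each."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: list(clusters[name]) for name in chosen}

clusters = {
    "school_A": ["a1", "a2", "a3"],
    "school_B": ["b1", "b2"],
    "school_C": ["c1", "c2", "c3", "c4"],
    "school_D": ["d1", "d2"],
}
picked = cluster_sample(clusters, 2, seed=5)
print(picked)
```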

Multistage Sampling

Combine several sampling methods in stages.

Example: Pollsters may use stratified sampling at one stage and cluster sampling at another.
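A rough sketch of one multistage scheme: stratify by region, then cluster-sample schools inside each region and survey every student in the chosen schools. Real polling designs vary, so this layout, the data, and the function name are assumptions for illustration only.

```python
import random

def multistage_sample(schools_by_region, schools_per_region, seed=None):
    """Stage 1: treat each region as a stratum.
    Stage 2: randomly pick whole schools (clusters) within each region."""
    rng = random.Random(seed)
    surveyed = []
    for region in sorted(schools_by_region):
        schools = schools_by_region[region]   # maps school name -> students
        for school in rng.sample(sorted(schools), schools_per_region):
            surveyed.extend(schools[school])
    return surveyed

# Hypothetical data: 2 regions, 3 schools per region, 4 students per school.
schools_by_region = {
    region: {
        f"{region}-school{j}": [f"{region}-school{j}-student{k}" for k in range(4)]
        for j in range(3)
    }
    for region in ("north", "south")
}
surveyed = multistage_sample(schools_by_region, 1, seed=7)
print(len(surveyed))
```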

Types of Observational Studies

Cross-Sectional Study

Data are collected at a single point in time.

Retrospective (Case-Control) Study

Data are collected from past records or interviews.

Prospective (Cohort) Study

Data are collected going forward in time from groups sharing common factors (cohorts).

Confounding

Definition and Prevention

Confounding occurs when it is unclear which factor caused an observed effect. Proper experimental design aims to prevent confounding.

Controlling Effects of Variables

Completely Randomized Experimental Design

Subjects are assigned to treatment groups by random selection.

Randomized Block Design

Subjects are grouped into blocks with similar characteristics, then treatments are randomly assigned within each block.
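A sketch of a randomized block design, assuming age group as the blocking characteristic and two treatments. Subjects are shuffled inside each block and treatments are dealt out in rotation, which keeps the treatments balanced within every block. All names are hypothetical.

```python
import random

def randomized_block_assignment(subjects, block_of, treatments, seed=None):
    """Randomize treatments separately inside each block."""
    rng = random.Random(seed)
    blocks = {}
    for subject in subjects:
        blocks.setdefault(block_of(subject), []).append(subject)
    assignment = {}
    for label in sorted(blocks):
        members = blocks[label][:]
        rng.shuffle(members)               # chance decides the order
        for i, subject in enumerate(members):
            assignment[subject] = treatments[i % len(treatments)]
    return assignment

# Hypothetical subjects: 6 in the "young" block and 6 in the "old" block.
subjects = [(f"s{i}", "young" if i < 6 else "old") for i in range(12)]
assignment = randomized_block_assignment(
    subjects, lambda s: s[1], ["treatment", "placebo"], seed=3
)
print(assignment)
```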

Matched Pairs Design

Subjects are paired based on similarities, and the two members of each pair receive different treatments so their outcomes can be compared.
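A minimal sketch of matched-pairs assignment, assuming the pairs have already been formed and that a fair coin decides which member of each pair is treated. The coin flip is an assumption; the notes do not say how treatments are allocated within a pair. The names are hypothetical.

```python
import random

def matched_pairs_assignment(pairs, seed=None):
    """Within each pair, pick at random who gets the treatment."""
    rng = random.Random(seed)
    assignment = {}
    for first, second in pairs:
        if rng.random() < 0.5:      # coin flip swaps the roles
            first, second = second, first
        assignment[first] = "treatment"
        assignment[second] = "control"
    return assignment

# Hypothetical pairs, e.g. twins or subjects matched on age and weight.
pairs = [("ann", "bea"), ("carl", "dov"), ("eli", "finn")]
roles = matched_pairs_assignment(pairs, seed=9)
print(roles)
```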

Rigorously Controlled Design

Subjects are carefully assigned to treatment groups to ensure similarity in important factors, though this is difficult to implement perfectly.

Sampling Errors

Types of Errors

Sampling error (random sampling error): Discrepancy between sample result and true population result due to chance fluctuations.
Nonsampling error: Human errors such as incorrect data entry, biased questions, or inappropriate statistical methods.
Nonrandom sampling error: Errors from using nonrandom sampling methods, such as convenience or voluntary response samples.

Summary Table: Sampling Methods

Sampling Method	Description	Example
Simple Random	Every sample of size n has an equal chance of selection	Randomly select 50 students from a list
Systematic	Select every kth element	Every 5th person on a roster
Convenience	Easy-to-get data	Survey people in a cafeteria
Stratified	Divide into strata, sample each	Sample men and women separately
Cluster	Divide into clusters, sample all in selected clusters	Survey all students in selected schools
Multistage	Combine methods in stages	Stratify by region, then cluster by school

Key Formulas

Probability of Simple Random Sample

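The probability of selecting a particular sample of size n from a population of size N, when sampling without replacement and ignoring order:

$$P(\text{a particular sample}) = \frac{1}{\binom{N}{n}} = \frac{n!\,(N-n)!}{N!}$$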
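As a quick numeric check, assuming sampling without replacement where order does not matter, there are C(N, n) possible samples and each is equally likely. The helper name `srs_probability` is hypothetical; `math.comb` is the standard-library binomial coefficient.

```python
from fractions import Fraction
from math import comb

def srs_probability(N, n):
    """Probability that one specific size-n sample is drawn from N units."""
    return Fraction(1, comb(N, n))

# Choosing 2 people out of 5 gives comb(5, 2) = 10 possible samples.
p_small = srs_probability(5, 2)
print(p_small)
```

For realistic sizes the probability becomes tiny: `comb(200, 50)` is an astronomically large number, so any single sample is extremely unlikely, yet all samples remain equally likely.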

Sampling Error

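Sampling error is the difference between the sample statistic and the population parameter. For a sample mean $\bar{x}$ estimating a population mean $\mu$, for example:

$$\text{sampling error} = \bar{x} - \mu$$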
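The idea can be sketched with a simulated population, using the sample mean as the statistic and the population mean as the parameter. The population values (heights in cm) are invented, and the helper names are hypothetical. Repeating the draw many times shows the typical size of the error shrinking as the sample grows.

```python
import random
from statistics import mean

rng = random.Random(42)  # fixed seed so the run is repeatable
population = [rng.gauss(170, 10) for _ in range(10_000)]  # made-up heights
mu = mean(population)    # population parameter

def sampling_error(n):
    """Sample mean minus population mean for one simple random sample."""
    sample = rng.sample(population, n)
    return mean(sample) - mu

def typical_error(n, repeats=200):
    """Average absolute sampling error over many repeated samples."""
    return mean(abs(sampling_error(n)) for _ in range(repeats))

small_n_error = typical_error(25)
large_n_error = typical_error(400)
print(f"n=25: {small_n_error:.2f} cm, n=400: {large_n_error:.2f} cm")
```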

Conclusion

Proper data collection and experimental design are foundational to valid statistical analysis. Understanding sampling methods, study types, and error sources is essential for interpreting and conducting statistical research.