Introduction to Statistics: Data Collection and Experimental Design
Study Guide - Smart Notes
Tailored notes based on your materials, expanded with key definitions, examples, and context.
Introduction to Statistics
Statistical and Critical Thinking
Statistics is the science of collecting, analyzing, interpreting, and presenting data. Critical thinking in statistics involves questioning the validity of data sources, methods, and conclusions.
Statistics helps us make informed decisions based on data.
Critical thinking is essential to avoid misinterpretation and misuse of statistical results.
Types of Data
Qualitative vs. Quantitative Data
Data can be classified into two main types: qualitative (categorical) and quantitative (numerical).
Qualitative data describes categories or qualities (e.g., gender, color).
Quantitative data represents numerical values (e.g., height, weight).
Collecting Sample Data
Importance of Proper Sampling
Using appropriate methods to collect sample data is crucial for valid statistical analysis. The simple random sample is particularly important.
Simple random sample: A sample of $n$ subjects selected so that every possible sample of the same size $n$ has an equal chance of being selected.
If sample data are not collected properly, no amount of statistical analysis can make the results valid.
The Gold Standard in Experiments
Random assignment to placebo/treatment groups is considered the "gold standard" for experimental design.
Placebo: A harmless, inactive substance or procedure used for psychological benefit or as a control in experiments.
Comparing a treatment group with a placebo group lets researchers separate the real effect of a treatment from the placebo effect.
Basics of Collecting Data
Observational Studies vs. Experiments
Data can be obtained through observational studies or experiments.
Observational study: Researchers observe and measure characteristics without modifying subjects.
Experiment: Researchers apply a treatment to individuals (called experimental units or subjects) and then observe its effects.
Example: Ice Cream and Drownings
This example illustrates the difference between observational studies and experiments.
Observational study: Past data may show a correlation between ice cream sales and drownings, but the association is explained by a lurking variable, temperature: hot weather increases both ice cream sales and swimming, and therefore drownings.
Experiment: Randomly assigning groups to eat ice cream or not would show that eating ice cream has no causal effect on drownings. This is why experiments, rather than observational studies, are used to establish causality.
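A small simulation can make the lurking-variable argument concrete. This is a sketch under invented assumptions. The temperatures, the linear model, the noise levels, and the 2,000 hypothetical units (think of them as days) are all made up. Ice cream is deliberately given zero effect on drownings. Temperature alone produces a strong observational correlation. Random, coin-flip assignment then shows essentially no difference between groups.

```python
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient, computed from its definition."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

rng = random.Random(1)  # fixed seed so the run is reproducible

# Hypothetical model: temperature (the lurking variable) drives BOTH
# ice cream sales and drownings; ice cream has no effect on drownings here.
temps = [rng.uniform(10, 35) for _ in range(2000)]
ice_cream = [10 * t + rng.gauss(0, 20) for t in temps]
drownings = [0.5 * t + rng.gauss(0, 2) for t in temps]

# Observational view: the two variables look strongly related.
r = pearson_r(ice_cream, drownings)
print(f"observational correlation r = {r:.2f}")

# Experimental view: a coin flip, not the weather, decides which units get
# the ice cream "treatment", so temperature is balanced across the groups.
ate = [rng.random() < 0.5 for _ in temps]
treated = [d for d, a in zip(drownings, ate) if a]
control = [d for d, a in zip(drownings, ate) if not a]
diff = sum(treated) / len(treated) - sum(control) / len(control)
print(f"treated minus control = {diff:+.2f}")
```

The correlation is large, but the randomized comparison stays near zero. The correlation came entirely from temperature.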
Design of Experiments
Replication
Replication involves repeating an experiment on multiple subjects to ensure reliability.
Large sample sizes help detect treatment effects and reduce random error.
Blinding and Double-Blind Designs
Blinding prevents subjects from knowing whether they receive treatment or placebo, reducing bias.
Single-blind: Subjects do not know their group assignment.
Double-blind: Both subjects and experimenters are unaware of group assignments.
Randomization
Random assignment of subjects to groups tends to make the groups similar in everything except the treatment, which reduces bias.
Chance is used to create similar groups for valid comparisons.
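Random assignment can be sketched in a few lines of Python. The approach used here is an assumption: shuffle the subjects, then deal them out to the groups in turn, which keeps the group sizes equal. The subject IDs, the group names, and the helper `randomly_assign` are hypothetical. This corresponds to a completely randomized design.

```python
import random

def randomly_assign(subjects, groups=("treatment", "placebo"), seed=None):
    """Completely randomized assignment: shuffle subjects, then deal them out in turn."""
    rng = random.Random(seed)
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    assignment = {g: [] for g in groups}
    for i, subject in enumerate(shuffled):
        # Dealing round-robin keeps group sizes as equal as possible.
        assignment[groups[i % len(groups)]].append(subject)
    return assignment

subjects = [f"S{i:02d}" for i in range(1, 21)]  # 20 hypothetical subject IDs
groups = randomly_assign(subjects, seed=42)
for name, members in groups.items():
    print(name, len(members), sorted(members))
```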
Sampling Methods
Simple Random Sampling
Each possible sample of size $n$ has an equal probability of selection.
Avoids selection bias: no individual or group of individuals is systematically favored.
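In Python, the standard library's `random.sample` draws without replacement so that every subset of the requested size is equally likely. That is exactly the simple random sample condition. The roster of 500 student IDs and the seed below are hypothetical.

```python
import random

population = list(range(1, 501))   # hypothetical roster: student IDs 1..500
rng = random.Random(7)

# random.sample draws without replacement; every subset of size 50
# is equally likely to be the one returned.
sample = rng.sample(population, 50)
print(sorted(sample))
```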
Systematic Sampling
Select every $k$th element from a starting point in the population.
Example: Select every 3rd or 6th person from a list.
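A minimal sketch of systematic sampling follows. It assumes the starting point is chosen at random among the first $k$ positions. That is a common choice; the definition only requires some starting point. The roster and the helper name `systematic_sample` are hypothetical.

```python
import random

def systematic_sample(population, k, rng=random):
    """Take every kth element, starting from a random position among the first k."""
    start = rng.randrange(k)   # random start in 0..k-1
    return population[start::k]

roster = [f"Person{i}" for i in range(1, 31)]   # hypothetical list of 30 people
sample = systematic_sample(roster, 3, random.Random(3))
print(sample)                                  # 10 people, 3 positions apart
```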
Convenience Sampling
Use data that are easy to obtain, but this method may introduce bias.
Example: Surveying people nearby rather than a random sample.
Stratified Sampling
Divide the population into subgroups (strata) with shared characteristics, then sample from each stratum.
Example: Sampling men and women separately to ensure representation.
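Stratified sampling can be sketched as "group, then take a simple random sample inside each group." This version assumes equal allocation, meaning the same number is drawn from every stratum. Allocating in proportion to stratum size is another common choice. The population of 60 men and 40 women and the helper `stratified_sample` are hypothetical.

```python
import random

def stratified_sample(population, stratum_of, n_per_stratum, rng=random):
    """Split the population into strata, then draw a simple random sample from each."""
    strata = {}
    for unit in population:
        strata.setdefault(stratum_of(unit), []).append(unit)
    return {name: rng.sample(members, n_per_stratum) for name, members in strata.items()}

# Hypothetical population: 60 men and 40 women, stored as (id, sex) pairs
population = [(f"M{i}", "male") for i in range(60)] + [(f"F{i}", "female") for i in range(40)]
sample = stratified_sample(population, lambda unit: unit[1], 10, random.Random(4))
print({sex: len(units) for sex, units in sample.items()})   # {'male': 10, 'female': 10}
```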
Cluster Sampling
Divide the population into clusters, randomly select clusters, and use all members from selected clusters.
Example: Randomly select schools and survey all students in those schools.
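The key difference from stratified sampling is visible in code. Here the *clusters* are chosen at random, and then every member of each chosen cluster is kept. The 12 schools of 25 students each are hypothetical.

```python
import random

# Hypothetical population: 12 schools (clusters) of 25 students each
schools = {f"School{j}": [f"School{j}-student{i}" for i in range(25)] for j in range(1, 13)}
rng = random.Random(11)

chosen = rng.sample(sorted(schools), 3)                       # randomly select 3 clusters
sample = [s for school in chosen for s in schools[school]]    # keep ALL members of each
print(chosen, len(sample))                                     # 3 schools, 75 students
```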
Multistage Sampling
Combine several sampling methods in stages.
Example: Pollsters may use stratified sampling at one stage and cluster sampling at another.
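One possible two-stage design can be sketched as follows. Stage 1 stratifies by region so that every region is represented. Stage 2 randomly selects school clusters within each region and surveys all of their students. The regions, school counts, and sizes are hypothetical, and real polling designs usually involve more stages.

```python
import random

rng = random.Random(5)
# Hypothetical population: 3 regions, each with 4 schools of 20 students
regions = {
    region: {f"{region}-school{j}": [f"{region}-school{j}-s{i}" for i in range(20)]
             for j in range(1, 5)}
    for region in ("North", "South", "West")
}

sample = []
for region, schools in regions.items():      # stage 1: stratify, so every region is represented
    chosen = rng.sample(sorted(schools), 2)   # stage 2: randomly pick 2 schools (clusters) per region
    for school in chosen:
        sample.extend(schools[school])        # survey every student in the chosen schools
print(len(sample))                             # 3 regions x 2 schools x 20 students = 120
```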
Types of Observational Studies
Cross-Sectional Study
Data are collected at a single point in time.
Retrospective (Case-Control) Study
Data are collected from past records or interviews.
Prospective (Cohort) Study
Data are collected in the future from groups sharing common factors (cohorts).
Confounding
Definition and Prevention
Confounding occurs when it is unclear which factor caused an observed effect. Proper experimental design aims to prevent confounding.
Controlling Effects of Variables
Completely Randomized Experimental Design
Subjects are assigned to treatment groups by random selection.
Randomized Block Design
Subjects are grouped into blocks with similar characteristics, then treatments are randomly assigned within each block.
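A randomized block design differs from a completely randomized design only in *where* the randomization happens: separately inside each block. In this sketch, the blocking variable (sex), the treatment names, and the helper `randomized_block_assignment` are hypothetical. Treatments are dealt out in turn after shuffling, so each block is split as evenly as possible.

```python
import random

def randomized_block_assignment(subjects, block_of, treatments, rng=random):
    """Group subjects into blocks, then randomize treatments separately within each block."""
    blocks = {}
    for s in subjects:
        blocks.setdefault(block_of(s), []).append(s)
    assignment = {}
    for members in blocks.values():
        shuffled = members[:]
        rng.shuffle(shuffled)                 # randomization happens inside the block
        for i, s in enumerate(shuffled):
            assignment[s] = treatments[i % len(treatments)]
    return assignment

# Hypothetical subjects blocked by sex: 8 men and 6 women
subjects = [(f"M{i}", "male") for i in range(8)] + [(f"F{i}", "female") for i in range(6)]
plan = randomized_block_assignment(subjects, lambda s: s[1], ("drug", "placebo"), random.Random(9))
for sex in ("male", "female"):
    n_drug = sum(1 for s, t in plan.items() if s[1] == sex and t == "drug")
    print(sex, "drug:", n_drug)   # male drug: 4, female drug: 3
```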
Matched Pairs Design
Subjects are paired based on similarities, and the two members of each pair receive different treatments so that their results can be compared.
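One way to form matched pairs is sketched below. It sorts on a matching variable, pairs neighbors, and then flips a coin within each pair to decide who gets the treatment. The within-pair coin flip, the use of age as the matching variable, the names, and the helper `matched_pairs` are all assumptions for illustration.

```python
import random

def matched_pairs(subjects, key, rng=random):
    """Sort on a matching variable, pair neighbours, then flip a coin within each pair."""
    ordered = sorted(subjects, key=key)
    pairs = []
    # zip stops early, so with an odd count the last subject is left unpaired
    for a, b in zip(ordered[0::2], ordered[1::2]):
        if rng.random() < 0.5:
            a, b = b, a
        pairs.append({"treatment": a, "control": b})
    return pairs

# Hypothetical subjects as (name, age); age is the matching variable
people = [("Ann", 34), ("Ben", 35), ("Cara", 52), ("Dev", 50), ("Eli", 21), ("Fay", 23)]
pairs = matched_pairs(people, key=lambda p: p[1], rng=random.Random(2))
for pair in pairs:
    print(pair)
```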
Rigorously Controlled Design
Subjects are carefully assigned to treatment groups to ensure similarity in important factors, though this is difficult to implement perfectly.
Sampling Errors
Types of Errors
Sampling error (random sampling error): Discrepancy between sample result and true population result due to chance fluctuations.
Nonsampling error: Human errors such as incorrect data entry, biased questions, or inappropriate statistical methods.
Nonrandom sampling error: Errors from using nonrandom sampling methods, such as convenience or voluntary response samples.
Summary Table: Sampling Methods
| Sampling Method | Description | Example |
|---|---|---|
| Simple Random | Every sample of size $n$ has an equal chance | Randomly select 50 students from a list |
| Systematic | Select every $k$th element | Every 5th person on a roster |
| Convenience | Easy-to-get data | Survey people in a cafeteria |
| Stratified | Divide into strata, sample each | Sample men and women separately |
| Cluster | Divide into clusters, sample all in selected clusters | Survey all students in selected schools |
| Multistage | Combine methods in stages | Stratify by region, then cluster by school |
Key Formulas
Probability of Simple Random Sample
The probability of selecting a particular sample of size $n$ from a population of size $N$:

$$P(\text{particular sample}) = \frac{1}{\binom{N}{n}} = \frac{n!\,(N-n)!}{N!}$$
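Under simple random sampling, every one of the $\binom{N}{n}$ possible samples is equally likely. The probability of any single specific sample is therefore $1/\binom{N}{n}$. The sketch below computes this exactly with `math.comb`. It also checks the count by listing every size-2 sample of a hypothetical five-person population. The helper name `prob_particular_sample` is made up for illustration.

```python
from fractions import Fraction
from itertools import combinations
from math import comb

def prob_particular_sample(N, n):
    """P(one specific sample is drawn) under simple random sampling of n from N."""
    return Fraction(1, comb(N, n))

# Check against brute force for a tiny population: list every possible sample.
population = ["A", "B", "C", "D", "E"]
all_samples = list(combinations(population, 2))
print(len(all_samples))               # 10 possible samples of size 2
print(prob_particular_sample(5, 2))   # 1/10
```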
Sampling Error
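Sampling error is the difference between the sample statistic and the population parameter:

$$\text{sampling error} = \text{sample statistic} - \text{population parameter}$$

For example, when a sample mean $\bar{x}$ estimates a population mean $\mu$, the sampling error is $\bar{x} - \mu$.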
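Sampling error can be observed directly by simulation, because in a simulation the population parameter is known. The sketch below builds a hypothetical population of 10,000 heights. It draws simple random samples of several sizes and reports the sample mean minus the population mean for each. The exact values depend on the seed. Larger samples tend to give smaller errors, and sampling the entire population gives zero sampling error.

```python
import random
from statistics import mean

def sampling_error(population, n, rng):
    """Sample mean minus population mean for one simple random sample of size n."""
    return mean(rng.sample(population, n)) - mean(population)

rng = random.Random(2024)
# Hypothetical population of 10,000 heights in cm
population = [rng.gauss(170, 10) for _ in range(10_000)]

for n in (10, 100, 1000):
    print(f"n = {n:4d}: sampling error = {sampling_error(population, n, rng):+.2f} cm")
```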
Conclusion
Proper data collection and experimental design are foundational to valid statistical analysis. Understanding sampling methods, study types, and error sources is essential for interpreting and conducting statistical research.