Introduction to Statistics: Collecting Sample Data and Experimental Design
Study Guide - Smart Notes
Introduction to Statistics
Statistical and Critical Thinking
Statistics is the science of collecting, analyzing, interpreting, and presenting data. Critical thinking in statistics involves questioning the validity of data sources, methods, and conclusions to avoid misleading results.
Statistics helps us make informed decisions based on data.
Critical thinking is essential to identify biases, errors, and confounding factors in statistical studies.
Types of Data
Data in statistics can be classified into different types, which determine the appropriate methods for analysis.
Qualitative (Categorical) Data: Describes qualities or categories (e.g., gender, color).
Quantitative (Numerical) Data: Represents counts or measurements (e.g., height, age).
Collecting Sample Data
Key Concept: Importance of Proper Sampling
When analyzing sample data, it is crucial to use an appropriate method for collecting those data. The simple random sample is of particular importance.
Simple Random Sample: Every possible sample of the same size has an equal chance of being chosen.
If sample data are not collected appropriately, the data may be unreliable and lead to invalid conclusions.
The Gold Standard in Experiments
Random assignment to placebo and treatment groups is considered the "gold standard" in experimental design because it minimizes bias and confounding.
Placebo: A harmless, inactive substance or procedure used for psychological benefit or as a control in experiments.
Example: A sugar pill with no medicinal effect.
Basics of Collecting Data
Sources of Data: Observational Studies vs. Experiments
Statistical methods rely on data collected from two main sources: observational studies and experiments.
Observational Study: Researchers observe and measure specific characteristics without attempting to modify the subjects.
Experiment: Researchers apply a treatment and then observe its effects on the individuals studied. Those individuals are called experimental units, or subjects when they are people.
Example: Ice Cream and Drownings
This example illustrates the difference between observational studies and experiments, and the importance of identifying lurking variables.
Observational Study: Observing past data may suggest that ice cream sales cause drownings, but this is a mistake due to a lurking variable (temperature). As temperature increases, both ice cream sales and drownings increase because more people swim.
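Experiment: If one group were randomly assigned to eat ice cream and another to abstain, and drowning rates were then compared, no effect of ice cream on drownings would be expected. Random assignment balances lurking variables such as temperature across the groups, so they cannot be mistaken for a treatment effect.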
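The lurking-variable effect in this example can be illustrated with a small simulation. This is only a sketch with made-up numbers: the temperature range, the coefficients, and the noise levels are assumptions chosen for illustration, not real data. Temperature drives both quantities, so they correlate strongly overall, but the correlation largely disappears once temperature is held nearly constant.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(1)

# Hypothetical daily data: temperature influences BOTH variables,
# while ice cream sales have no direct effect on drownings.
temps = [rng.uniform(10, 35) for _ in range(2000)]
sales = [20 * t + rng.gauss(0, 50) for t in temps]
drownings = [0.4 * t + rng.gauss(0, 2) for t in temps]

# Correlation across all days (temperature varies freely).
r_all = pearson_r(sales, drownings)

# Correlation on days with nearly the same temperature (24 to 26 degrees).
band = [(s, d) for t, s, d in zip(temps, sales, drownings) if 24 <= t <= 26]
r_band = pearson_r([s for s, _ in band], [d for _, d in band])

print(f"all days: r = {r_all:.2f}")
print(f"similar-temperature days: r = {r_band:.2f}")
```

Restricting to a narrow temperature band is a crude stand-in for controlling the lurking variable; a randomized experiment achieves the same goal by design.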
Sampling Methods
Simple Random Sampling
A sample of n subjects is selected so that every possible sample of the same size has the same chance of being chosen.
Random Sample: Each member of the population has the same chance of being selected. This is a weaker requirement than a simple random sample, because it does not guarantee that all possible samples of size n are equally likely.
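A minimal sketch of drawing a simple random sample, assuming the population is available as a list of IDs. The population, the helper name `simple_random_sample`, and the seed are hypothetical. Python's `random.sample` selects without replacement, so every subset of size n is equally likely.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct members so that every subset of size n is equally likely."""
    rng = random.Random(seed)  # a seed makes the draw reproducible
    return rng.sample(population, n)

# Hypothetical population: subject IDs 1 through 100.
population = list(range(1, 101))
sample = simple_random_sample(population, 10, seed=42)
print(sample)
```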
Systematic Sampling
Select a starting point and then choose every kth element in the population (e.g., every 3rd or 6th person).
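Systematic sampling can be sketched as a list slice. The random starting point among the first k elements is an assumption here (the notes only say to select a starting point), and the function name and data are hypothetical.

```python
import random

def systematic_sample(population, k, seed=None):
    """Pick a start among the first k elements, then take every k-th element."""
    rng = random.Random(seed)
    start = rng.randrange(k)  # assumed: the starting point is chosen at random
    return population[start::k]

# Hypothetical population of 100 ordered IDs; take every 5th one.
population = list(range(100))
sample = systematic_sample(population, k=5, seed=1)
print(sample)
```

If the list has a repeating pattern with the same period as k, the sample can be badly unrepresentative, which is the main risk of this method.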
Convenience Sampling
Use data that are easy to obtain, which may lead to bias and unrepresentative samples.
Stratified Sampling
Divide the population into subgroups (strata) that share similar characteristics, then take a sample from each subgroup.
Example: Divide by gender, then sample men and women separately.
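A sketch of stratified sampling along the lines of the example above. The records, the function name `stratified_sample`, and the equal per-stratum sample size are assumptions; proportional allocation is another common choice.

```python
import random
from collections import defaultdict

def stratified_sample(units, stratum_of, n_per_stratum, seed=None):
    """Group units into strata, then draw a simple random sample from each."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)
    # Sample within every stratum (capped at the stratum's size).
    return {
        label: rng.sample(members, min(n_per_stratum, len(members)))
        for label, members in strata.items()
    }

# Hypothetical records of the form (id, gender).
people = [(i, "man" if i % 2 else "woman") for i in range(1, 41)]
by_gender = stratified_sample(people, lambda p: p[1], n_per_stratum=5, seed=7)
print(by_gender)
```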
Cluster Sampling
Divide the population into clusters, randomly select some clusters, and use all members from those clusters.
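Cluster sampling differs from stratified sampling in that whole groups are selected at random and then used in full. A minimal sketch, with hypothetical cluster names and members:

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly choose n_clusters clusters and keep every member of each."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    sample = [member for name in chosen for member in clusters[name]]
    return chosen, sample

# Hypothetical clusters: precinct name -> residents.
clusters = {
    "Precinct A": ["a1", "a2", "a3"],
    "Precinct B": ["b1", "b2"],
    "Precinct C": ["c1", "c2", "c3", "c4"],
    "Precinct D": ["d1", "d2"],
}
chosen, sample = cluster_sample(clusters, n_clusters=2, seed=3)
print(chosen, sample)
```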
Multistage Sampling
Combine several sampling methods, selecting samples in stages, each possibly using a different method.
Design of Experiments
Replication
Replication involves repeating an experiment on multiple individuals to ensure results are reliable and not due to chance.
Requires sufficiently large sample sizes to detect treatment effects.
Blinding and Double-Blind Designs
Blinding prevents subjects from knowing whether they receive treatment or placebo, reducing bias from the placebo effect.
Single-Blind: Only the subject is unaware of the treatment assignment.
Double-Blind: Both the subject and the experimenter are unaware of the treatment assignment.
Randomization
Subjects are assigned to groups through a process of random selection, so that the groups are similar on average and bias in who receives which treatment is reduced.
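Random assignment to a treatment group and a placebo group can be sketched by shuffling the subject list and splitting it. The subject labels, the group names, and the even split are assumptions made for illustration.

```python
import random

def randomize(subjects, seed=None):
    """Shuffle the subjects and split them into two groups of (nearly) equal size."""
    rng = random.Random(seed)
    shuffled = list(subjects)  # copy so the caller's list is left untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"treatment": shuffled[:half], "placebo": shuffled[half:]}

# Hypothetical subject labels.
subjects = [f"subject_{i}" for i in range(1, 21)]
groups = randomize(subjects, seed=11)
print(groups)
```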
Types of Observational Studies
Cross-Sectional Study
Data are observed, measured, and collected at one point in time.
Retrospective (Case-Control) Study
Data are collected from a past time period by examining records or interviews.
Prospective (Longitudinal or Cohort) Study
Data are collected in the future from groups sharing common factors (cohorts).
Controlling Effects of Variables
Confounding
Confounding occurs when the effects of two or more variables are mixed together, so that an observed effect cannot be attributed to a single factor.
Experimental Designs
Completely Randomized Design: Subjects are assigned to treatment groups by random selection.
Randomized Block Design: Subjects are grouped into blocks based on similar characteristics, then treatments are randomly assigned within each block.
Matched Pairs Design: Subjects are paired based on similarities, and each pair receives different treatments for comparison.
Rigorously Controlled Design: Subjects are carefully assigned to groups to ensure similarity in important characteristics, though this is difficult to implement perfectly.
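The randomized block design described above can be sketched as follows: group subjects into blocks, then shuffle and assign treatments within each block. The age-based blocks, the treatment labels, and the round-robin allocation after shuffling are assumptions, not a prescribed procedure.

```python
import random
from collections import defaultdict

def randomized_block_assignment(subjects, block_of, treatments, seed=None):
    """Randomly assign treatments separately within each block."""
    rng = random.Random(seed)
    blocks = defaultdict(list)
    for subject in subjects:
        blocks[block_of(subject)].append(subject)
    assignment = {}
    for members in blocks.values():
        rng.shuffle(members)  # random order within the block
        for i, subject in enumerate(members):
            # Cycle through treatments so each block is balanced.
            assignment[subject] = treatments[i % len(treatments)]
    return assignment

# Hypothetical subjects of the form (id, age group); blocks are age groups.
subjects = [(f"s{i}", "under 40" if i < 6 else "40 and over") for i in range(12)]
assignment = randomized_block_assignment(
    subjects, lambda s: s[1], ["treatment", "placebo"], seed=5
)
print(assignment)
```

Dropping the blocking step and shuffling all subjects together would give a completely randomized design instead.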
Sampling Errors
Sampling Error
Occurs when a result from a random sample differs from the true population result because of chance sample-to-sample fluctuations.
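Sampling error can be seen by repeatedly drawing random samples from a known population and comparing each sample mean with the population mean. The population below is simulated, and its size, mean, and spread are assumptions. Larger samples tend to give smaller errors on average.

```python
import random
import statistics

rng = random.Random(0)

# Hypothetical population: 10,000 simulated heights in centimeters.
population = [rng.gauss(170, 10) for _ in range(10_000)]
mu = statistics.mean(population)  # population parameter

def sampling_error(n):
    """Sample mean minus population mean for one simple random sample of size n."""
    sample = rng.sample(population, n)
    return statistics.mean(sample) - mu

# Average absolute sampling error over 200 repeated samples of each size.
errors_small = [abs(sampling_error(25)) for _ in range(200)]
errors_large = [abs(sampling_error(400)) for _ in range(200)]
print(statistics.mean(errors_small), statistics.mean(errors_large))
```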
Nonsampling Error
Results from human errors such as incorrect data entry, biased questions, false responses, or inappropriate statistical methods.
Nonrandom Sampling Error
Occurs when a nonrandom sampling method is used, such as convenience or voluntary response samples, leading to bias.
Summary Table: Sampling Methods
| Sampling Method | Description | Advantages | Disadvantages |
|---|---|---|---|
| Simple Random | Every sample of size n has equal chance | Unbiased, representative | May be difficult to implement |
| Systematic | Select every k-th element | Easy to administer | May introduce periodic bias |
| Convenience | Use easy-to-get data | Quick, inexpensive | Often biased, not representative |
| Stratified | Divide into strata, sample each | Ensures representation of subgroups | Requires knowledge of strata |
| Cluster | Divide into clusters, sample all in selected clusters | Efficient for large populations | May not represent entire population |
| Multistage | Combine methods in stages | Flexible, practical | Complex to design and analyze |
Key Formulas
Probability of Simple Random Sample
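The probability of selecting one particular sample of size n from a population of size N under simple random sampling (without replacement) is:
$$P(\text{particular sample}) = \frac{1}{\binom{N}{n}} = \frac{n!\,(N-n)!}{N!}$$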
Sampling Error
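Sampling error is the difference between a sample statistic and the corresponding population parameter:
$$\text{sampling error} = \text{sample statistic} - \text{population parameter}$$
For a mean, for example, this is $\bar{x} - \mu$.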
Example Application
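Suppose a population has 1000 individuals, and a simple random sample of 50 is taken. Each possible sample of 50 has equal probability:
$$P = \frac{1}{\binom{1000}{50}} = \frac{50!\,950!}{1000!}$$
Because no sample is favored over another, the selection method itself is unbiased, although any single sample can still differ from the population by chance.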
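The number of possible samples in this example can be computed exactly with Python's `math.comb`. This sketch assumes sampling without replacement where order does not matter. The variable names are illustrative.

```python
import math

N, n = 1000, 50

# Number of distinct samples of size n from a population of size N.
num_samples = math.comb(N, n)

# Probability that one particular sample is the one selected.
prob_particular_sample = 1 / num_samples

# Probability that one particular individual ends up in the sample.
prob_individual_included = n / N

print(f"possible samples: {num_samples:.3e}")
print(f"P(particular sample): {prob_particular_sample:.3e}")
print(f"P(individual included): {prob_individual_included}")
```

The two probabilities answer different questions: each sample is extremely unlikely on its own, yet every individual has the same 50-in-1000 chance of being included.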