Introduction to Statistics: Collecting Sample Data and Experimental Design

Study Guide - Smart Notes

Introduction to Statistics

Statistical and Critical Thinking

Statistics is the science of collecting, analyzing, interpreting, and presenting data. Critical thinking in statistics involves questioning the validity of data sources, methods, and conclusions to avoid misleading results.

Statistics helps us make informed decisions based on data.
Critical thinking is essential to identify biases, errors, and confounding factors in statistical studies.

Types of Data

Data in statistics can be classified into different types, which determine the appropriate methods for analysis.

Qualitative (Categorical) Data: Describes qualities or categories (e.g., gender, color).
Quantitative (Numerical) Data: Represents counts or measurements (e.g., height, age).

Collecting Sample Data

Key Concept: Importance of Proper Sampling

When analyzing sample data, it is crucial to use an appropriate method for collecting those data. The simple random sample is of particular importance.

Simple Random Sample: Every possible sample of the same size has an equal chance of being chosen.
If sample data are not collected appropriately, the data may be unreliable and lead to invalid conclusions.

The Gold Standard in Experiments

Random assignment to placebo and treatment groups is considered the "gold standard" in experimental design because it minimizes bias and confounding.

Placebo: A harmless, inactive substance or procedure used for psychological benefit or as a control in experiments.
Example: A sugar pill with no medicinal effect.

Basics of Collecting Data

Sources of Data: Observational Studies vs. Experiments

Statistical methods rely on data collected from two main sources: observational studies and experiments.

Observational Study: Researchers observe and measure specific characteristics without attempting to modify the subjects.
Experiment: Researchers apply a treatment and observe its effects on the individuals studied. These individuals are called experimental units, or subjects when they are people.

Example: Ice Cream and Drownings

This example illustrates the difference between observational studies and experiments, and the importance of identifying lurking variables.

Observational Study: Past data may suggest that ice cream sales cause drownings, but this conclusion is a mistake caused by a lurking variable (temperature). As temperature rises, ice cream sales increase and more people swim, so drownings increase as well.
Experiment: If one group were randomly assigned to eat ice cream and another group were not, a comparison of drowning rates would show no effect of ice cream on drownings. Random assignment breaks the link between ice cream and temperature, so the experiment controls for the confounding variable.
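
The contrast between the two approaches can be sketched with a small simulation. This is only an illustration: the numbers, the linear relationships, and the helper name pearson are assumptions made up for the example, not data from any real study.

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical daily data: temperature (the lurking variable) drives both series.
temps = [random.uniform(10, 35) for _ in range(1000)]
ice_cream_sales = [20 * t + random.gauss(0, 50) for t in temps]
drownings = [0.5 * t + random.gauss(0, 2) for t in temps]

# Observational view: sales and drownings move together,
# although neither one affects the other in this simulation.
r_observed = pearson(ice_cream_sales, drownings)

# Experimental view: ice cream is assigned by coin flip, independent of temperature.
assigned_ice_cream = [random.choice([0, 1]) for _ in temps]
r_experiment = pearson(assigned_ice_cream, drownings)

print(f"observational r = {r_observed:.2f}, experimental r = {r_experiment:.2f}")
```

In the simulated observational data the correlation is strong, while under random assignment it is close to zero, which is the pattern the example describes.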

Sampling Methods

Simple Random Sampling

A sample of n subjects is selected so that every possible sample of the same size has the same chance of being chosen.

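Random Sample: All members of the population have the same chance of being selected, but the possible samples are not necessarily all equally likely. Every simple random sample is a random sample, but not every random sample is a simple random sample.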
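
As a minimal sketch, a simple random sample can be drawn with Python's standard library. The population of ID numbers 1 to 100 and the sample size of 10 are assumptions chosen for illustration.

```python
import random

# Hypothetical population: ID numbers 1 through 100.
population = list(range(1, 101))

rng = random.Random(42)  # seeded generator so the draw is reproducible

# random.sample draws without replacement, so each member appears at most once
# and every possible group of 10 is equally likely to be chosen.
sample = rng.sample(population, k=10)

print(sorted(sample))
```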

Systematic Sampling

Select a starting point and then choose every kth element in the population (e.g., every 3rd or 6th person).
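
One way to sketch this in code is shown below. The function name systematic_sample, the population of 100 IDs, and k = 10 are assumptions for illustration. Choosing the starting point at random within the first k positions is a common convention rather than something the definition above requires.

```python
import random


def systematic_sample(population, k, rng=random):
    """Pick a random start among the first k positions, then take every k-th element."""
    start = rng.randrange(k)
    return population[start::k]


population = list(range(1, 101))  # hypothetical population of 100 IDs
sample = systematic_sample(population, k=10, rng=random.Random(1))

print(sample)
```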

Convenience Sampling

Use data that are easy to obtain, which may lead to bias and unrepresentative samples.

Stratified Sampling

Divide the population into subgroups (strata) that share similar characteristics, then take a sample from each subgroup.

Example: Divide by gender, then sample men and women separately.
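
A rough sketch of the same idea in code follows. The function name stratified_sample, the made-up list of 200 people, and the choice of 5 per stratum are assumptions for illustration. Here each stratum is sampled with a simple random sample.

```python
import random
from collections import defaultdict


def stratified_sample(units, stratum_of, n_per_stratum, rng):
    """Group units into strata, then draw a simple random sample from each stratum."""
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)
    return {name: rng.sample(members, n_per_stratum) for name, members in strata.items()}


# Hypothetical population: (id, stratum label) pairs, 100 in each stratum.
people = [(i, "men" if i % 2 == 0 else "women") for i in range(200)]

sample = stratified_sample(
    people, stratum_of=lambda person: person[1], n_per_stratum=5, rng=random.Random(7)
)

for name, members in sample.items():
    print(name, members)
```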

Cluster Sampling

Divide the population into clusters, randomly select some clusters, and use all members from those clusters.
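
A minimal sketch of cluster sampling, assuming a made-up population of 10 classrooms with 20 students each (the names cluster_sample, classroom_*, and student_* are hypothetical):

```python
import random


def cluster_sample(clusters, n_clusters, rng):
    """Randomly select whole clusters and keep every member of the selected ones."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    members = [member for name in chosen for member in clusters[name]]
    return chosen, members


# Hypothetical clusters: 10 classrooms of 20 students each.
clusters = {
    f"classroom_{c}": [f"student_{c}_{i}" for i in range(20)] for c in range(10)
}

chosen, sample = cluster_sample(clusters, n_clusters=3, rng=random.Random(3))

print(chosen, len(sample))
```

Unlike stratified sampling, which takes some members from every group, this takes all members from only some groups.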

Multistage Sampling

Combine several sampling methods, selecting samples in stages, each possibly using a different method.

Design of Experiments

Replication

Replication involves repeating an experiment on multiple individuals to ensure results are reliable and not due to chance.

Replication requires sample sizes large enough to detect treatment effects.

Blinding and Double-Blind Designs

Blinding prevents subjects from knowing whether they receive treatment or placebo, reducing bias from the placebo effect.

Single-Blind: Only the subject is unaware of the treatment assignment.
Double-Blind: Both the subject and the experimenter are unaware of the treatment assignment.

Randomization

Subjects are assigned to groups by random selection to ensure groups are similar and reduce bias.
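
Random assignment to two groups can be sketched as follows. The subject labels, the group names, and the even split are assumptions for illustration.

```python
import random


def randomize(subjects, rng):
    """Shuffle a copy of the subject list and split it into two groups."""
    shuffled = subjects[:]
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


subjects = [f"subject_{i}" for i in range(20)]  # hypothetical subjects
treatment, placebo = randomize(subjects, random.Random(11))

print("treatment:", treatment)
print("placebo:  ", placebo)
```

Because chance alone decides the groups, neither the researcher nor the subjects can influence who receives the treatment.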

Types of Observational Studies

Cross-Sectional Study

Data are observed, measured, and collected at one point in time.

Retrospective (Case-Control) Study

Data are collected from a past time period by examining records or conducting interviews.

Prospective (Longitudinal or Cohort) Study

Data are collected going forward in time from groups sharing common factors (cohorts).

Controlling Effects of Variables

Confounding

Confounding occurs when the effects of two or more variables cannot be separated, so an observed effect cannot be attributed to a single factor.

Experimental Designs

Completely Randomized Design: Subjects are assigned to treatment groups by random selection.
Randomized Block Design: Subjects are grouped into blocks based on similar characteristics, then treatments are randomly assigned within each block.
Matched Pairs Design: Subjects are paired based on similarities, and the two members of each pair receive different treatments so that their results can be compared.
Rigorously Controlled Design: Subjects are carefully assigned to groups to ensure similarity in important characteristics, though this is difficult to implement perfectly.
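
As an illustration of the randomized block design, the sketch below blocks on a made-up age group and then randomizes within each block. The function name randomized_block, the block labels, and the two-treatment setup are assumptions for this example.

```python
import random
from collections import defaultdict


def randomized_block(subjects, block_of, rng):
    """Group subjects into blocks, then randomly split each block between two treatments."""
    blocks = defaultdict(list)
    for subject in subjects:
        blocks[block_of(subject)].append(subject)

    assignment = {}
    for members in blocks.values():
        shuffled = members[:]
        rng.shuffle(shuffled)
        half = len(shuffled) // 2
        for subject in shuffled[:half]:
            assignment[subject] = "treatment"
        for subject in shuffled[half:]:
            assignment[subject] = "placebo"
    return assignment


# Hypothetical subjects: (id, age group) pairs; 8 are under 40 and 12 are 40 or older.
subjects = [(f"s{i}", "under_40" if i < 8 else "40_plus") for i in range(20)]

assignment = randomized_block(subjects, block_of=lambda s: s[1], rng=random.Random(2))

for subject, group in assignment.items():
    print(subject, group)
```

Randomizing inside each block keeps the treatment and placebo groups balanced on the blocking characteristic, which a completely randomized design only achieves on average.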

Sampling Errors

Sampling Error

Occurs when a result from a random sample differs from the true population result because of chance fluctuations.
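
A small simulation can make this concrete. The population of 10,000 simulated heights, the sample size of 50, and the choice of the mean as the statistic are all assumptions for illustration.

```python
import random
import statistics

rng = random.Random(5)  # seeded generator so the sketch is reproducible

# Hypothetical population: 10,000 heights in cm.
population = [rng.gauss(170, 10) for _ in range(10_000)]
mu = statistics.mean(population)  # population parameter

# Draw several simple random samples and record each sample's error.
errors = []
for _ in range(5):
    sample = rng.sample(population, 50)
    errors.append(statistics.mean(sample) - mu)  # sample statistic minus parameter

for error in errors:
    print(f"sampling error: {error:+.2f} cm")
```

Each sample is drawn correctly, yet each sample mean misses the population mean by a different amount. That variation is sampling error, not a mistake in the method.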

Nonsampling Error

Results from human errors such as incorrect data entry, biased questions, false responses, or inappropriate statistical methods.

Nonrandom Sampling Error

Occurs when a nonrandom sampling method is used, such as convenience or voluntary response samples, leading to bias.

Summary Table: Sampling Methods

Sampling Method	Description	Advantages	Disadvantages
Simple Random	Every sample of size n has equal chance	Unbiased selection, tends to be representative	May be difficult to implement
Systematic	Select every k-th element	Easy to administer	May introduce periodic bias
Convenience	Use easy-to-get data	Quick, inexpensive	Often biased, not representative
Stratified	Divide into strata, sample each	Ensures representation of subgroups	Requires knowledge of strata
Cluster	Divide into clusters, sample all in selected clusters	Efficient for large populations	May not represent entire population
Multistage	Combine methods in stages	Flexible, practical	Complex to design and analyze

Key Formulas

Probability of Simple Random Sample

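The probability of selecting a particular sample of size n from a population of N by simple random sampling is one over the number of possible samples:

$$P(\text{particular sample}) = \frac{1}{\binom{N}{n}} = \frac{n!\,(N-n)!}{N!}$$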

Sampling Error

Sampling error is the difference between the sample statistic and the population parameter. For a sample mean, for example:

$$\text{sampling error} = \bar{x} - \mu$$

Example Application

Suppose a population has 1000 individuals, and a simple random sample of 50 is taken. Each possible sample of 50 has equal probability:

$$P(\text{particular sample}) = \frac{1}{\binom{1000}{50}}$$

Because no sample is favored over any other, the selection procedure is unbiased. An individual sample can still differ from the population by chance.
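
The quantities in this example can be computed directly. The sketch below assumes Python 3.8 or later for math.comb; the variable names are chosen for illustration.

```python
import math
from fractions import Fraction

N, n = 1000, 50

# Number of distinct samples of size 50 from a population of 1000.
num_samples = math.comb(N, n)

# Probability of any one particular sample under simple random sampling.
p_sample = Fraction(1, num_samples)

# Probability that a given individual ends up in the sample.
p_individual = Fraction(n, N)

print(f"number of possible samples: {float(num_samples):.3e}")
print(f"probability of one particular sample: {float(p_sample):.3e}")
print(f"probability a given individual is selected: {float(p_individual)}")
```

The number of possible samples is enormous, so any particular sample is extremely unlikely, yet each individual's chance of inclusion is simply n/N.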