Data Collection and Experimental Design in Statistics

Study Guide - Smart Notes

1.3 Data Collection and Experimental Design

Design of a Statistical Study

Designing a statistical study is fundamental to ensuring that the results are valid and reliable. The process involves several key steps, from identifying the focus of the study to interpreting the results and recognizing possible errors.

Identify Variables and Population: Clearly define the variable(s) of interest and the population to be studied.
Develop a Data Collection Plan: Decide how the data will be collected. If a sample is used, make sure it is representative of the population.
Collect Data: Gather data according to the plan.
Describe Data: Use descriptive statistics to summarize the data.
Interpret Data: Apply inferential statistics to make decisions about the population.
Identify Errors: Recognize and account for possible errors in the study.

Statistical studies are generally categorized as either observational studies or experiments:

Observational Study: The researcher observes and measures characteristics without influencing the subjects or conditions.
Experiment: The researcher applies a treatment to part of the population and observes the effects, often using a control group for comparison.

Example: Giving a vitamin supplement to one group and a placebo to another is an experiment. Surveying opinions without intervention is an observational study.

Data Collection Methods

Choosing an appropriate data collection method is crucial for the validity of a study. Two common methods are simulations and surveys.

Simulation: Uses mathematical or physical models (often with computers) to replicate real-world processes. Useful for situations that are impractical or dangerous to study directly (e.g., crash tests with dummies).
Survey: Involves asking questions to investigate characteristics of a population. Surveys can be conducted via interviews, phone, mail, or the Internet. Question wording must avoid bias.

Example: Surveying physicians about career motivations or simulating car crashes to study safety.
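As a minimal sketch of what a computer simulation looks like, the code below replaces a physical process with a random model and repeats it many times. The dice scenario, the function name estimate_sum_seven, and the seed are illustrative assumptions, not part of the original notes.

```python
import random

def estimate_sum_seven(trials, seed=0):
    """Estimate P(sum of two fair dice == 7) by simulating the rolls."""
    rng = random.Random(seed)  # seeded so the run is repeatable
    hits = 0
    for _ in range(trials):
        total = rng.randint(1, 6) + rng.randint(1, 6)  # one simulated roll of two dice
        if total == 7:
            hits += 1
    return hits / trials

estimate = estimate_sum_seven(100_000)
print(f"simulated: {estimate:.4f}, exact: {1 / 6:.4f}")
```

With many trials the simulated proportion should land close to the exact value of 1/6, which is the sense in which a simulation stands in for the real process.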

Experimental Design

Well-designed experiments are essential for producing unbiased and valid results. Key elements include control, randomization, and replication.

Control: Managing confounding variables that could affect the outcome. A confounding variable is one whose effects cannot be separated from those of the treatment.
Placebo Effect: Subjects may respond to a fake treatment (placebo). Blinding (subjects do not know their group) and double-blinding (neither subjects nor experimenters know group assignments) help control this effect.
Randomization: Assigning subjects to groups by chance so that the groups are similar on average and the assignment itself does not introduce bias.
Replication: Repeating the experiment under the same or similar conditions, with a sample large enough for the results to be trusted.

Types of Experimental Designs:

Completely Randomized Design: Subjects are assigned to groups entirely at random.
Randomized Block Design: Subjects are divided into blocks (groups with similar characteristics), then randomly assigned to treatments within each block.
Matched-Pairs Design: Subjects are paired based on similarity; one in each pair receives the treatment, the other receives a different treatment or control.
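The three designs above differ only in how chance is used to assign treatments. The following is a rough sketch under simple assumptions: two groups (treatment and control), subject labels S1 to S12, and an age-based blocking variable, all of which are hypothetical.

```python
import random

def completely_randomized(subjects, rng):
    """Shuffle all subjects, then split them into two groups."""
    shuffled = subjects[:]
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"treatment": shuffled[:half], "control": shuffled[half:]}

def randomized_block(blocks, rng):
    """Randomize separately inside each block (dict: block name -> subjects)."""
    return {name: completely_randomized(members, rng)
            for name, members in blocks.items()}

def matched_pairs(pairs, rng):
    """For each pair, use chance to decide which member gets the treatment."""
    assignment = []
    for a, b in pairs:
        treated, control = (a, b) if rng.random() < 0.5 else (b, a)
        assignment.append({"treatment": treated, "control": control})
    return assignment

rng = random.Random(42)  # seeded only to make the sketch repeatable
subjects = [f"S{i}" for i in range(1, 13)]

crd = completely_randomized(subjects, rng)
blocks = {"under_40": subjects[:6], "40_plus": subjects[6:]}  # hypothetical blocks
rbd = randomized_block(blocks, rng)
pairs = [(subjects[i], subjects[i + 1]) for i in range(0, 12, 2)]  # hypothetical pairs
mp = matched_pairs(pairs, rng)

print(crd)
print(rbd)
print(mp)
```

In the block design, each block ends up with its own treatment and control groups, so the blocking characteristic is balanced across treatments by construction.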

Example: An experiment testing a new gum that helps people quit smoking should assign subjects to the gum and control groups at random and use a large enough sample, so that the comparison is unbiased and the results are valid.

Additional info: The Hawthorne effect refers to subjects changing their behavior simply because they know they are being studied.

Sampling Techniques

Sampling is used when it is impractical to study an entire population. The goal is to select a sample that is representative of the population to ensure valid inferences.

Census: Measures the entire population (rarely practical).
Sample: Measures a subset of the population.
Sampling Error: The difference between sample results and the true population value.
Biased Sample: Not representative of the population (e.g., only surveying college students for a national opinion).

Types of Sampling Methods

Sampling Method	Description	Example	Potential Bias
Simple Random Sample	Every possible sample of the same size has the same chance of being selected, so every member is equally likely to be chosen.	Assign numbers to all students and use a random number generator to select participants.	Low, if properly conducted
Stratified Sample	Population divided into strata (groups) by characteristic; random samples taken from each stratum.	Divide households by income level, randomly sample from each group.	Low, if strata are correctly defined and sampled proportionally
Cluster Sample	Population divided into clusters; all members of selected clusters are included.	Select all households in randomly chosen zip codes.	Can be high if clusters are not similar
Systematic Sample	Members ordered; select every k-th member after a random start.	Survey every 100th household after a random start.	Can be high if there is a pattern in the population
Convenience Sample	Sample members are easy to access.	Surveying students in your own class.	High; often not representative
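The four probability-based methods in the table can be sketched with Python's standard random module. The household IDs, income strata, zip-code clusters, and function names below are hypothetical, chosen to mirror the table's examples. Convenience sampling is omitted because it involves no random selection.

```python
import random

def simple_random_sample(population, n, rng):
    """Every subset of size n is equally likely."""
    return rng.sample(population, n)

def stratified_sample(strata, fraction, rng):
    """Randomly sample the same fraction of members from every stratum."""
    sample = []
    for members in strata.values():
        n = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, n))
    return sample

def cluster_sample(clusters, n_clusters, rng):
    """Pick whole clusters at random and keep every member of each one."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [member for name in chosen for member in clusters[name]]

def systematic_sample(population, k, rng):
    """Take every k-th member after a random start among the first k."""
    start = rng.randrange(k)
    return population[start::k]

rng = random.Random(3)  # seeded only to make the sketch repeatable
households = list(range(1, 101))  # hypothetical household IDs 1..100
strata = {"low": households[:50], "middle": households[50:80], "high": households[80:]}
clusters = {"zip_A": households[:25], "zip_B": households[25:50],
            "zip_C": households[50:75], "zip_D": households[75:]}

srs = simple_random_sample(households, 10, rng)
strat = stratified_sample(strata, 0.10, rng)
clus = cluster_sample(clusters, 2, rng)
syst = systematic_sample(households, 10, rng)
```

Sampling the same fraction from each stratum keeps the sample proportional to the population, which matches the "sampled proportionally" condition in the table.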

Example: To survey opinions on stem cell research:

Dividing by major and sampling from each: Stratified sample
Assigning numbers and randomly selecting: Simple random sample
Surveying your own class: Convenience sample (likely biased)

Sampling with and without Replacement

With Replacement: The same member can be selected more than once.
Without Replacement: Each member can be selected only once.
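A short sketch of the difference, using a hypothetical list of five names: random.choices draws with replacement, while random.sample draws without replacement.

```python
import random

rng = random.Random(7)  # seeded only to make the sketch repeatable
members = ["Ana", "Ben", "Cy", "Dee", "Eli"]  # hypothetical population

# With replacement: a name may appear more than once.
with_replacement = rng.choices(members, k=5)

# Without replacement: every selected name is distinct.
without_replacement = rng.sample(members, k=5)

print(with_replacement)
print(without_replacement)

# Without replacement, the sample cannot be larger than the population.
try:
    rng.sample(members, k=6)
except ValueError as err:
    print("cannot draw 6 distinct members from 5:", err)
```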

Random Number Generation

Random numbers can be generated using tables, calculators, or computer software (e.g., Minitab, Excel, TI-84 Plus). For example, to select a simple random sample of 8 students from 731, assign numbers 1–731 and use a random number generator to select 8 unique numbers.
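The 8-from-731 example could be carried out in software along these lines. Python is used here as an assumption, since the notes name Minitab, Excel, and the TI-84 Plus rather than any programming language, and the seed is arbitrary.

```python
import random

rng = random.Random(2024)  # arbitrary seed so the selection can be reproduced

# Students are numbered 1 through 731; sample() never repeats a number.
selected = sorted(rng.sample(range(1, 732), 8))
print(selected)
```

Because the draw is made without replacement, the 8 numbers are guaranteed to be unique, so no student is chosen twice.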

Summary Table: Sampling Techniques Comparison

Technique	How Members Are Chosen	When to Use	Potential Issues
Simple Random	Random selection from entire population	When every member should have equal chance	May be difficult with large populations
Stratified	Random selection from each subgroup (stratum)	When subgroups must be represented	Incorrect strata or proportions can bias results
Cluster	All members from randomly selected clusters	When population is naturally divided into groups	Selected clusters may differ from the rest of the population
Systematic	Every k-th member after random start	When population is ordered	Hidden patterns can bias results
Convenience	Whoever is easiest to sample	Quick, preliminary studies	Almost always biased

Key Terms and Definitions

Population: The entire group being studied.
Sample: A subset of the population.
Variable: A characteristic or attribute that can assume different values.
Experimental Unit: The person, animal, or object to which a treatment is applied.
Treatment Group: The group receiving the treatment in an experiment.
Control Group: The group not receiving the treatment, used for comparison.
Placebo: A fake treatment used to control for psychological effects.
Confounding Variable: A variable whose effect on the outcome cannot be separated from the effect of the treatment.
Replication: Repeating an experiment to confirm results.
Randomization: Assigning subjects to groups by chance.
Blinding: Keeping subjects unaware of their group assignment.
Double-Blind: Both subjects and experimenters are unaware of group assignments.

Formulas and Notation

Sample Size (n): The number of subjects in the sample.
Population Size (N): The total number of subjects in the population.
Sampling Error: $\bar{x} - \mu$, where $\bar{x}$ is the sample mean and $\mu$ is the population mean.
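Taking sampling error as the sample mean minus the population mean, a small numeric illustration follows. The population of 500 scores is randomly generated and purely hypothetical. In practice the population mean is usually unknown, which is why the sampling error can rarely be computed directly.

```python
import random
from statistics import mean

rng = random.Random(1)  # seeded only to make the sketch repeatable

# Hypothetical population: N = 500 scores between 50 and 100.
population = [rng.randint(50, 100) for _ in range(500)]
mu = mean(population)  # population mean

# Simple random sample of n = 25, drawn without replacement.
sample = rng.sample(population, 25)
x_bar = mean(sample)  # sample mean

sampling_error = x_bar - mu
print(f"x_bar = {x_bar:.2f}, mu = {mu:.2f}, sampling error = {sampling_error:.2f}")
```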

Additional info: Proper sampling and experimental design are foundational for all subsequent statistical analysis, including descriptive and inferential statistics.