Data Collection and Experimental Design in Statistics
1.3 Data Collection and Experimental Design
Design of a Statistical Study
Designing a statistical study is fundamental to ensuring that the results are valid and reliable. The process involves several key steps, from identifying the focus of the study to interpreting the results and recognizing possible errors.
Identify Variables and Population: Clearly define the variable(s) of interest and the population to be studied.
Develop a Data Collection Plan: Ensure the sample is representative if sampling is used.
Collect Data: Gather data according to the plan.
Describe Data: Use descriptive statistics to summarize the data.
Interpret Data: Apply inferential statistics to make decisions about the population.
Identify Errors: Recognize and account for possible errors in the study.
Statistical studies are generally categorized as either observational studies or experiments:
Observational Study: The researcher observes and measures characteristics without influencing the subjects or conditions.
Experiment: The researcher applies a treatment to part of the population and observes the effects, often using a control group for comparison.
Example: Giving a vitamin supplement to one group and a placebo to another is an experiment. Surveying opinions without intervention is an observational study.
Data Collection Methods
Choosing an appropriate data collection method is crucial for the validity of a study. Two common methods are simulations and surveys.
Simulation: Uses mathematical or physical models (often with computers) to replicate real-world processes. Useful for situations that are impractical or dangerous to study directly (e.g., crash tests with dummies).
Survey: Involves asking questions to investigate characteristics of a population. Surveys can be conducted via interviews, phone, mail, or the Internet. Question wording must avoid bias.
Example: Surveying physicians about career motivations or simulating car crashes to study safety.
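As a small illustration of a computer simulation, the sketch below models rolling two dice many times and estimates how often the sum is 7. The dice scenario, the function name `estimate_sum_of_seven`, and the seed are assumptions chosen for illustration; they are not from the original notes.

```python
import random

def estimate_sum_of_seven(trials, rng):
    """Estimate P(two dice sum to 7) by simulating `trials` rolls."""
    hits = 0
    for _ in range(trials):
        # Each die is modeled as a uniform random integer from 1 to 6.
        roll = rng.randint(1, 6) + rng.randint(1, 6)
        if roll == 7:
            hits += 1
    return hits / trials

rng = random.Random(2024)  # fixed seed only so the run is repeatable
estimate = estimate_sum_of_seven(100_000, rng)
print(f"simulated: {estimate:.3f}, exact: {1/6:.3f}")
```

The simulated proportion should land close to the exact value of 1/6, and the agreement generally improves as the number of trials grows.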
Experimental Design
Well-designed experiments are essential for producing unbiased and valid results. Key elements include control, randomization, and replication.
Control: Managing confounding variables that could affect the outcome. A confounding variable is one whose effects cannot be separated from those of the treatment.
Placebo Effect: Subjects may respond to a fake treatment (placebo). Blinding (subjects do not know their group) and double-blinding (neither subjects nor experimenters know group assignments) help control this effect.
Randomization: Assigning subjects to groups by chance so that the groups are similar on average and the assignment itself does not bias the results.
Replication: Repeating the experiment under the same or similar conditions, with a large enough sample, to validate results.
Types of Experimental Designs:
Completely Randomized Design: Subjects are assigned to groups entirely at random.
Randomized Block Design: Subjects are divided into blocks (groups with similar characteristics), then randomly assigned to treatments within each block.
Matched-Pairs Design: Subjects are paired based on similarity; one in each pair receives the treatment, the other receives a different treatment or control.
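The three designs above differ mainly in where the random assignment happens. The following is a minimal sketch, assuming subjects are simple labels and there are two groups; the function names, block names, and subject IDs are hypothetical.

```python
import random

def completely_randomized(subjects, treatments, rng):
    """Shuffle all subjects, then deal them out to the treatments in turn."""
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    groups = {t: [] for t in treatments}
    for i, subject in enumerate(shuffled):
        groups[treatments[i % len(treatments)]].append(subject)
    return groups

def randomized_block(blocks, treatments, rng):
    """Randomize separately inside each block of similar subjects."""
    return {name: completely_randomized(members, treatments, rng)
            for name, members in blocks.items()}

def matched_pairs(pairs, rng):
    """For each pair, chance decides which member gets the treatment."""
    treatment, control = [], []
    for a, b in pairs:
        if rng.random() < 0.5:
            a, b = b, a
        treatment.append(a)
        control.append(b)
    return {"treatment": treatment, "control": control}

rng = random.Random(42)  # fixed seed only so the sketch is repeatable
subjects = [f"S{i}" for i in range(1, 13)]  # hypothetical subject IDs
groups = ["treatment", "control"]

crd = completely_randomized(subjects, groups, rng)
blocks = {"block A": subjects[:6], "block B": subjects[6:]}
rbd = randomized_block(blocks, groups, rng)
pairs = list(zip(subjects[::2], subjects[1::2]))
mp = matched_pairs(pairs, rng)
```

In the block design each block ends up with its own treatment and control groups, so a characteristic used to form the blocks cannot be unevenly split between treatments.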
Example: Testing a new gum for quitting smoking should use random assignment and sufficient sample size to avoid bias and ensure validity.
Additional info: The Hawthorne effect refers to subjects changing their behavior simply because they know they are being studied.
Sampling Techniques
Sampling is used when it is impractical to study an entire population. The goal is to select a sample that is representative of the population to ensure valid inferences.
Census: Measures the entire population (rarely practical).
Sample: Measures a subset of the population.
Sampling Error: The difference between sample results and the true population value.
Biased Sample: A sample that is not representative of the population (e.g., surveying only college students for a national opinion poll).
Types of Sampling Methods
| Sampling Method | Description | Example | Potential Bias |
|---|---|---|---|
| Simple Random Sample | Every member and every possible sample of the same size has an equal chance of being selected. | Assign numbers to all students and use a random number generator to select participants. | Low, if properly conducted |
| Stratified Sample | Population divided into strata (groups) by characteristic; random samples taken from each stratum. | Divide households by income level, randomly sample from each group. | Low, if strata are correctly defined and sampled proportionally |
| Cluster Sample | Population divided into clusters; all members of selected clusters are included. | Select all households in randomly chosen zip codes. | Can be high if clusters are not similar |
| Systematic Sample | Members ordered; select every k-th member after a random start. | Survey every 100th household after a random start. | Can be high if there is a pattern in the population |
| Convenience Sample | Sample members are easy to access. | Surveying students in your own class. | High; often not representative |
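The four probability-based methods in the table can be sketched with Python's standard `random` module. This is an illustrative sketch, not a prescribed procedure: the household IDs, stratum names, cluster names, and the 2% sampling fraction are all hypothetical.

```python
import random

def simple_random_sample(population, n, rng):
    """Every set of n members is equally likely."""
    return rng.sample(population, n)

def stratified_sample(strata, fraction, rng):
    """Sample the same fraction from every stratum (proportional allocation)."""
    return {name: rng.sample(members, round(fraction * len(members)))
            for name, members in strata.items()}

def cluster_sample(clusters, n_clusters, rng):
    """Pick whole clusters at random and keep all of their members."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [member for name in chosen for member in clusters[name]]

def systematic_sample(ordered_population, k, rng):
    """Take every k-th member after a random start among the first k."""
    start = rng.randrange(k)
    return ordered_population[start::k]

rng = random.Random(0)  # fixed seed only so the sketch is repeatable
households = list(range(1, 1001))  # hypothetical household IDs
strata = {"low": households[:500],
          "middle": households[500:800],
          "high": households[800:]}
clusters = {f"zip{z}": households[z * 100:(z + 1) * 100] for z in range(10)}

srs = simple_random_sample(households, 20, rng)
strat = stratified_sample(strata, 0.02, rng)
clus = cluster_sample(clusters, 2, rng)
syst = systematic_sample(households, 100, rng)
```

Note that the stratified sample draws a few members from every group, while the cluster sample draws every member from a few groups.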
Example: To survey opinions on stem cell research:
Dividing by major and sampling from each: Stratified sample
Assigning numbers and randomly selecting: Simple random sample
Surveying your own class: Convenience sample (likely biased)
Sampling with and without Replacement
With Replacement: The same member can be selected more than once.
Without Replacement: Each member can be selected only once.
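The difference can be seen with two standard-library calls: `random.choices` draws with replacement and `random.sample` draws without. The member labels below are hypothetical.

```python
import random

rng = random.Random(1)  # fixed seed only so the sketch is repeatable
members = ["A", "B", "C", "D", "E"]

# With replacement: a member can appear more than once,
# so the sample may even be larger than the population.
with_replacement = rng.choices(members, k=8)

# Without replacement: each member appears at most once,
# so the sample size cannot exceed the population size.
without_replacement = rng.sample(members, k=3)

print(with_replacement)
print(without_replacement)
```

Asking `rng.sample` for more members than the population contains raises a `ValueError`, which reflects the "only once" rule.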
Random Number Generation
Random numbers can be generated using tables, calculators, or computer software (e.g., Minitab, Excel, TI-84 Plus). For example, to select a simple random sample of 8 students from 731, assign numbers 1–731 and use a random number generator to select 8 unique numbers.
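The 8-from-731 example can also be done in Python, which is not one of the tools named above but follows the same idea. This is a minimal sketch; the variable names are assumptions.

```python
import random

rng = random.Random()  # unseeded, so each run gives a different sample

# Students are numbered 1 through 731; range() excludes its upper bound.
student_numbers = range(1, 732)

# sample() draws without replacement, so the 8 numbers are unique.
selected = sorted(rng.sample(student_numbers, k=8))
print(selected)
```

Passing a fixed seed, for example `random.Random(123)`, would make the selection reproducible, which can be useful when the sample needs to be audited later.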
Summary Table: Sampling Techniques Comparison
| Technique | How Members Are Chosen | When to Use | Potential Issues |
|---|---|---|---|
| Simple Random | Random selection from entire population | When every member should have an equal chance | May be difficult with large populations |
| Stratified | Random selection from each subgroup (stratum) | When subgroups must be represented | Incorrect strata or proportions can bias results |
| Cluster | All members from randomly selected clusters | When population is naturally divided into groups | Clusters may differ from one another, so the selected clusters may not represent the population |
| Systematic | Every k-th member after random start | When population is ordered | Hidden patterns can bias results |
| Convenience | Whoever is easiest to sample | Quick, preliminary studies | Almost always biased |
Key Terms and Definitions
Population: The entire group being studied.
Sample: A subset of the population.
Variable: A characteristic or attribute that can assume different values.
Experimental Unit: The subject or object being experimented on.
Treatment Group: The group receiving the treatment in an experiment.
Control Group: The group not receiving the treatment, used for comparison.
Placebo: A fake treatment used to control for psychological effects.
Confounding Variable: A variable other than the treatment whose effect on the results cannot be separated from the effect of the treatment.
Replication: Repeating an experiment under the same or similar conditions to confirm results.
Randomization: Assigning subjects to groups by chance.
Blinding: Keeping subjects unaware of their group assignment.
Double-Blind: Both subjects and experimenters are unaware of group assignments.
Formulas and Notation
Sample Size (n): The number of subjects in the sample.
Population Size (N): The total number of subjects in the population.
Sampling Error: $\bar{x} - \mu$, where $\bar{x}$ is the sample mean and $\mu$ is the population mean.
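Sampling error, described earlier as the difference between a sample result and the true population value, can be computed directly when the population is small enough to measure in full. The scores below are made-up numbers used only for illustration.

```python
import random
from statistics import mean

# Hypothetical population of N = 10 test scores.
population = [62, 65, 68, 70, 71, 73, 75, 78, 80, 88]
N = len(population)
mu = mean(population)  # population mean

rng = random.Random(7)  # fixed seed only so the sketch is repeatable
n = 4
sample = rng.sample(population, n)  # simple random sample, no replacement
x_bar = mean(sample)  # sample mean

sampling_error = x_bar - mu
print(f"N={N}, n={n}, mu={mu}, x_bar={x_bar}, error={sampling_error}")
```

A different sample would usually give a different error, and a census (n = N) would give an error of zero because the sample mean is then the population mean.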
Additional info: Proper sampling and experimental design are foundational for all subsequent statistical analysis, including descriptive and inferential statistics.