Introduction to the Practice of Statistics: Data Collection and Experimental Design
Study Guide - Smart Notes
1.1 Introduction to the Practice of Statistics
Definition and Scope of Statistics
Statistics is the science of collecting, organizing, summarizing, and analyzing information to draw conclusions or answer questions.
Statistics provides a measure of confidence in any conclusions drawn from data.
Key Terms
Data: Facts or figures from which conclusions can be drawn; information.
Population: The entire group that is being studied.
Individual: A person or object that is a member of the population.
Sample: A subset of the population.
Branches of Statistics
Descriptive Statistics: Organizing, summarizing, and displaying data using numerical summaries, tables, and graphs.
Inferential Statistics: Using methods that take results from a sample, extend them to the population, and measure the reliability of the result.
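The contrast between the two branches can be sketched in a few lines of Python. The scores below are hypothetical illustration data, and the confidence interval uses the usual 1.96 normal-approximation multiplier as a simple example of measuring reliability:

```python
import math
import statistics

# Hypothetical sample of exam scores (illustrative data only)
sample = [72, 85, 90, 66, 78, 88, 95, 70, 82, 74]

# Descriptive statistics: organize and summarize the data at hand
mean = statistics.mean(sample)
sd = statistics.stdev(sample)

# Inferential statistics: extend the sample result to the population,
# attaching a measure of reliability (an approximate 95% confidence interval)
margin = 1.96 * sd / math.sqrt(len(sample))
interval = (mean - margin, mean + margin)

print(f"sample mean: {mean:.1f}")
print(f"approx. 95% CI for the population mean: ({interval[0]:.1f}, {interval[1]:.1f})")
```

The printed mean merely describes the sample; the interval is the inferential step, since it makes a statement about the whole population along with a confidence level.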
Parameters and Statistics
Parameter: A numerical summary of a population.
Statistic: A numerical summary of a sample.
Example:
Suppose 48.2% of all students on campus own a car (parameter). A survey of 100 students finds 46% own a car (statistic).
Variables
Variables: Characteristics of individuals within the population.
Qualitative (Categorical) Variables: Classify individuals based on attributes or characteristics (e.g., gender, area code).
Quantitative Variables: Provide numerical measures of individuals (e.g., temperature, number of study days).
Types of Quantitative Variables
Discrete Variable: Has a finite or countable number of possible values (e.g., number of heads in coin flips).
Continuous Variable: Can take any value in an interval; its possible values are uncountably infinite (e.g., distance traveled).
1.2 Observational Studies vs. Designed Experiments
Variables in Studies
Explanatory Variable (x): Thought to influence or cause changes in another variable; also called independent, input, or predictor variable.
Response Variable (y): Affected by changes in the explanatory variable; also called dependent or outcome variable.
Example:
Studying the impact of study hours (explanatory) on exam scores (response).
Types of Studies
Observational Study: Measures the value of the response variable without influencing explanatory or response variables. Can identify associations but not causation.
Designed Experiment: Researcher manipulates the explanatory variable and controls other variables to establish cause-and-effect relationships.
Confounding and Lurking Variables
Confounding: Occurs when the effects of two or more explanatory variables are not separated, making it unclear which variable is causing changes in the response variable.
Lurking Variable: Not considered in the study but affects the response variable.
Confounding Variable: Considered in the study, but its effect cannot be distinguished from another explanatory variable.
Example:
In an observational study of flu shots and hospitalization rates, age, health status, and mobility are lurking variables whose effects may be confounded with the effect of the shot.
1.3 Simple Random Sampling
Random Sampling
Random Sampling: Using chance to select individuals from a population for inclusion in a sample, often without replacement.
Simple Random Sampling: Every possible sample of a particular size has an equally likely chance of being selected.
Example:
Selecting 3 friends out of 6 by drawing names from a hat is a simple random sample.
Selecting the 3 friends who live closest is a convenience sample, not random.
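Drawing names from a hat corresponds directly to Python's `random.sample`, which selects without replacement and gives every group of 3 the same chance. The names are illustrative placeholders:

```python
import random

# Hypothetical group of 6 friends
friends = ["Ana", "Ben", "Chris", "Dana", "Eli", "Fay"]

# Simple random sample: every possible group of 3 is equally likely,
# like drawing three names from a hat without replacement.
chosen = random.sample(friends, 3)
print(chosen)
```

By contrast, a convenience sample (e.g., always picking the first three names on the list) would give most groups of 3 a zero chance of selection.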
1.4 Other Effective Sampling Methods
Types of Sampling Methods
Stratified Sampling: Population is divided into non-overlapping groups (strata), and a simple random sample is taken from each stratum.
Systematic Sampling: Every kth individual is selected from the population, starting from a random position between 1 and k.
Cluster Sampling: All individuals within a randomly selected group (cluster) are sampled.
Convenience Sampling: Individuals are selected based on ease of access, not randomness; often leads to bias.
Sampling Methods Table
| Sampling Method | Description | Example |
|---|---|---|
| Simple Random | Every sample of a given size has an equal chance | Randomly select individuals from a list |
| Stratified | Divide into strata, sample from each | Sample from income groups |
| Systematic | Select every kth individual | Every 8th chip off assembly line |
| Cluster | Sample all from selected groups | All students from selected schools |
| Convenience | Easy to reach, not random | Voluntary radio call-in |
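Two of these methods can be sketched on a toy population. The 30-item "assembly line" and the three strata of 10 are hypothetical choices made for illustration:

```python
import random

random.seed(7)  # fixed seed so the illustration is reproducible

# Hypothetical population of 30 items labeled 0..29
population = list(range(30))

# Systematic sampling: pick a random start between 1 and k,
# then take every kth individual after that.
k = 8
start = random.randint(1, k)
systematic = population[start - 1::k]

# Stratified sampling: divide into non-overlapping strata, then take a
# simple random sample from each stratum (here, 2 from each group of 10).
strata = [population[0:10], population[10:20], population[20:30]]
stratified = [item for stratum in strata for item in random.sample(stratum, 2)]

print("systematic:", systematic)
print("stratified:", sorted(stratified))
```

Note the structural difference: systematic sampling spaces selections evenly across the whole population, while stratified sampling guarantees representation from every stratum.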
1.5 Bias in Sampling
Types of Bias
Sampling Bias: Selection technique favors one part of the population; often due to convenience sampling or undercoverage.
Nonresponse Bias: Individuals selected do not respond, and their opinions differ from those who do respond.
Response Bias: Survey answers do not reflect true feelings due to interviewer error, question wording, or other factors.
Example Table: Types of Bias and Remedies
| Type of Bias | Example | Remedy |
|---|---|---|
| Sampling Bias | First 60 customers on Saturday | Use random sampling |
| Nonresponse Bias | Only 12 of 1023 households respond | Follow up, offer incentives |
| Response Bias | "How much sleep do you get?" | Careful question design |
Sampling Error vs. Non-sampling Error
Sampling Error: Sample gives incomplete information about the population.
Non-sampling Error: Includes nonresponse bias, response bias, and data-entry errors.
1.6 The Design of Experiments
Components of Experimental Design
Experiment: Controlled study to determine the effect of varying explanatory variables (factors) on a response variable.
Treatment: Any combination of values of the factors.
Experimental Unit: The person, object, or item to which a treatment is applied; called a subject if a person.
Control Group: Baseline group for comparison.
Placebo: Treatment with no therapeutic effect, used to control for psychological effects.
Placebo Effect: Perceived improvement after receiving a placebo.
Blinding
Blinding: Nondisclosure of treatment to experimental units.
Single-blind: Subject does not know treatment received.
Double-blind: Neither subject nor researcher knows treatment assignment.
Cause-and-Effect and Confounding
Designed experiments can establish cause-and-effect relationships, unlike observational studies.
Confounding can still occur in experiments but should be minimized by careful design.
Summary Table: Experimental Design Terms
| Term | Definition |
|---|---|
| Experimental Unit | Person/object receiving treatment |
| Control Group | Baseline for comparison |
| Placebo | Inert treatment |
| Blinding | Concealing treatment assignment |
| Confounding | Effects of variables not separated |
Additional info:
Randomization, replication, and control are key principles in experimental design to reduce bias and confounding.
Replication involves applying treatments to multiple experimental units to ensure results are not due to chance.
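Randomization and replication can be sketched together. The 12 subjects below are hypothetical; shuffling before splitting is the randomization step, and having 6 units per group is the replication:

```python
import random

random.seed(3)  # fixed seed so the illustration is reproducible

# Hypothetical list of 12 experimental units (subjects, since they are people)
subjects = [f"subject{i}" for i in range(1, 13)]

# Randomization: shuffle, then split into treatment and control groups,
# so lurking variables tend to balance out across the groups.
random.shuffle(subjects)
treatment_group = subjects[:6]
control_group = subjects[6:]

# Replication: each treatment is applied to several units (6 per group),
# so an observed difference is unlikely to be due to one unusual subject.
print("treatment:", treatment_group)
print("control:  ", control_group)
```

In a double-blind version of this design, neither the subjects nor the researchers recording the response would know which list each subject landed on.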