Introduction to the Practice of Statistics: Data Collection and Experimental Design
Study Guide - Smart Notes
1.1 Introduction to the Practice of Statistics
Definition and Scope of Statistics
Statistics is the science of collecting, organizing, summarizing, and analyzing information to draw conclusions or answer questions.
Statistics provides a measure of confidence in any conclusions drawn from data.
Key Terms
Data: Facts or figures from which conclusions can be drawn; information.
Population: The entire group that is being studied.
Individual: A person or object that is a member of the population.
Sample: A subset of the population.
Branches of Statistics
Descriptive Statistics: Organizing, summarizing, and displaying data using numerical summaries, tables, and graphs.
Inferential Statistics: Using methods that take results from a sample, extend them to the population, and measure the reliability of the result.
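The contrast between the two branches can be sketched in a few lines of Python. The scores below are hypothetical illustration data, and the confidence interval uses the usual 1.96 normal-approximation multiplier as a simple example of measuring reliability:

```python
import math
import statistics

# Hypothetical sample of exam scores (illustrative data only)
sample = [72, 85, 90, 66, 78, 88, 95, 70, 82, 74]

# Descriptive statistics: organize and summarize the data at hand
mean = statistics.mean(sample)
sd = statistics.stdev(sample)

# Inferential statistics: extend the sample result to the population,
# attaching a measure of reliability (an approximate 95% confidence interval)
margin = 1.96 * sd / math.sqrt(len(sample))
interval = (mean - margin, mean + margin)

print(f"sample mean: {mean:.1f}")
print(f"approx. 95% CI for the population mean: ({interval[0]:.1f}, {interval[1]:.1f})")
```

The printed mean merely describes the sample; the interval is the inferential step, since it makes a statement about the whole population along with a confidence level.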
Parameters and Statistics
Parameter: A numerical summary of a population.
Statistic: A numerical summary of a sample.
Example:
Suppose 48.2% of all students on campus own a car (parameter). A survey of 100 students finds 46% own a car (statistic).
Variables
Variables: Characteristics of individuals within the population.
Qualitative (Categorical) Variables: Classify individuals based on attributes or characteristics (e.g., gender, area code).
Quantitative Variables: Provide numerical measures of individuals (e.g., temperature, number of study days).
Types of Quantitative Variables
Discrete Variable: Has a finite or countable number of possible values (e.g., number of heads in coin flips).
Continuous Variable: Can take any value in an interval; its possible values are uncountably infinite (e.g., distance traveled).
1.2 Observational Studies vs. Designed Experiments
Variables in Studies
Explanatory Variable (x): Thought to influence or cause changes in another variable; also called independent, input, or predictor variable.
Response Variable (y): Affected by changes in the explanatory variable; also called dependent or outcome variable.
Example:
Studying the impact of study hours (explanatory) on exam scores (response).
Types of Studies
Observational Study: Measures the value of the response variable without influencing explanatory or response variables. Can identify associations but not causation.
Designed Experiment: Researcher manipulates the explanatory variable and controls other variables to establish cause-and-effect relationships.
Confounding and Lurking Variables
Confounding: Occurs when the effects of two or more explanatory variables are not separated, making it unclear which variable is causing changes in the response variable.
Lurking Variable: Not considered in the study but affects the response variable.
Confounding Variable: Considered in the study, but its effect cannot be distinguished from another explanatory variable.
Example:
In an observational study of flu shots and hospitalization rates, age, health status, and mobility are lurking variables whose effects may be confounded with the effect of the shot.
1.3 Simple Random Sampling
Random Sampling
Random Sampling: Using chance to select individuals from a population for inclusion in a sample, often without replacement.
Simple Random Sampling: Every possible sample of a particular size has an equally likely chance of being selected.
Example:
Selecting 3 friends out of 6 by drawing names from a hat is a simple random sample.
Selecting the 3 friends who live closest is a convenience sample, not random.
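Drawing names from a hat corresponds directly to Python's `random.sample`, which selects without replacement and gives every group of 3 the same chance. The names are illustrative placeholders:

```python
import random

# Hypothetical group of 6 friends
friends = ["Ana", "Ben", "Chris", "Dana", "Eli", "Fay"]

# Simple random sample: every possible group of 3 is equally likely,
# like drawing three names from a hat without replacement.
chosen = random.sample(friends, 3)
print(chosen)
```

By contrast, a convenience sample (e.g., always picking the first three names on the list) would give most groups of 3 a zero chance of selection.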
1.4 Other Effective Sampling Methods
Types of Sampling Methods
Stratified Sampling: Population is divided into non-overlapping groups (strata), and a simple random sample is taken from each stratum.
Systematic Sampling: Every kth individual is selected from the population, starting from a random position between 1 and k.
Cluster Sampling: All individuals within a randomly selected group (cluster) are sampled.
Convenience Sampling: Individuals are selected based on ease of access, not randomness; often leads to bias.
Sampling Methods Table
| Sampling Method | Description | Example |
|---|---|---|
| Simple Random | Every sample of a given size has an equal chance | Randomly select individuals from a list |
| Stratified | Divide into strata, sample from each | Sample from income groups |
| Systematic | Select every kth individual | Every 8th chip off assembly line |
| Cluster | Sample all from selected groups | All students from selected schools |
| Convenience | Easy to reach, not random | Voluntary radio call-in |
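Two of these methods can be sketched on a toy population. The 30-item "assembly line" and the three strata of 10 are hypothetical choices made for illustration:

```python
import random

random.seed(7)  # fixed seed so the illustration is reproducible

# Hypothetical population of 30 items labeled 0..29
population = list(range(30))

# Systematic sampling: pick a random start between 1 and k,
# then take every kth individual after that.
k = 8
start = random.randint(1, k)
systematic = population[start - 1::k]

# Stratified sampling: divide into non-overlapping strata, then take a
# simple random sample from each stratum (here, 2 from each group of 10).
strata = [population[0:10], population[10:20], population[20:30]]
stratified = [item for stratum in strata for item in random.sample(stratum, 2)]

print("systematic:", systematic)
print("stratified:", sorted(stratified))
```

Note the structural difference: systematic sampling spaces selections evenly across the whole population, while stratified sampling guarantees representation from every stratum.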
1.5 Bias in Sampling
Types of Bias
Sampling Bias: Selection technique favors one part of the population; often due to convenience sampling or undercoverage.
Nonresponse Bias: Individuals selected do not respond, and their opinions differ from those who do respond.
Response Bias: Survey answers do not reflect true feelings due to interviewer error, question wording, or other factors.
Example Table: Types of Bias and Remedies
| Type of Bias | Example | Remedy |
|---|---|---|
| Sampling Bias | First 60 customers on Saturday | Use random sampling |
| Nonresponse Bias | Only 12 of 1023 households respond | Follow up, offer incentives |
| Response Bias | "How much sleep do you get?" | Careful question design |
Sampling Error vs. Non-sampling Error
Sampling Error: Sample gives incomplete information about the population.
Non-sampling Error: Includes nonresponse bias, response bias, and data-entry errors.
1.6 The Design of Experiments
Components of Experimental Design
Experiment: Controlled study to determine the effect of varying explanatory variables (factors) on a response variable.
Treatment: Any combination of values of the factors.
Experimental Unit: The person, object, or item to which a treatment is applied; called a subject if a person.
Control Group: Baseline group for comparison.
Placebo: Treatment with no therapeutic effect, used to control for psychological effects.
Placebo Effect: Perceived improvement after receiving a placebo.
Blinding
Blinding: Nondisclosure of treatment to experimental units.
Single-blind: Subject does not know treatment received.
Double-blind: Neither subject nor researcher knows treatment assignment.
Cause-and-Effect and Confounding
Designed experiments can establish cause-and-effect relationships, unlike observational studies.
Confounding can still occur in experiments but should be minimized by careful design.
Summary Table: Experimental Design Terms
| Term | Definition |
|---|---|
| Experimental Unit | Person/object receiving treatment |
| Control Group | Baseline for comparison |
| Placebo | Inert treatment |
| Blinding | Concealing treatment assignment |
| Confounding | Effects of variables not separated |
Additional info:
Randomization, replication, and control are key principles in experimental design to reduce bias and confounding.
Replication involves applying treatments to multiple experimental units to ensure results are not due to chance.
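Randomization and replication can be sketched together. The 12 subjects below are hypothetical; shuffling before splitting is the randomization step, and having 6 units per group is the replication:

```python
import random

random.seed(3)  # fixed seed so the illustration is reproducible

# Hypothetical list of 12 experimental units (subjects, since they are people)
subjects = [f"subject{i}" for i in range(1, 13)]

# Randomization: shuffle, then split into treatment and control groups,
# so lurking variables tend to balance out across the groups.
random.shuffle(subjects)
treatment_group = subjects[:6]
control_group = subjects[6:]

# Replication: each treatment is applied to several units (6 per group),
# so an observed difference is unlikely to be due to one unusual subject.
print("treatment:", treatment_group)
print("control:  ", control_group)
```

In a double-blind version of this design, neither the subjects nor the researchers recording the response would know which list each subject landed on.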