
Introduction to the Practice of Statistics: Data Collection and Experimental Design

Study Guide - Smart Notes


1.1 Introduction to the Practice of Statistics

Definition and Scope of Statistics

  • Statistics is the science of collecting, organizing, summarizing, and analyzing information to draw conclusions or answer questions.

  • Statistics provides a measure of confidence in any conclusions drawn from data.

Key Terms

  • Data: Facts or figures from which conclusions can be drawn; information.

  • Population: The entire group that is being studied.

  • Individual: A person or object that is a member of the population.

  • Sample: A subset of the population.

Branches of Statistics

  • Descriptive Statistics: Organizing, summarizing, and displaying data using numerical summaries, tables, and graphs.

  • Inferential Statistics: Using methods that take results from a sample, extend them to the population, and measure the reliability of the result.

Parameters and Statistics

  • Parameter: A numerical summary of a population.

  • Statistic: A numerical summary of a sample.

Example:

  • Suppose 48.2% of all students on campus own a car (parameter). A survey of 100 students finds 46% own a car (statistic).
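The parameter/statistic distinction can be sketched in a short simulation. Here the campus size (10,000 students) is a hypothetical number chosen for illustration; only the 48.2% ownership rate comes from the example above. Each student is represented as a boolean (owns a car or not), and a simple random sample of 100 yields a statistic that estimates, but rarely equals, the parameter.

```python
import random

random.seed(1)  # make the draw reproducible

# Hypothetical campus of 10,000 students; 48.2% own a car (the parameter).
population = [True] * 4820 + [False] * 5180
parameter = sum(population) / len(population)  # 0.482

# Survey a simple random sample of 100 students; the sample
# proportion is the statistic.
sample = random.sample(population, 100)
statistic = sum(sample) / len(sample)

print(f"parameter = {parameter:.3f}, statistic = {statistic:.3f}")
```

Rerunning with a different seed gives a different statistic each time, while the parameter stays fixed.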

Variables

  • Variables: Characteristics of individuals within the population.

  • Qualitative (Categorical) Variables: Classify individuals based on attributes or characteristics (e.g., gender, area code).

  • Quantitative Variables: Provide numerical measures of individuals (e.g., temperature, number of study days).

Types of Quantitative Variables

  • Discrete Variable: Has a finite or countable number of possible values (e.g., number of heads in coin flips).

  • Continuous Variable: Has infinitely many possible values that cannot be counted; it can take any value within an interval (e.g., distance traveled).

1.2 Observational Studies vs. Designed Experiments

Variables in Studies

  • Explanatory Variable (x): Thought to influence or cause changes in another variable; also called independent, input, or predictor variable.

  • Response Variable (y): Affected by changes in the explanatory variable; also called dependent or outcome variable.

Example:

  • Studying the impact of study hours (explanatory) on exam scores (response).

Types of Studies

  • Observational Study: Measures the value of the response variable without influencing explanatory or response variables. Can identify associations but not causation.

  • Designed Experiment: Researcher manipulates the explanatory variable and controls other variables to establish cause-and-effect relationships.

Confounding and Lurking Variables

  • Confounding: Occurs when the effects of two or more explanatory variables are not separated, making it unclear which variable is causing changes in the response variable.

  • Lurking Variable: Not considered in the study but affects the response variable.

  • Confounding Variable: Considered in the study, but its effect cannot be distinguished from another explanatory variable.

Example:

  • Studying the effect of flu shots on hospitalization rates may be confounded by age, health status, or mobility (lurking variables).

1.3 Simple Random Sampling

Random Sampling

  • Random Sampling: Using chance to select individuals from a population for inclusion in a sample, often without replacement.

  • Simple Random Sampling: Every possible sample of a particular size has an equally likely chance of being selected.

Example:

  • Selecting 3 friends out of 6 by drawing names from a hat is a simple random sample.

  • Selecting the 3 friends who live closest is a convenience sample, not random.
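The names-from-a-hat example above can be sketched with Python's standard `random.sample`, which draws without replacement so that every group of 3 friends is equally likely. The friend names are hypothetical placeholders.

```python
import random

random.seed(7)  # reproducible draw

friends = ["Ana", "Ben", "Chi", "Dee", "Eli", "Fay"]  # hypothetical names

# random.sample draws without replacement, so every possible group
# of 3 has the same chance of selection -- a simple random sample.
chosen = random.sample(friends, 3)
print(chosen)
```

By contrast, sorting the list by distance and taking the first three would be a convenience sample, since chance plays no role.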

1.4 Other Effective Sampling Methods

Types of Sampling Methods

  • Stratified Sampling: Population is divided into non-overlapping groups (strata), and a simple random sample is taken from each stratum.

  • Systematic Sampling: Every kth individual is selected from the population, starting from a random position between 1 and k.

  • Cluster Sampling: All individuals within a randomly selected group (cluster) are sampled.

  • Convenience Sampling: Individuals are selected based on ease of access, not randomness; often leads to bias.

Sampling Methods Table

| Sampling Method | Description                              | Example                                 |
| --------------- | ---------------------------------------- | --------------------------------------- |
| Simple Random   | Every sample of a given size has an equal chance | Randomly select individuals from a list |
| Stratified      | Divide into strata, sample from each     | Sample from income groups               |
| Systematic      | Select every kth individual              | Every 8th chip off assembly line        |
| Cluster         | Sample all from selected groups          | All students from selected schools      |
| Convenience     | Easy to reach, not random                | Voluntary radio call-in                 |
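Two of the methods in the table above, systematic and stratified sampling, can be sketched as follows. The population of 100 numbered individuals and the two strata are hypothetical; the systematic step uses k = 8, matching the assembly-line example.

```python
import random

random.seed(3)

population = list(range(1, 101))  # hypothetical population, IDs 1..100

# Systematic sampling: pick a random start between 1 and k,
# then take every kth individual after it.
k = 8
start = random.randint(1, k)
systematic = population[start - 1::k]

# Stratified sampling: split into non-overlapping strata, then take
# a simple random sample from each stratum.
strata = {"low": population[:50], "high": population[50:]}
stratified = [x for group in strata.values() for x in random.sample(group, 5)]

print(systematic)
print(stratified)
```

Note that stratified sampling guarantees representation from every stratum, while a simple random sample of the same size might miss a small stratum entirely.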

1.5 Bias in Sampling

Types of Bias

  • Sampling Bias: Selection technique favors one part of the population; often due to convenience sampling or undercoverage.

  • Nonresponse Bias: Individuals selected do not respond, and their opinions differ from those who do respond.

  • Response Bias: Survey answers do not reflect true feelings due to interviewer error, question wording, or other factors.

Example Table: Types of Bias and Remedies

| Type of Bias     | Example                              | Remedy                     |
| ---------------- | ------------------------------------ | -------------------------- |
| Sampling Bias    | Surveying the first 60 customers on a Saturday | Use random sampling        |
| Nonresponse Bias | Only 12 of 1,023 households respond  | Follow up, offer incentives |
| Response Bias    | Vague wording: "How much sleep do you get?" | Careful question design    |

Sampling Error vs. Non-sampling Error

  • Sampling Error: The error that arises because a sample gives incomplete information about the population; the statistic differs from the parameter it estimates even when the sampling is done correctly.

  • Non-sampling Error: Includes nonresponse bias, response bias, and data-entry errors.
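Sampling error can be made concrete by drawing many samples from the same population and watching the statistic vary around the parameter. This sketch reuses the hypothetical 48.2% car-ownership population from Section 1.1; the number of repetitions (200) and sample size (100) are arbitrary choices for illustration.

```python
import random

random.seed(11)

# Hypothetical population where the true proportion is 0.482.
population = [1] * 4820 + [0] * 5180
parameter = sum(population) / len(population)

# Each simple random sample yields a slightly different statistic;
# the gap between statistic and parameter is the sampling error.
errors = []
for _ in range(200):
    sample = random.sample(population, 100)
    errors.append(sum(sample) / 100 - parameter)

print(f"largest observed sampling error: {max(abs(e) for e in errors):.3f}")
```

The errors scatter on both sides of zero: sampling error is unavoidable variation, unlike nonresponse or response bias, which push results systematically in one direction.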

1.6 The Design of Experiments

Components of Experimental Design

  • Experiment: Controlled study to determine the effect of varying explanatory variables (factors) on a response variable.

  • Treatment: Any combination of values of the factors.

  • Experimental Unit: The person, object, or item to which a treatment is applied; called a subject if a person.

  • Control Group: Baseline group for comparison.

  • Placebo: Treatment with no therapeutic effect, used to control for psychological effects.

  • Placebo Effect: Perceived improvement after receiving a placebo.

Blinding

  • Blinding: Nondisclosure of treatment to experimental units.

  • Single-blind: Subject does not know treatment received.

  • Double-blind: Neither subject nor researcher knows treatment assignment.

Cause-and-Effect and Confounding

  • Designed experiments can establish cause-and-effect relationships, unlike observational studies.

  • Confounding can still occur in experiments but should be minimized by careful design.

Summary Table: Experimental Design Terms

| Term              | Definition                         |
| ----------------- | ---------------------------------- |
| Experimental Unit | Person/object receiving treatment  |
| Control Group     | Baseline for comparison            |
| Placebo           | Inert treatment                    |
| Blinding          | Concealing treatment assignment    |
| Confounding       | Effects of variables not separated |

Additional info:

  • Randomization, replication, and control are key principles in experimental design to reduce bias and confounding.

  • Replication involves applying treatments to multiple experimental units to ensure results are not due to chance.
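Randomization and replication can be sketched together: shuffle the experimental units, then split them evenly so each treatment is applied to several units. The 20 subject labels and the two treatment names are hypothetical placeholders.

```python
import random

random.seed(5)

subjects = [f"subject_{i}" for i in range(1, 21)]  # hypothetical subjects
treatments = ["drug", "placebo"]

# Randomization: shuffle the subjects so chance, not the researcher,
# decides who gets which treatment.
random.shuffle(subjects)

# Replication: split evenly so each treatment is applied to 10 units.
assignment = {t: subjects[i::2] for i, t in enumerate(treatments)}

for treatment, group in assignment.items():
    print(treatment, len(group))
```

Because assignment is random, lurking variables such as age or health tend to balance out across the two groups, which is what lets a designed experiment support cause-and-effect conclusions.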
