Chapter 1: Introduction to Statistics – Key Concepts and Methods

Introduction to Statistics

What is Statistics?

Statistics is the science of collecting, organizing, analyzing, and interpreting data to draw conclusions and make informed decisions.

Collecting data: Gathering information through measurements, surveys, or experiments.
Organizing & summarizing data: Using tables, charts, and graphs to present data clearly.
Analyzing data: Applying statistical methods to extract meaningful patterns.
Interpreting results: Drawing logical conclusions based on data analysis.

The Statistical Process

Three Main Steps

Prepare: Define the context, goals, and data collection methods.
Analyze: Use graphs, charts, and statistical techniques to examine data.
Conclude: Interpret results, assess significance, and draw conclusions.

Statistical thinking involves using logic and common sense, not just mathematical calculations.

Populations and Samples

Definitions

Population: The entire group you want information about.
Sample: A smaller group selected from the population for study.

Census vs. Sample

Census: Data from every member of the population.
Sample: Data from only part of the population (often used due to cost or time constraints).

Types of Data

Quantitative Data (Numerical)

Discrete Data: Countable values (e.g., number of students).
Continuous Data: Measured values that can take any value within a range, including fractions (e.g., height, weight, time).

Categorical Data (Qualitative)

Names or labels rather than numerical measurements (e.g., gender, eye color, survey responses).

Levels of Measurement

Nominal: Categories only, no order (e.g., gender, eye color).
Ordinal: Categories with order, but differences are not meaningful (e.g., letter grades).
Interval: Ordered data, differences are meaningful, but no true zero (e.g., temperature in °C).
Ratio: Ordered data, differences are meaningful, true zero exists (e.g., height, age).

Quick Level Summary: Nominal = categories only; Ordinal = categories + order; Interval = order + meaningful differences, no true zero; Ratio = order + meaningful differences + true zero (so ratios such as "twice as much" make sense).
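
A quick way to check the interval versus ratio distinction is to ask whether "twice as much" survives a change of units. The sketch below is illustrative only: the two temperatures are made-up values, and celsius_to_kelvin is a helper name introduced here. Kelvin is used for comparison because it has a true zero.

```python
# Illustrative values only: two temperatures in degrees Celsius.
low_c, high_c = 10.0, 20.0

def celsius_to_kelvin(c):
    """Kelvin has a true zero (absolute zero), so it is a ratio scale."""
    return c + 273.15

# Differences are meaningful on an interval scale: the gap is the same
# size whether it is expressed in Celsius or in Kelvin.
diff_c = high_c - low_c
diff_k = celsius_to_kelvin(high_c) - celsius_to_kelvin(low_c)

# Ratios are not: "20 is twice 10" holds only for the Celsius labels.
ratio_c = high_c / low_c
ratio_k = celsius_to_kelvin(high_c) / celsius_to_kelvin(low_c)

print(f"difference: {diff_c:.2f} C vs {diff_k:.2f} K")
print(f"ratio: {ratio_c:.2f} (Celsius) vs {ratio_k:.3f} (Kelvin)")
```

The difference agrees on both scales, while the ratio changes, which is why 20 °C cannot be called "twice as hot" as 10 °C.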

Big Data

Data sets so large and complex that traditional software tools cannot handle them.
Analysis requires advanced software and data science techniques.
Applications: Computer science, business, healthcare, etc.

Missing Data

Types of Missing Data

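Missing Completely at Random: Missingness is unrelated to the values of any variable, observed or missing.
Missing Not at Random: Missingness is related to the missing values themselves (e.g., people with very high incomes declining to report their income).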

Handling Missing Data

Delete cases: Remove records with missing values.
Impute values: Replace missing values with estimated values.
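
The two options above can be sketched in a few lines. The records are hypothetical, and mean imputation is only one simple choice of estimate, assumed here for illustration.

```python
from statistics import mean

# Hypothetical survey records; None marks a missing value.
records = [
    {"id": 1, "age": 34},
    {"id": 2, "age": None},
    {"id": 3, "age": 28},
    {"id": 4, "age": 40},
    {"id": 5, "age": None},
]

# Option 1: delete cases, keeping only records with an observed age.
complete_cases = [r for r in records if r["age"] is not None]

# Option 2: impute, here with the mean of the observed ages
# (mean imputation is an assumption; other estimates are possible).
observed_mean = mean(r["age"] for r in complete_cases)
imputed = [
    {**r, "age": observed_mean if r["age"] is None else r["age"]}
    for r in records
]

print(len(complete_cases), "complete cases")
print("imputed ages:", [r["age"] for r in imputed])
```

Deleting cases shrinks the sample, while imputing keeps every record but can understate variability. Neither approach removes the bias that arises when data are missing not at random.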

Parameters vs. Statistics

Parameter: Numerical value that describes a population.
Statistic: Numerical value that describes a sample.
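
As a rough illustration of the distinction, the sketch below computes the same measure (a mean) once from a whole population and once from a sample. The population of exam scores is simulated and entirely hypothetical.

```python
import random
from statistics import mean

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical population: exam scores for all 500 students in a course.
population = [random.randint(40, 100) for _ in range(500)]

# Parameter: computed from every member of the population (a census).
population_mean = mean(population)

# Statistic: computed from a sample of 30 students drawn without replacement.
sample = random.sample(population, 30)
sample_mean = mean(sample)

print(f"parameter (population mean): {population_mean:.2f}")
print(f"statistic (sample mean):     {sample_mean:.2f}")
```

The parameter is a fixed number for a given population, whereas the statistic changes from one sample to the next.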

The Gold Standard in Experiments

Random assignment of subjects to treatment and placebo groups is often called the gold standard of experimental design.
Placebo: An inactive treatment (like a sugar pill) used to separate a treatment's real effect from the psychological effect of being treated.

Ways to Collect Data

Observational Studies: Observe and measure without changing anything.
Experiments: Apply a treatment and observe its effects.

Design of Experiments

Replication: Repeat the experiment on enough subjects to distinguish real effects from chance variation.
Blinding: Subjects do not know their treatment group.
Double-blind: Neither subjects nor researchers know group assignments.
Randomization: Use chance to assign subjects to groups.
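
Randomization can be sketched as a shuffle followed by a split. The subject IDs and the function name randomize are hypothetical, introduced only for this example.

```python
import random

def randomize(subjects, seed=None):
    """Randomly split subjects into treatment and placebo groups of near-equal size."""
    rng = random.Random(seed)
    shuffled = list(subjects)   # copy so the caller's list is left unchanged
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"treatment": shuffled[:half], "placebo": shuffled[half:]}

# Hypothetical subject IDs S01 through S20.
subjects = [f"S{i:02d}" for i in range(1, 21)]
groups = randomize(subjects, seed=42)

for name, members in groups.items():
    print(name, sorted(members))
```

Because chance alone decides each assignment, neither the researcher nor the subject can steer particular people into a particular group.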

Sampling Methods

Simple Random Sample: Every possible sample of the same size n has an equal chance of being chosen.
Systematic Sampling: Select a starting point, then choose every kth item (e.g., every 50th).
Convenience Sampling: Use data that is easy to obtain.
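Stratified Sampling: Divide the population into subgroups (strata) that share a characteristic, then draw a sample from each subgroup.
Cluster Sampling: Divide the population into clusters, randomly select some clusters, and include all members of the selected clusters.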
Multistage Sampling: Combine multiple sampling methods in stages.
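
Four of the methods above can be compared side by side in code. This is a minimal sketch under stated assumptions: the population of 100 students in 4 classes is hypothetical, and all function names are introduced here for illustration.

```python
import random

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: 100 students, each belonging to one of 4 classes.
population = [{"id": i, "class": f"C{i % 4}"} for i in range(100)]

def simple_random(pop, n):
    """Every possible sample of size n is equally likely."""
    return rng.sample(pop, n)

def systematic(pop, k):
    """Random start among the first k items, then every k-th item."""
    start = rng.randrange(k)
    return pop[start::k]

def by_group(pop, key):
    """Helper: collect members into groups keyed by an attribute."""
    groups = {}
    for member in pop:
        groups.setdefault(member[key], []).append(member)
    return groups

def stratified(pop, key, n_per_stratum):
    """Draw a simple random sample from every stratum."""
    return [m for g in by_group(pop, key).values()
            for m in rng.sample(g, n_per_stratum)]

def cluster(pop, key, n_clusters):
    """Randomly choose whole clusters and keep all of their members."""
    groups = by_group(pop, key)
    chosen = rng.sample(sorted(groups), n_clusters)
    return [m for name in chosen for m in groups[name]]

print(len(simple_random(population, 10)))       # 10 members
print(len(systematic(population, 10)))          # 10 members, evenly spaced
print(len(stratified(population, "class", 3)))  # 3 from each of 4 classes
print(len(cluster(population, "class", 2)))     # every member of 2 classes
```

The key contrast is between the last two: stratified sampling takes some members from every group, while cluster sampling takes every member from some groups.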

Types of Observational Studies

Cross-sectional: Data collected at one point in time.
Retrospective: Looks back at past data.
Prospective: Follows groups into the future.

Confounding and Controlling Variables

Confounding: Occurs when the effects of two or more variables are mixed together, so you cannot tell which one caused the observed result.
Controlling Variables: Good experimental design helps reduce confounding.

Completely Randomized Design: Randomly assign subjects to treatments.
Randomized Block Design: Group similar subjects, then randomize within blocks.
Matched Pairs Design: Match subjects in pairs and compare treatments.
Rigorously Controlled Design: Carefully assign subjects so that the groups are similar in the characteristics that matter (difficult in practice).
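
A randomized block design can be sketched as randomization carried out separately inside each block. The subjects, the age_group blocking variable, and the function name randomized_block are all hypothetical.

```python
import random

def randomized_block(subjects, block_key, treatments, seed=None):
    """Assign treatments at random, separately within each block."""
    rng = random.Random(seed)

    # Group similar subjects into blocks.
    blocks = {}
    for s in subjects:
        blocks.setdefault(s[block_key], []).append(s)

    assignment = {}
    for members in blocks.values():
        shuffled = list(members)
        rng.shuffle(shuffled)
        # Deal treatments out in rotation so each block stays balanced.
        for i, s in enumerate(shuffled):
            assignment[s["id"]] = treatments[i % len(treatments)]
    return assignment

# Hypothetical subjects, blocked by age group (6 subjects per block).
subjects = [{"id": i, "age_group": "under_40" if i < 6 else "40_plus"}
            for i in range(12)]
plan = randomized_block(subjects, "age_group", ["treatment", "placebo"], seed=7)
print(plan)
```

Placing every subject in a single block reduces this to a completely randomized design, and blocks made of two matched subjects give a matched pairs layout.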

Sampling Errors

Sampling Error: Difference between a sample result and the true population value that arises from chance sample fluctuations.
Nonsampling Error: Human mistakes (biased questions, incorrectly recorded or analyzed data, etc.).
Nonrandom Sampling Error: Errors from nonrandom sampling methods, such as convenience sampling.
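
Sampling error can be observed directly in a simulation where the population value is known. The population below is simulated and hypothetical, and sampling_errors is a helper name introduced for this sketch.

```python
import random
from statistics import mean

rng = random.Random(3)  # fixed seed so the sketch is repeatable

# Hypothetical population of 10,000 values; its mean is the parameter.
population = [rng.gauss(50, 10) for _ in range(10_000)]
true_mean = mean(population)

def sampling_errors(n, repeats=200):
    """Sample mean minus population mean, for repeated random samples of size n."""
    return [mean(rng.sample(population, n)) - true_mean for _ in range(repeats)]

for n in (10, 100, 1000):
    errors = sampling_errors(n)
    typical = mean(abs(e) for e in errors)
    print(f"n={n:4d}  typical sampling error = {typical:.3f}")
```

Larger random samples tend to reduce sampling error. They do not repair errors caused by human mistakes or by a nonrandom sampling method.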

Statistical and Practical Significance

Statistical Significance: The result is very unlikely to occur by chance alone (a common cutoff is a probability of 5% or less).
Practical Significance: The result is large enough to matter in real life, a judgment based on common sense rather than on a calculation.

Example: A diet that produces a mean weight loss of 2.1 kg after one year may be statistically significant, but many people would not consider so small a loss worth the effort.
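
The weight-loss example can be turned into a rough calculation. Only the 2.1 kg mean comes from the example; the sample size, the standard deviation, and the "worthwhile" threshold are hypothetical values assumed for illustration, and the p-value uses a normal approximation as a simplification.

```python
import math

# Mean loss from the example; the other two inputs are assumed values.
mean_loss_kg = 2.1
std_dev_kg = 4.8   # assumed
n = 40             # assumed

# Test statistic for "the true mean loss is 0", using a normal approximation.
standard_error = std_dev_kg / math.sqrt(n)
z = mean_loss_kg / standard_error

# One-sided p-value: chance of a mean loss at least this large
# if the diet actually had no effect.
p_value = 0.5 * math.erfc(z / math.sqrt(2))

statistically_significant = p_value <= 0.05

# Practical significance is a judgment call; this threshold is an assumption.
worthwhile_loss_kg = 5.0
practically_significant = mean_loss_kg >= worthwhile_loss_kg

print(f"z = {z:.2f}, p = {p_value:.4f}")
print("statistically significant:", statistically_significant)
print("practically significant:  ", practically_significant)
```

Under these assumed inputs the result clears the 5% cutoff yet falls short of the chosen threshold, so the two kinds of significance give different answers.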

Common Pitfalls in Statistics

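Misleading conclusions: Conclusions that claim more than the data support, such as inferring causation from an association.
Self-reported data: Values reported by subjects may be less accurate than values measured directly.
Order of questions: The order of questions or answer choices can influence responses.
Nonresponse: People who decline to respond may differ from those who do.
Missing data: Results can be distorted when values are missing, especially when they are not missing at random.
Misleading percentages: Percentages can be quoted in unclear or impossible ways (e.g., a decrease of more than 100%).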

Key Takeaways (Exam Ready)

Statistics is more than calculations; it requires critical thinking.
Samples should represent the population.
Voluntary response (self-selected) samples are biased because respondents decide for themselves whether to take part.
Statistical significance ≠ practical importance.
Always question how data were collected.