Chapter 1: Introduction to Statistics – Key Concepts and Methods
Study Guide - Smart Notes
Introduction to Statistics
What is Statistics?
Statistics is the science of collecting, organizing, analyzing, and interpreting data to draw conclusions and make informed decisions.
Collecting data: Gathering information through measurements, surveys, or experiments.
Organizing & summarizing data: Using tables, charts, and graphs to present data clearly.
Analyzing data: Applying statistical methods to extract meaningful patterns.
Interpreting results: Drawing logical conclusions based on data analysis.
The Statistical Process
Three Main Steps
Prepare: Define the context, goals, and data collection methods.
Analyze: Use graphs, charts, and statistical techniques to examine data.
Conclude: Interpret results, assess significance, and draw conclusions.
Statistical thinking involves using logic and common sense, not just mathematical calculations.
Populations and Samples
Definitions
Population: The entire group you want information about.
Sample: A smaller group selected from the population for study.
Census vs. Sample
Census: Data from every member of the population.
Sample: Data from only part of the population (often used due to cost or time constraints).
Types of Data
Quantitative Data (Numerical)
Discrete Data: Countable values (e.g., number of students).
Continuous Data: Measured values that can include decimals (e.g., height, weight, time).
Categorical Data (Qualitative)
Names or labels rather than numerical measurements (e.g., gender, eye color, yes/no survey responses).
Levels of Measurement
Nominal: Categories only, no order (e.g., gender, eye color).
Ordinal: Categories with a meaningful order, but differences between values are not meaningful or cannot be computed (e.g., letter grades).
Interval: Ordered data, differences are meaningful, but no true zero (e.g., temperature in °C).
Ratio: Ordered data, differences are meaningful, true zero exists (e.g., height, age).
Quick Level Summary: Nominal = categories; Ordinal = categories + order; Interval = order + meaningful differences, no true zero; Ratio = order + meaningful differences + true zero.
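A small sketch of why the interval/ratio distinction matters. The two temperatures are made-up values chosen only for illustration. Differences in °C are meaningful, but a ratio such as 20/10 = 2 is not, because 0 °C is an arbitrary zero. Converting to kelvin (a ratio scale with a true zero) shows the actual ratio.

```python
# Hypothetical temperatures, chosen only for illustration.
cold_c = 10.0
warm_c = 20.0


def celsius_to_kelvin(temp_c):
    """Convert degrees Celsius (interval scale) to kelvin (ratio scale)."""
    return temp_c + 273.15


# Differences are meaningful at the interval level and match on both scales.
difference_c = warm_c - cold_c
difference_k = celsius_to_kelvin(warm_c) - celsius_to_kelvin(cold_c)

# Ratios are meaningful only when zero is a true zero.
naive_ratio = warm_c / cold_c  # looks like "twice as hot", but is not
true_ratio = celsius_to_kelvin(warm_c) / celsius_to_kelvin(cold_c)

print(f"Difference: {difference_c:.1f} degrees C, {difference_k:.1f} K")
print(f"Ratio computed in Celsius (not meaningful): {naive_ratio:.2f}")
print(f"Ratio computed in kelvin (meaningful): {true_ratio:.3f}")
```

The same check works for any variable: if "twice as much" makes sense, the data are at the ratio level.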
Big Data
Extremely large and complex data sets.
Requires advanced software and data science techniques.
Applications: Computer science, business, healthcare, etc.
Missing Data
Types of Missing Data
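Missing Completely at Random: The chance that a value is missing is unrelated to its own value or to any other variable.
Missing Not at Random: The chance that a value is missing is related to the value itself (e.g., people with very high or very low incomes skip an income question).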
Handling Missing Data
Delete cases: Remove records with missing values.
Impute values: Replace missing values with estimated values.
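The two approaches above can be sketched in a few lines. The records are hypothetical, and replacing a missing value with the mean of the observed values is only one simple imputation choice, assumed here for illustration. Deleting cases is safest when values are missing completely at random; otherwise either approach can bias the results.

```python
# Hypothetical survey records; None marks a missing income value.
records = [
    {"id": 1, "income": 42000},
    {"id": 2, "income": None},
    {"id": 3, "income": 58000},
    {"id": 4, "income": None},
    {"id": 5, "income": 50000},
]

# Option 1: delete cases (keep only records with no missing value).
complete_cases = [r for r in records if r["income"] is not None]

# Option 2: impute values (here: the mean of the observed incomes).
observed = [r["income"] for r in complete_cases]
mean_income = sum(observed) / len(observed)
imputed = [
    {**r, "income": r["income"] if r["income"] is not None else mean_income}
    for r in records
]

print(f"Records kept after deletion: {len(complete_cases)} of {len(records)}")
print(f"Value used for imputation: {mean_income:.0f}")
```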
Parameters vs. Statistics
Parameter: Numerical value that describes a population.
Statistic: Numerical value that describes a sample.
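A minimal sketch of the distinction, using a simulated population. The heights, the population size of 1,000, and the sample size of 50 are all assumptions made for illustration. The mean of the full list plays the role of a parameter; the mean of the sample is a statistic that estimates it.

```python
import random
import statistics

rng = random.Random(1)  # fixed seed so reruns give the same numbers

# Hypothetical population: simulated heights (cm) of 1,000 people.
population = [rng.gauss(170, 10) for _ in range(1000)]

# Parameter: describes the whole population (what a census would give).
population_mean = statistics.mean(population)

# Statistic: describes a sample of 50 drawn from that population.
sample = rng.sample(population, 50)
sample_mean = statistics.mean(sample)

print(f"Population mean (parameter): {population_mean:.2f}")
print(f"Sample mean (statistic):     {sample_mean:.2f}")
```

In practice the parameter is usually unknown, which is why the statistic is computed in the first place.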
The Gold Standard in Experiments
Randomly assigning subjects to a treatment group and a placebo group is often called the "gold standard" of experimental design.
Placebo: An inactive treatment (like a sugar pill) used to compare real effects vs. psychological effects.
Ways to Collect Data
Observational Studies: Observe and measure without changing anything.
Experiments: Apply a treatment and observe its effects.
Design of Experiments
Replication: Repeat the experiment on enough subjects that real effects can be distinguished from chance variation.
Blinding: Subjects do not know whether they are receiving the treatment or a placebo.
Double-blind: Neither subjects nor researchers know group assignments.
Randomization: Use chance to assign subjects to groups.
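Randomization can be sketched as shuffling the subject list and splitting it in half. The subject IDs and the group size of 10 are hypothetical. In a double-blind study, the resulting assignment would be kept hidden from both subjects and researchers until the data are collected.

```python
import random

rng = random.Random(42)  # fixed seed so the assignment can be reproduced

# Hypothetical subject IDs: S01 through S20.
subjects = [f"S{i:02d}" for i in range(1, 21)]

# Chance, not the researcher, decides who goes where.
shuffled = subjects[:]
rng.shuffle(shuffled)
half = len(shuffled) // 2
treatment_group = shuffled[:half]
placebo_group = shuffled[half:]

print("Treatment:", sorted(treatment_group))
print("Placebo:  ", sorted(placebo_group))
```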
Sampling Methods
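Simple Random Sample: Every possible sample of the same size n has an equal chance of being chosen.
Systematic Sampling: Pick a starting point, then choose every kth item.
Convenience Sampling: Use data that is easy to obtain.
Stratified Sampling: Divide the population into subgroups (strata) whose members share a characteristic, then sample from every stratum.
Cluster Sampling: Divide the population into clusters, randomly select some clusters, and include every member of the selected clusters.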
Multistage Sampling: Combine multiple sampling methods in stages.
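The sketch below contrasts four of the methods above on one hypothetical sampling frame of 100 students. The frame, the class-year variable used for strata, the groups of 10 used as clusters, and all sample sizes are assumptions made for illustration.

```python
import random

rng = random.Random(7)  # fixed seed so reruns give the same samples

# Hypothetical sampling frame: 100 students, each in class year 1 to 4.
population = [{"id": i, "year": (i % 4) + 1} for i in range(100)]

# Simple random sample: every set of 10 students is equally likely.
simple = rng.sample(population, 10)

# Systematic sample: random starting point, then every kth student.
k = 10
start = rng.randrange(k)
systematic = population[start::k]

# Stratified sample: group by class year, then sample 3 from every stratum.
strata = {}
for person in population:
    strata.setdefault(person["year"], []).append(person)
stratified = []
for members in strata.values():
    stratified.extend(rng.sample(members, 3))

# Cluster sample: split into 10 clusters of 10 (e.g., classrooms),
# randomly pick 2 clusters, and keep every member of those clusters.
clusters = [population[i:i + 10] for i in range(0, 100, 10)]
chosen_clusters = rng.sample(clusters, 2)
cluster_sample = [person for cluster in chosen_clusters for person in cluster]

print("Simple random:", sorted(p["id"] for p in simple))
print("Systematic:   ", [p["id"] for p in systematic])
print("Stratified:   ", sorted(p["id"] for p in stratified))
print("Cluster:      ", sorted(p["id"] for p in cluster_sample))
```

Note the difference between the last two: stratified sampling takes some members from every group, while cluster sampling takes all members from some groups.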
Types of Observational Studies
Cross-sectional: Data collected at one point in time.
Retrospective: Looks back at past data.
Prospective: Follows groups into the future.
Confounding and Controlling Variables
Confounding: Occurs when the effects of two or more variables cannot be separated, so the true cause of an observed effect cannot be identified.
Controlling Variables: Good experimental design helps reduce confounding.
Completely Randomized Design: Randomly assign subjects to treatments.
Randomized Block Design: Group similar subjects, then randomize within blocks.
Matched Pairs Design: Match subjects in pairs and compare treatments.
Rigorously Controlled Design: Carefully balance groups (very difficult).
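A minimal sketch of a randomized block design, assuming age group is the variable that might confound the results. The subject IDs, the two age groups, and the block sizes are hypothetical. Subjects are first grouped into blocks, and randomization then happens separately inside each block, so both treatments end up with the same mix of age groups.

```python
import random

rng = random.Random(3)  # fixed seed so the assignment can be reproduced

# Hypothetical subjects, each tagged with the blocking variable (age group).
subjects = [
    ("A1", "under 40"), ("A2", "under 40"), ("A3", "under 40"), ("A4", "under 40"),
    ("B1", "40 and over"), ("B2", "40 and over"), ("B3", "40 and over"), ("B4", "40 and over"),
]

# Step 1: form blocks of similar subjects.
blocks = {}
for subject_id, age_group in subjects:
    blocks.setdefault(age_group, []).append(subject_id)

# Step 2: randomize within each block.
assignment = {}
for age_group, members in blocks.items():
    shuffled = members[:]
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    for subject_id in shuffled[:half]:
        assignment[subject_id] = "treatment"
    for subject_id in shuffled[half:]:
        assignment[subject_id] = "placebo"

for age_group, members in blocks.items():
    print(age_group, {s: assignment[s] for s in members})
```

A completely randomized design would skip step 1 and shuffle all eight subjects together.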
Sampling Errors
Sampling Error: The difference between a sample result and the true population value that arises from chance variation between samples.
Nonsampling Error: Human mistakes (biased questions, wrong data entries, faulty analysis, etc.).
Nonrandom Sampling Error: Errors from nonrandom methods like convenience sampling.
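Sampling error can be seen by drawing several random samples from the same population and comparing each sample mean with the true mean. The population below is simulated, and its size, the sample size of 30, and the number of repeats are assumptions made for illustration. Every sample is drawn correctly, yet each one misses the true value by a different amount.

```python
import random
import statistics

rng = random.Random(11)  # fixed seed so reruns give the same numbers

# Hypothetical population of 10,000 values between 0 and 100.
population = [rng.uniform(0, 100) for _ in range(10_000)]
true_mean = statistics.mean(population)

# Draw 5 random samples of size 30 and record each sample mean.
sample_means = []
for _ in range(5):
    sample = rng.sample(population, 30)
    sample_means.append(statistics.mean(sample))

# Sampling error: how far each sample mean lands from the true mean.
errors = [m - true_mean for m in sample_means]

print(f"True population mean: {true_mean:.2f}")
for m, e in zip(sample_means, errors):
    print(f"Sample mean {m:.2f}, sampling error {e:+.2f}")
```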
Statistical and Practical Significance
Statistical Significance: Results are unlikely to have occurred by chance (commonly, the probability of getting such results by chance is 5% or less).
Practical Significance: Asks whether the result matters in real life, not just statistically.
Example: Losing 2.1 kg in a year may be statistically significant but not worth the effort for many people.
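A rough sketch of how the two questions are answered separately for the weight-loss example. Only the 2.1 kg mean comes from the notes; the sample size, the standard deviation, and the "worthwhile" threshold are assumptions made for illustration. The p-value uses a simple normal approximation (a one-sided z-test against a mean loss of zero), not necessarily the method used in the original study.

```python
import math

mean_loss_kg = 2.1  # from the example in the notes
n = 40              # ASSUMPTION: hypothetical sample size
sd_kg = 4.8         # ASSUMPTION: hypothetical standard deviation

# Statistical significance: could a mean this large arise by chance alone?
standard_error = sd_kg / math.sqrt(n)
z = mean_loss_kg / standard_error
p_value = 0.5 * math.erfc(z / math.sqrt(2))  # P(Z >= z), normal approximation
statistically_significant = p_value < 0.05

# Practical significance: is the effect large enough to matter?
minimum_worthwhile_loss_kg = 5.0  # ASSUMPTION: a personal judgment, not a rule
practically_significant = mean_loss_kg >= minimum_worthwhile_loss_kg

print(f"z = {z:.2f}, p = {p_value:.4f}")
print("Statistically significant:", statistically_significant)
print("Practically significant:  ", practically_significant)
```

The first answer comes from probability; the second depends on context and judgment, which is why the two can disagree.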
Common Pitfalls in Statistics
Misleading conclusions
Self-reported data
Order of questions
Nonresponse
Missing data
Misleading percentages
Key Takeaways (Exam Ready)
Statistics is more than calculations; it requires critical thinking.
Samples should represent the population.
Voluntary response samples, in which respondents choose whether to take part, are prone to bias.
Statistical significance ≠ practical importance.
Always question how data were collected.