Statistics Fundamentals: Concepts, Data, and Probability
Study Guide - Smart Notes
Chapter 1: Introduction
Statistical vs Practical Significance
Understanding the difference between statistical and practical significance is essential in interpreting results from data analysis.
Statistical Significance: A result is statistically significant if it is unlikely to have occurred by chance under a given model (often determined by a low p-value). Large samples can make tiny, unimportant differences statistically significant.
Practical Significance: Refers to the real-world importance of a result. It considers whether the size of the effect is meaningful or valuable in context.
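As a rough illustration of how sample size drives statistical significance, the sketch below runs a two-sided z-test on the same tiny difference at two sample sizes. The z-test, the helper name `two_sample_z`, and all the numbers are assumptions chosen for illustration; they are not taken from the notes.

```python
import math

def two_sample_z(mean1, mean2, sd, n):
    """Two-sided z-test for a difference in means (equal n, known common SD)."""
    se = sd * math.sqrt(2 / n)            # standard error of the difference
    z = (mean1 - mean2) / se
    # Two-sided p-value from the standard normal distribution
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p

# Hypothetical data: group means differ by 0.2 points on a scale with SD 15
for n in (100, 1_000_000):
    z, p = two_sample_z(100.2, 100.0, 15, n)
    print(f"n per group = {n}: z = {z:.2f}, p = {p:.4g}")
```

With 100 per group the difference is nowhere near significant; with a million per group the p-value is tiny, even though a 0.2-point difference is just as unimportant in practice.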
Parameter vs Statistic
Distinguishing between parameters and statistics is foundational in inferential statistics.
Parameter: A numerical value that describes a population (e.g., μ, p).
Statistic: A numerical value calculated from a sample (e.g., x̄, p-hat).
Data Types
Data can be classified into different types based on their nature and measurement.
Categorical (Qualitative): Labels or categories (e.g., color).
Quantitative: Numbers representing counts or measures.
Discrete: Countable values (e.g., 0, 1, 2).
Continuous: Measurements on a scale (e.g., time, weight).
Levels of Measurement
Levels of measurement determine the mathematical operations that can be performed on data.
Nominal: Names or labels only.
Ordinal: Order only (ranking).
Interval: Equal steps, no true zero (e.g., temperature in °C).
Ratio: Equal steps, true zero (e.g., length).
Sampling
Sampling methods affect the representativeness and bias of data.
SRS (Simple Random Sample): Every possible sample of size n has an equal chance of selection, so every member is equally likely to be chosen.
Systematic: Every kth member is selected.
Stratified: Sample within subgroups.
Cluster: Sample whole groups.
Convenience/Voluntary: May introduce bias.
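The four probability-based methods above can be sketched with Python's standard `random` module. The population of 100 numbered IDs, the two strata, and the clusters of 10 are hypothetical choices made only for this example.

```python
import random

random.seed(0)                          # fixed seed so the sketch is repeatable
population = list(range(1, 101))        # hypothetical population: IDs 1..100

# Simple random sample: every group of 10 members is equally likely
srs = random.sample(population, 10)

# Systematic: random starting point, then every k-th member
k = 10
start = random.randrange(k)
systematic = population[start::k]

# Stratified: split into subgroups (strata), then sample within each one
strata = {"low": population[:50], "high": population[50:]}
stratified = [x for group in strata.values() for x in random.sample(group, 5)]

# Cluster: split into groups, pick whole groups at random, keep all members
clusters = [population[i:i + 10] for i in range(0, 100, 10)]
cluster_sample = [x for c in random.sample(clusters, 2) for x in c]

print(srs, systematic, stratified, cluster_sample, sep="\n")
```

Stratified sampling takes some members from every subgroup, while cluster sampling takes every member from a few subgroups.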
Observational vs Experiment
Study designs differ in their ability to infer causality.
Observational: No intervention; can show association, not causation.
Experiment: Treatments assigned with randomization; can show causation.
Observational Flavors: Cross-sectional (now), Retrospective (look back), Prospective (follow forward).
Chapter 2: Tables & Graphs
Frequency, Relative Frequency, Cumulative
Tabular summaries help organize and interpret data distributions.
Choose classes: 5-10 is common.
Frequency: Count per class.
Relative frequency: $\dfrac{\text{class frequency}}{n}$, where $n$ is the total number of observations.
Cumulative: Running total (or running relative).
Class parts: Limits (low/high), boundaries (for continuous data), midpoint $= \dfrac{\text{lower limit} + \text{upper limit}}{2}$.
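A minimal sketch of building a frequency table, assuming a hypothetical list of quiz scores and a class width of 10 (both invented for this example):

```python
from collections import Counter
from itertools import accumulate

# Hypothetical quiz scores, grouped into classes of width 10
scores = [52, 57, 61, 64, 68, 70, 73, 75, 78, 79, 81, 84, 85, 88, 93]
width = 10

def class_lower_limit(x):
    """Lower limit of the class that contains x (50, 60, 70, ...)."""
    return (x // width) * width

freq = Counter(class_lower_limit(x) for x in scores)
lowers = sorted(freq)
n = len(scores)

frequencies = [freq[lo] for lo in lowers]       # count per class
relative = [f / n for f in frequencies]         # share of all observations
cumulative = list(accumulate(frequencies))      # running total

for lo, f, r, c in zip(lowers, frequencies, relative, cumulative):
    print(f"{lo}-{lo + width - 1}: freq={f}, rel={r:.2f}, cum={c}")
```

The relative frequencies sum to 1, and the last cumulative value equals the number of observations, which is a quick check on any table.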
Histograms (Quantitative)
Bars touch; equal class widths.
Read the shape first (symmetric or skewed?), then the center, spread, and any outliers.
Other Useful Plots
Dotplot / Stemplot: Small data, preserve values.
Boxplot: Five-number summary; outliers; great for comparisons.
Time series: Value vs time.
Bar/Pareto/Pie: Categorical data (bars do not touch).
Chapter 3: Center, Spread, z, Boxplots
3.1 Measures of Center (and Outliers' Effect)
Measures of center summarize the typical value in a dataset.
Mean (average): Good for symmetric data without big outliers. Outliers pull the mean toward them.
Median: Order values; take the middle (average the two middles if n is even). Resistant to outliers/skew.
Mode: Most common value(s); useful for categorical data and for data with repeated values.
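Midrange: $\dfrac{\text{min} + \text{max}}{2}$. Very sensitive to outliers, because it uses only the two extreme values.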
Which to report?
Skewed/outliers: Median + IQR
Roughly symmetric: Mean + SD
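To see the effect of an outlier on each measure of center, the sketch below uses Python's `statistics` module on a small hypothetical dataset, then adds one extreme value. The `midrange` helper is written here for the example; it is not a library function.

```python
import statistics

data = [2, 3, 3, 4, 5, 6, 7]        # hypothetical, roughly symmetric
with_outlier = data + [70]          # same data plus one extreme value

def midrange(values):
    """Halfway point between the smallest and largest value."""
    return (min(values) + max(values)) / 2

for label, values in (("original", data), ("with outlier", with_outlier)):
    print(label,
          "| mean =", round(statistics.mean(values), 2),
          "| median =", statistics.median(values),
          "| mode =", statistics.multimode(values),
          "| midrange =", midrange(values))
```

The outlier moves the mean from about 4.3 to 12.5 and the midrange from 4.5 to 36, while the median only shifts from 4 to 4.5.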
3.2 Measures of Variation (Spread)
Measures of spread describe the variability in data.
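Range: $\text{max} - \text{min}$
Variance (Sample): $s^2 = \dfrac{\sum (x - \bar{x})^2}{n - 1}$
Standard Deviation (Sample): $s = \sqrt{\dfrac{\sum (x - \bar{x})^2}{n - 1}}$
Coefficient of Variation: $CV = \dfrac{s}{\bar{x}} \cdot 100\%$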
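The four measures of spread can be computed as follows. This is a sketch on a hypothetical five-value sample; `statistics.variance` and `statistics.stdev` use the sample (n - 1) formulas.

```python
import statistics

data = [4, 8, 6, 5, 7]                  # hypothetical sample

data_range = max(data) - min(data)
mean = statistics.mean(data)
variance = statistics.variance(data)    # sample variance: divides by n - 1
sd = statistics.stdev(data)             # sample SD: square root of the variance
cv = sd / mean * 100                    # coefficient of variation, in percent

print(f"range = {data_range}, variance = {variance}, "
      f"SD = {sd:.3f}, CV = {cv:.1f}%")
```

For population data, `statistics.pvariance` and `statistics.pstdev` divide by N instead of n - 1.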
Empirical Rule (for Normal Data)
About 68% within 1 SD, 95% within 2 SD, 99.7% within 3 SD of the mean.
Chebyshev's Theorem (Any Shape, k > 1)
At least $1 - \dfrac{1}{k^2}$ of the data lie within k SDs of the mean (at least 75% for k = 2, about 89% for k = 3).
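Chebyshev's bound, 1 - 1/k² for k > 1, is weaker than the Empirical Rule because it makes no assumption about shape. The short sketch below compares the two; the function name `chebyshev_lower_bound` is made up for this example.

```python
def chebyshev_lower_bound(k):
    """Minimum proportion of data within k SDs of the mean (needs k > 1)."""
    if k <= 1:
        raise ValueError("Chebyshev's bound is only informative for k > 1")
    return 1 - 1 / k**2

# Approximate Empirical Rule proportions, valid only for normal data
empirical = {2: 0.95, 3: 0.997}

for k in (2, 3):
    print(f"k = {k}: Chebyshev at least {chebyshev_lower_bound(k):.3f}, "
          f"Empirical Rule about {empirical[k]}")
```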
Range Rule of Thumb
SD estimate: $s \approx \dfrac{\text{range}}{4}$
3.3 Relative Standing & Boxplots
Relative standing and boxplots help identify outliers and compare distributions.
z-Score: $z = \dfrac{x - \bar{x}}{s}$ for a sample, or $z = \dfrac{x - \mu}{\sigma}$ for a population (how many SDs from the mean)
|z| > 2 is unusual; |z| > 3 is very unusual.
Quartiles: $Q_1$, $Q_2$, $Q_3$ (25th, 50th, 75th percentiles)
IQR: $Q_3 - Q_1$
Outliers: Fences: $x < Q_1 - 1.5 \cdot IQR$ or $x > Q_3 + 1.5 \cdot IQR$
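A sketch of z-scores, quartiles, and the 1.5 × IQR fences on a hypothetical sample. Textbooks and software use slightly different quartile conventions; this example assumes the default "exclusive" method of `statistics.quantiles`, so hand calculations with another convention may differ a little.

```python
import statistics

data = [12, 15, 17, 18, 20, 21, 23, 24, 26, 60]   # hypothetical sample

mean = statistics.mean(data)
sd = statistics.stdev(data)
z_scores = [(x - mean) / sd for x in data]        # SDs from the mean

# Quartiles (default method="exclusive"; other conventions exist)
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in data if x < lower_fence or x > upper_fence]

print(f"Q1 = {q1}, Q2 = {q2}, Q3 = {q3}, IQR = {iqr}")
print(f"fences = ({lower_fence}, {upper_fence}), outliers = {outliers}")
print(f"z-score of {data[-1]} = {z_scores[-1]:.2f}")
```

Here the value 60 falls above the upper fence and also has a z-score above 2, so both rules flag it.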
Chapter 4: Probability
4.1 Basic Concepts
Probability quantifies uncertainty and likelihood of events.
A probability is a number between 0 and 1 (0 = impossible, 1 = certain).
Three Ways to Get Probability of a Simple Event
Relative frequency (empirical): From data.
Classical (theoretical): All outcomes equally likely.
Subjective: Expert/educated judgment when data or equal-likelihood assumptions are unavailable.
Basic Concepts & Notation
Complement: $P(A') = 1 - P(A)$
Intersection: $A \cap B$ (both happen)
Union: $A \cup B$ (at least one happens)
Conditional: $P(A \mid B)$ (probability of A given B)
Odds of an Event
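Odds in favor: $\dfrac{P(A)}{P(A')}$, usually written as a ratio a:b
Odds against: $\dfrac{P(A')}{P(A)}$, usually written as a ratio b:a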
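Odds compare the probability of an event with the probability of its complement. The sketch below uses exact fractions for a fair six-sided die; the die example and the helper names are assumptions added for illustration.

```python
from fractions import Fraction

def odds_in_favor(p):
    """P(A) divided by P(not A)."""
    return p / (1 - p)

def odds_against(p):
    """P(not A) divided by P(A)."""
    return (1 - p) / p

p_six = Fraction(1, 6)                  # rolling a six with a fair die

favor = odds_in_favor(p_six)
against = odds_against(p_six)
print(f"odds in favor = {favor.numerator}:{favor.denominator}")
print(f"odds against  = {against.numerator}:{against.denominator}")
```

A probability of 1/6 corresponds to odds of 1:5 in favor, which is not the same number as the probability itself.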
4.2 Addition & Multiplication Rules
Addition Rule (A or B)
General (always): $P(A \cup B) = P(A) + P(B) - P(A \cap B)$
Multiplication Rule (A and B)
Independent events: $P(A \cap B) = P(A) \cdot P(B)$
Dependent events: $P(A \cap B) = P(A) \cdot P(B \mid A)$
Independence vs Dependence
Independent: Knowing one does not change the other (e.g., coin flips).
Dependent: Knowing one does change the other (e.g., draws without replacement).
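The addition and multiplication rules can be checked with exact fractions. The card and coin scenarios below are standard textbook-style examples chosen for this sketch, not taken from the notes.

```python
from fractions import Fraction

# Addition rule: draw one card from a standard 52-card deck
p_heart = Fraction(13, 52)
p_king = Fraction(4, 52)
p_heart_and_king = Fraction(1, 52)      # the king of hearts, counted twice
p_heart_or_king = p_heart + p_king - p_heart_and_king

# Multiplication rule, independent events: two fair coin flips, both heads
p_two_heads = Fraction(1, 2) * Fraction(1, 2)

# Multiplication rule, dependent events: two aces in a row, no replacement
p_first_ace = Fraction(4, 52)
p_second_ace_given_first = Fraction(3, 51)   # one ace and one card are gone
p_two_aces = p_first_ace * p_second_ace_given_first

print(p_heart_or_king, p_two_heads, p_two_aces)
```

Subtracting the overlap in the addition rule prevents the king of hearts from being counted twice.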
Tiny Worked Example
Given , ,
Key Statistical Symbols & Shortcuts
| Symbol | Meaning |
|---|---|
| μ | Population mean |
| σ, σ² | Population SD and variance |
| x̄, s, s² | Sample mean, sample SD, sample variance |
| p, p-hat | Population/sample proportion |
| n, N | Sample size, population size |
| Σ | Sum (add them all up) |
| Q1, Q2, Q3 | Quartiles (25th, 50th, 75th percentiles) |
| IQR | Interquartile range (Q3 - Q1) |
| z | z-score ("How many SDs from the mean") |
| P(A) | Probability of event A |
| P(A') | Complement of A ("Not A") |
| A ∩ B | Intersection (A AND B) |
| A ∪ B | Union (A OR B) |
| P(A \| B) | Conditional probability (A given B) |
Ultra-Short "Press-the-Button" List
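Mean: $\bar{x} = \dfrac{\sum x}{n}$
Range: $\text{max} - \text{min}$
IQR: $Q_3 - Q_1$
Outliers: fences $x < Q_1 - 1.5 \cdot IQR$ or $x > Q_3 + 1.5 \cdot IQR$
z-Score: $z = \dfrac{x - \bar{x}}{s}$
Probability: $P(A) = \dfrac{\text{number of outcomes in } A}{\text{total number of equally likely outcomes}}$
Relative frequency: $\dfrac{f}{n}$
Complement: $P(A') = 1 - P(A)$
Addition: $P(A \cup B) = P(A) + P(B) - P(A \cap B)$
Multiplication: $P(A \cap B) = P(A) \cdot P(B)$ (independent)
Conditional: $P(A \mid B) = \dfrac{P(A \cap B)}{P(B)}$
Mean from frequency table: $\bar{x} = \dfrac{\sum (f \cdot x)}{\sum f}$ (use midpoints for grouped classes)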
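The last item, the mean from a frequency table, is the one most often computed by hand, so here is a minimal sketch. The classes and frequencies are hypothetical, and each class is represented by its midpoint, which makes the result an approximation of the true mean.

```python
# Hypothetical grouped data: (lower limit, upper limit, frequency)
classes = [(50, 59, 2), (60, 69, 3), (70, 79, 5), (80, 89, 4), (90, 99, 1)]

# Each class is represented by its midpoint
midpoints = [(lo + hi) / 2 for lo, hi, _ in classes]
freqs = [f for _, _, f in classes]

# Weighted mean: sum of (frequency x midpoint) divided by total frequency
grouped_mean = sum(f * x for f, x in zip(freqs, midpoints)) / sum(freqs)

print(f"approximate mean = {grouped_mean:.2f}")
```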