Statistics Fundamentals: Concepts, Data, and Probability

Study Guide - Smart Notes


Chapter 1: Introduction

Statistical vs Practical Significance

Understanding the difference between statistical and practical significance is essential in interpreting results from data analysis.

Statistical Significance: A result is statistically significant if it is unlikely to have occurred by chance under a given model (often determined by a low p-value). Large samples can make tiny, unimportant differences statistically significant.
Practical Significance: Refers to the real-world importance of a result. It considers whether the size of the effect is meaningful or valuable in context.
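
To see how sample size alone can drive statistical significance, here is a minimal sketch using an idealized two-sided, two-sample z-test with a known common standard deviation. The function name `two_sample_z_p_value` and the numbers (a 0.1-point difference, SD of 15) are illustrative assumptions, not values from these notes.

```python
import math

def two_sample_z_p_value(diff, sigma, n):
    """Two-sided p-value for a difference in means between two groups of size n,
    assuming a known common standard deviation sigma (an idealized z-test)."""
    standard_error = sigma * math.sqrt(2 / n)
    z = diff / standard_error
    # P(|Z| >= |z|) for a standard normal Z equals erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))

# Same tiny difference, very different sample sizes (hypothetical numbers).
p_small = two_sample_z_p_value(diff=0.1, sigma=15, n=100)
p_large = two_sample_z_p_value(diff=0.1, sigma=15, n=1_000_000)
print(f"n = 100 per group:       p = {p_small:.3f}")
print(f"n = 1,000,000 per group: p = {p_large:.2e}")
```

The difference is identical in both calls; only the sample size changes. Whether a 0.1-point difference matters in practice is a separate question that the p-value cannot answer.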

Parameter vs Statistic

Distinguishing between parameters and statistics is foundational in inferential statistics.

Parameter: A numerical value that describes a population (e.g., μ, p).
Statistic: A numerical value calculated from a sample (e.g., x̄, p-hat).

Data Types

Data can be classified into different types based on their nature and measurement.

Categorical (Qualitative): Labels or categories (e.g., color).
Quantitative: Numbers representing counts or measures.
Discrete: Countable values (e.g., 0, 1, 2).
Continuous: Measurements on a scale (e.g., time, weight).

Levels of Measurement

Levels of measurement determine the mathematical operations that can be performed on data.

Nominal: Names or labels only.
Ordinal: Order only (ranking).
Interval: Equal steps, no true zero (e.g., temperature in °C).
Ratio: Equal steps, true zero (e.g., length).

Sampling

Sampling methods affect the representativeness and bias of data.

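SRS (Simple Random Sample): Every possible sample of size n has the same chance of being chosen, so every member has an equal chance of selection.
Systematic: Pick a random starting point, then select every kth member.
Stratified: Divide the population into subgroups (strata) and take a random sample within each subgroup.
Cluster: Divide the population into groups (clusters), randomly select some clusters, and sample every member of the chosen clusters.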
Convenience/Voluntary: May introduce bias.
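
The four probability-based methods above can be sketched with Python's standard `random` module. The population of 100 numbered members, the two strata, and the clusters of 10 are hypothetical choices made only for illustration.

```python
import random

rng = random.Random(0)            # fixed seed so the sketch is repeatable
population = list(range(1, 101))  # hypothetical member IDs 1..100

# Simple random sample: every possible group of 10 is equally likely.
srs = rng.sample(population, 10)

# Systematic: random start, then every k-th member.
k = 10
start = rng.randrange(k)
systematic = population[start::k]

# Stratified: split into subgroups (strata), then sample within each one.
strata = {"low": population[:50], "high": population[50:]}
stratified = [m for group in strata.values() for m in rng.sample(group, 5)]

# Cluster: split into groups, pick whole groups at random, keep every member.
clusters = [population[i:i + 10] for i in range(0, 100, 10)]
cluster_sample = [m for group in rng.sample(clusters, 2) for m in group]

print(srs, systematic, stratified, cluster_sample, sep="\n")
```

Note the contrast between the last two: stratified sampling takes a few members from every subgroup, while cluster sampling takes every member from a few groups.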

Observational vs Experiment

Study designs differ in their ability to infer causality.

Observational: No intervention; can show association, not causation.
Experiment: Treatments assigned with randomization; can show causation.
Observational Flavors: Cross-sectional (now), Retrospective (look back), Prospective (follow forward).

Chapter 2: Tables & Graphs

Frequency, Relative Frequency, Cumulative

Tabular summaries help organize and interpret data distributions.

Choose classes: 5-10 is common.
Frequency: Count per class.
Relative frequency: $f / n$ (class frequency divided by the total number of values).
Cumulative: Running total (or running relative).

Class parts: Limits (low/high), boundaries (for continuous data), midpoint $= \frac{\text{lower limit} + \text{upper limit}}{2}$.
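
As a rough sketch, a frequency table with relative and cumulative columns could be built as follows. The scores, the class width of 10, and the starting limit of 50 are made-up values, and the classes are identified by their lower limits.

```python
from collections import Counter

scores = [52, 57, 61, 64, 66, 68, 71, 73, 75, 78, 82, 85, 88, 91, 97]  # hypothetical data
class_width = 10
lowest_limit = 50

# Assign each value to a class by its lower limit (50-59, 60-69, ...).
counts = Counter(
    lowest_limit + ((x - lowest_limit) // class_width) * class_width for x in scores
)

n = len(scores)
table = []
running = 0
for lower in sorted(counts):
    freq = counts[lower]
    running += freq                      # cumulative frequency so far
    upper = lower + class_width - 1
    table.append({
        "class": f"{lower}-{upper}",
        "midpoint": (lower + upper) / 2,
        "frequency": freq,
        "relative": freq / n,
        "cumulative": running,
    })

for row in table:
    print(row)
```

The relative frequencies should sum to 1, and the last cumulative frequency should equal the number of values; both are quick checks on a hand-built table.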

Histograms (Quantitative)

Bars touch; equal class widths.
Read shape: symmetric? skewed? center, spread, outliers.

Other Useful Plots

Dotplot / Stemplot: Small data, preserve values.
Boxplot: Five-number summary; outliers; great for comparisons.
Time series: Value vs time.
Bar/Pareto/Pie: Categorical data (bars do not touch).

Chapter 3: Center, Spread, z, Boxplots

3.1 Measures of Center (and Outliers' Effect)

Measures of center summarize the typical value in a dataset.

Mean (average): Good for symmetric data without big outliers. Outliers pull the mean toward them.
Median: Order values; take the middle (average the two middles if n is even). Resistant to outliers/skew.
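Mode: Most common value(s); useful for categorical data and for data with repeated values.
Midrange: $\frac{\text{min} + \text{max}}{2}$. Very sensitive to outliers, since it uses only the two extreme values.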

Which to report?

Skewed/outliers: Median + IQR
Roughly symmetric: Mean + SD
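
A small sketch of why the median is preferred for skewed data, using Python's `statistics` module. The salary figures (in thousands) are invented for illustration.

```python
import statistics

salaries = [40, 42, 45, 47, 50]    # hypothetical values, in thousands
with_outlier = salaries + [400]    # add one extreme value

mean_before = statistics.mean(salaries)
median_before = statistics.median(salaries)
mean_after = statistics.mean(with_outlier)
median_after = statistics.median(with_outlier)

print("without outlier:", mean_before, median_before)
print("with outlier:   ", mean_after, median_after)
```

The single extreme value more than doubles the mean, while the median moves by only one unit.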

3.2 Measures of Variation (Spread)

Measures of spread describe the variability in data.

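Range: $\text{max} - \text{min}$
Variance (Sample): $s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$
Standard Deviation (Sample): $s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$
Coefficient of Variation: $CV = \frac{s}{\bar{x}} \times 100\%$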
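
These measures can be checked with Python's `statistics` module, whose `variance` and `stdev` functions use the sample (n - 1) form. The data values below are hypothetical.

```python
import statistics

data = [4, 8, 6, 5, 3, 10]  # hypothetical sample

data_range = max(data) - min(data)
sample_variance = statistics.variance(data)   # divides by n - 1
sample_sd = statistics.stdev(data)            # square root of the sample variance
cv_percent = sample_sd / statistics.mean(data) * 100

print(f"range = {data_range}")
print(f"s^2   = {sample_variance:.2f}")
print(f"s     = {sample_sd:.2f}")
print(f"CV    = {cv_percent:.1f}%")
```

For population data, `statistics.pvariance` and `statistics.pstdev` divide by n instead.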

Empirical Rule (for Normal Data)

About 68% within 1 SD, 95% within 2 SD, 99.7% within 3 SD of the mean.

Chebyshev's Theorem (Any Shape, k > 1)

At least $1 - \frac{1}{k^2}$ of the data lie within k SDs of the mean (e.g., at least 75% within 2 SDs and about 89% within 3 SDs).

Range Rule of Thumb

SD estimate: $s \approx \frac{\text{range}}{4}$
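
A minimal sketch of the Chebyshev bound and the range rule of thumb as small helper functions. The names `chebyshev_lower_bound` and `range_rule_sd` are hypothetical, chosen only for this example.

```python
def chebyshev_lower_bound(k):
    """Minimum proportion of any data set within k SDs of the mean (needs k > 1)."""
    if k <= 1:
        raise ValueError("Chebyshev's bound is only informative for k > 1")
    return 1 - 1 / k**2

def range_rule_sd(values):
    """Rough SD estimate: range divided by 4."""
    return (max(values) - min(values)) / 4

for k in (2, 3):
    print(f"within {k} SDs: at least {chebyshev_lower_bound(k):.1%}")
print("estimated SD:", range_rule_sd([10, 30, 50]))
```

Chebyshev's bound is a guaranteed minimum for any distribution, so it is weaker than the Empirical Rule, which assumes roughly normal data.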

3.3 Relative Standing & Boxplots

Relative standing and boxplots help identify outliers and compare distributions.

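z-Score: $z = \frac{x - \bar{x}}{s}$ for a sample, or $z = \frac{x - \mu}{\sigma}$ for a population (how many SDs from the mean)
|z| > 2 is unusual; |z| > 3 is very unusual.
Quartiles: $Q_1, Q_2, Q_3$ (25th, 50th, 75th percentiles)
IQR: $Q_3 - Q_1$
Outliers: Values beyond the fences: below $Q_1 - 1.5 \times IQR$ or above $Q_3 + 1.5 \times IQR$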
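
The sketch below computes z-scores, quartiles, and the 1.5 × IQR fences for a made-up sample. Quartile conventions differ between textbooks and software, so the exact quartile values here (from `statistics.quantiles` with its default method) may not match a hand calculation using another rule.

```python
import statistics

data = [2, 4, 5, 7, 8, 9, 10, 12, 13, 40]  # hypothetical sample

mean = statistics.mean(data)
sd = statistics.stdev(data)
z_scores = [(x - mean) / sd for x in data]

# Quartile conventions vary; this uses the default "exclusive" method.
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in data if x < lower_fence or x > upper_fence]

print(f"Q1 = {q1}, Q2 = {q2}, Q3 = {q3}, IQR = {iqr}")
print(f"fences: {lower_fence} to {upper_fence}")
print("outliers:", outliers)
print(f"z-score of largest value: {z_scores[-1]:.2f}")
```

Both checks flag the same value here: it lies above the upper fence, and its z-score is greater than 2.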

Chapter 4: Probability

4.1 Basic Concepts

Probability quantifies uncertainty and likelihood of events.

A probability is a number between 0 and 1 (0 = impossible, 1 = certain).

Three Ways to Get Probability of a Simple Event

Relative frequency (empirical): From data.
Classical (theoretical): All outcomes equally likely.
Subjective: Expert/educated judgment when data or equal-likelihood assumptions are unavailable.

Basic Concepts & Notation

Complement: $P(A') = 1 - P(A)$ (A does not happen)
Intersection: $A \cap B$ (both happen)
Union: $A \cup B$ (at least one happens)
Conditional: $P(A \mid B)$ (probability of A given B)

Odds of an Event

Odds in favor: $\frac{P(A)}{P(A')}$
Odds against: $\frac{P(A')}{P(A)}$
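
A short sketch of the complement rule and odds using exact fractions. The event (rolling a 4 on a fair six-sided die) is an illustrative assumption.

```python
from fractions import Fraction

p_a = Fraction(1, 6)      # hypothetical event: rolling a 4 on a fair die
p_not_a = 1 - p_a         # complement rule

odds_in_favor = p_a / p_not_a   # ratio of "happens" to "does not happen"
odds_against = p_not_a / p_a    # the reciprocal ratio

print("P(A') =", p_not_a)
print("odds in favor =", odds_in_favor)
print("odds against  =", odds_against)
```

Odds are usually quoted as a ratio of whole numbers, so odds in favor of 1/5 reads as "1 to 5" and odds against of 5 reads as "5 to 1".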

4.2 Addition & Multiplication Rules

Addition Rule (A or B)

General (always): $P(A \cup B) = P(A) + P(B) - P(A \cap B)$

Multiplication Rule (A and B)

Independent events: $P(A \cap B) = P(A) \cdot P(B)$
Dependent events: $P(A \cap B) = P(A) \cdot P(B \mid A)$

Independence vs Dependence

Independent: Knowing one does not change the other (e.g., coin flips).
Dependent: Knowing one does change the other (e.g., draws without replacement).
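
The addition and multiplication rules can be illustrated with a standard 52-card deck and fair coin flips. These examples are assumptions chosen for illustration; exact fractions avoid rounding.

```python
from fractions import Fraction

# Standard 52-card deck, one card drawn.
p_king = Fraction(4, 52)
p_heart = Fraction(13, 52)
p_king_and_heart = Fraction(1, 52)   # only the king of hearts

# General addition rule: subtract the overlap so it is not counted twice.
p_king_or_heart = p_king + p_heart - p_king_and_heart

# Conditional probability: P(king | heart) = P(king and heart) / P(heart).
p_king_given_heart = p_king_and_heart / p_heart

# Independent events: two fair coin flips both landing heads.
p_two_heads = Fraction(1, 2) * Fraction(1, 2)

# Dependent events: two aces in a row, drawn without replacement.
p_two_aces = Fraction(4, 52) * Fraction(3, 51)

print(p_king_or_heart, p_king_given_heart, p_two_heads, p_two_aces)
```

Here P(king | heart) equals P(king), so rank and suit are independent for a single draw, whereas the second ace draw depends on the first because the deck has changed.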

Tiny Worked Example

Given , ,

Key Statistical Symbols & Shortcuts

Symbol	Meaning
μ	Population mean
σ, σ²	Population SD and variance
x̄, s, s²	Sample mean, sample SD, sample variance
p, p-hat	Population/sample proportion
n, N	Sample size, population size
Σ	Sum (add them all up)
Q1, Q2, Q3	Quartiles (25th, 50th, 75th percentiles)
IQR	Interquartile range (Q3 - Q1)
z	z-score ("How many SDs from the mean")
P(A)	Probability of event A
P(A')	Complement of A ("Not A")
A ∩ B	Intersection (A AND B)
A ∪ B	Union (A OR B)
P(A|B)	Conditional probability (A given B)

Ultra-Short "Press-the-Button" List

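Mean: $\bar{x} = \frac{\sum x}{n}$
Range: $\text{max} - \text{min}$
IQR: $Q_3 - Q_1$
Outliers: beyond the fences, below $Q_1 - 1.5 \times IQR$ or above $Q_3 + 1.5 \times IQR$
z-Score: $z = \frac{x - \bar{x}}{s}$
Probability:
- Relative frequency: $P(A) \approx \frac{\text{number of times } A \text{ occurred}}{\text{number of trials}}$
- Complement: $P(A') = 1 - P(A)$
- Addition: $P(A \cup B) = P(A) + P(B) - P(A \cap B)$
- Multiplication: $P(A \cap B) = P(A) \cdot P(B)$ (independent)
- Conditional: $P(A \mid B) = \frac{P(A \cap B)}{P(B)}$
Mean from frequency table: $\bar{x} = \frac{\sum (f \cdot x)}{\sum f}$ (use midpoints for grouped classes)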
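
As a sketch of the mean from a grouped frequency table, each class midpoint is weighted by its frequency and the total is divided by the sum of the frequencies. The class limits and frequencies below are hypothetical.

```python
# Hypothetical grouped frequency table: (lower limit, upper limit, frequency).
classes = [(50, 59, 2), (60, 69, 4), (70, 79, 4), (80, 89, 3), (90, 99, 2)]

total_f = sum(f for _, _, f in classes)
# Each class contributes frequency * midpoint to the weighted sum.
weighted_sum = sum(f * (low + high) / 2 for low, high, f in classes)
grouped_mean = weighted_sum / total_f

print(f"sum of f = {total_f}, sum of f*x = {weighted_sum}, mean = {grouped_mean:.2f}")
```

Because midpoints stand in for the actual values, the result is an approximation of the mean of the raw data.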
