Statistics Fundamentals: Concepts, Data, and Probability

Study Guide - Smart Notes

Tailored notes based on your materials, expanded with key definitions, examples, and context.

Chapter 1: Introduction

Statistical vs Practical Significance

Understanding the difference between statistical and practical significance is essential in interpreting results from data analysis.

  • Statistical Significance: A result is statistically significant if it is unlikely to have occurred by chance under a given model (often determined by a low p-value). Large samples can make tiny, unimportant differences statistically significant.

  • Practical Significance: Refers to the real-world importance of a result. It considers whether the size of the effect is meaningful or valuable in context.
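
To make the contrast concrete, here is a minimal sketch of how sample size alone can turn a tiny difference into a statistically significant one. It assumes a two-sample z-test with a known, shared standard deviation, which is a simplification chosen for illustration and not a method these notes specify. The function name and the numbers (a 0.1-point difference on a scale with SD 15) are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z_pvalue(diff, sd, n):
    """Two-sided p-value for a difference in means between two groups of size n.

    Assumption for this sketch: both groups share a known standard deviation sd.
    """
    standard_error = sd * sqrt(2 / n)
    z = diff / standard_error
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Same tiny effect, two very different sample sizes (hypothetical values).
p_small_n = two_sample_z_pvalue(diff=0.1, sd=15, n=100)
p_large_n = two_sample_z_pvalue(diff=0.1, sd=15, n=1_000_000)

print(f"n = 100 per group:       p = {p_small_n:.3f}")
print(f"n = 1,000,000 per group: p = {p_large_n:.2e}")
```

The effect size is identical in both calls; only the p-value changes. Whether a 0.1-point difference matters is a practical question that the p-value cannot answer.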

Parameter vs Statistic

Distinguishing between parameters and statistics is foundational in inferential statistics.

  • Parameter: A numerical value that describes a population (e.g., μ, p).

  • Statistic: A numerical value calculated from a sample (e.g., x̄, p-hat).

Data Types

Data can be classified into different types based on their nature and measurement.

  • Categorical (Qualitative): Labels or categories (e.g., color).

  • Quantitative: Numbers representing counts or measures.

  • Discrete: Countable values (e.g., 0, 1, 2).

  • Continuous: Measurements on a scale (e.g., time, weight).

Levels of Measurement

Levels of measurement determine the mathematical operations that can be performed on data.

  • Nominal: Names or labels only.

  • Ordinal: Order only (ranking).

  • Interval: Equal steps, no true zero (e.g., temperature in °C).

  • Ratio: Equal steps, true zero (e.g., length).

Sampling

Sampling methods affect the representativeness and bias of data.

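  • SRS (Simple Random Sample): Every possible sample of size n has the same chance of being selected, so every member has an equal chance of selection.

  • Systematic: Pick a random starting point, then select every kth member.

  • Stratified: Divide the population into subgroups (strata) that share a characteristic, then randomly sample from each subgroup.

  • Cluster: Divide the population into groups (clusters), randomly select some clusters, and include every member of the chosen clusters.

  • Convenience/Voluntary response: Uses whoever is easiest to reach or chooses to respond; often biased and not representative.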
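
The four probability-based methods above can be sketched with Python's standard `random` module. This is only an illustration: the population of IDs 1 to 100, the two strata, and the clusters of ten are all hypothetical choices.

```python
import random

random.seed(0)  # fixed seed so the sketch is repeatable

population = list(range(1, 101))  # hypothetical population: IDs 1..100

# Simple random sample: every group of 10 IDs is equally likely.
srs = random.sample(population, 10)

# Systematic: random starting point, then every k-th member.
k = 10
start = random.randrange(k)
systematic = population[start::k]

# Stratified: split into subgroups (strata), then randomly sample from each one.
strata = {"low": population[:50], "high": population[50:]}
stratified = [x for group in strata.values() for x in random.sample(group, 5)]

# Cluster: split into groups, randomly pick whole groups, keep everyone in them.
clusters = [population[i:i + 10] for i in range(0, 100, 10)]
chosen_clusters = random.sample(clusters, 2)
cluster_sample = [x for group in chosen_clusters for x in group]

print("SRS:       ", sorted(srs))
print("Systematic:", systematic)
print("Stratified:", sorted(stratified))
print("Cluster:   ", cluster_sample)
```

Note the structural difference: stratified sampling takes a few members from every subgroup, while cluster sampling takes every member from a few groups.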

Observational vs Experiment

Study designs differ in their ability to infer causality.

  • Observational: No intervention; can show association, not causation.

  • Experiment: Treatments assigned with randomization; can show causation.

  • Observational Flavors: Cross-sectional (now), Retrospective (look back), Prospective (follow forward).

Chapter 2: Tables & Graphs

Frequency, Relative Frequency, Cumulative

Tabular summaries help organize and interpret data distributions.

  1. Choose classes: 5-10 is common.

  2. Frequency: Count per class.

  3. Relative frequency: $\text{class frequency} / n$, where n is the total number of observations.

  4. Cumulative: Running total (or running relative).

Class parts: Limits (low/high), boundaries (for continuous data), midpoint $= (\text{lower limit} + \text{upper limit}) / 2$
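
The steps above can be sketched as a short script that builds a frequency table. The scores and the class limits are hypothetical, chosen only so that each class has a few values.

```python
# Hypothetical exam scores and class limits (lower, upper).
data = [52, 55, 61, 64, 67, 68, 70, 73, 75, 78, 81, 84, 88, 91, 97]
classes = [(50, 59), (60, 69), (70, 79), (80, 89), (90, 99)]

n = len(data)
cumulative = 0
table = []
for low, high in classes:
    frequency = sum(low <= x <= high for x in data)  # count per class
    cumulative += frequency                          # running total
    table.append({
        "class": f"{low}-{high}",
        "midpoint": (low + high) / 2,
        "frequency": frequency,
        "relative": frequency / n,
        "cumulative": cumulative,
    })

for row in table:
    print(f"{row['class']:>7}  mid={row['midpoint']:5.1f}  "
          f"f={row['frequency']}  rel={row['relative']:.3f}  cum={row['cumulative']}")
```

The relative frequencies should sum to 1, and the last cumulative value should equal n. Both are quick checks that no observation was missed or double-counted.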

Histograms (Quantitative)

  • Bars touch; equal class widths.

  • Read shape: symmetric? skewed? center, spread, outliers.

Other Useful Plots

  • Dotplot / Stemplot: Small data, preserve values.

  • Boxplot: Five-number summary; outliers; great for comparisons.

  • Time series: Value vs time.

  • Bar/Pareto/Pie: Categorical data (bars do not touch).

Chapter 3: Center, Spread, z, Boxplots

3.1 Measures of Center (and Outliers' Effect)

Measures of center summarize the typical value in a dataset.

  • Mean (average): Good for symmetric data without big outliers. Outliers pull the mean toward them.

  • Median: Order values; take the middle (average the two middles if n is even). Resistant to outliers/skew.

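  • Mode: Most common value(s); there may be none, one, or several. The only measure of center that works for categorical data.

  • Midrange: $(\text{min} + \text{max}) / 2$. Very sensitive to outliers.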

Which to report?

  • Skewed/outliers: Median + IQR

  • Roughly symmetric: Mean + SD
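
A small sketch of how a single outlier affects each measure of center, using Python's standard `statistics` module. The data values and the `midrange` helper are made up for illustration.

```python
from statistics import mean, median, multimode

values = [2, 3, 3, 4, 5, 6, 7]   # hypothetical data
with_outlier = values + [90]     # same data plus one extreme value


def midrange(xs):
    """Halfway point between the smallest and largest value."""
    return (min(xs) + max(xs)) / 2


for label, xs in [("original", values), ("with outlier", with_outlier)]:
    print(f"{label:>12}: mean={mean(xs):.2f}  median={median(xs)}  "
          f"mode={multimode(xs)}  midrange={midrange(xs)}")
```

The mean and midrange jump when 90 is added, while the median barely moves. That is the reason to pair the median with the IQR for skewed data.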

3.2 Measures of Variation (Spread)

Measures of spread describe the variability in data.

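  • Range: $\text{max} - \text{min}$

  • Variance (Sample): $s^2 = \sum (x - \bar{x})^2 / (n - 1)$

  • Standard Deviation (Sample): $s = \sqrt{\sum (x - \bar{x})^2 / (n - 1)}$

  • Coefficient of Variation: $CV = (s / \bar{x}) \cdot 100\%$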
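
As a worked example, the sketch below computes each measure by hand from the standard sample formulas and cross-checks the result against Python's `statistics` module. The five data values are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev, variance

data = [4, 8, 6, 5, 7]  # hypothetical sample

n = len(data)
x_bar = mean(data)

data_range = max(data) - min(data)
sample_variance = sum((x - x_bar) ** 2 for x in data) / (n - 1)  # divide by n - 1
sample_sd = sqrt(sample_variance)
cv_percent = sample_sd / x_bar * 100

print(f"range = {data_range}")
print(f"s^2   = {sample_variance}  (library: {variance(data)})")
print(f"s     = {sample_sd:.4f}  (library: {stdev(data):.4f})")
print(f"CV    = {cv_percent:.2f}%")
```

The coefficient of variation is unit-free, so it is the one to use when comparing spread across variables measured on different scales.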

Empirical Rule (for Normal Data)

  • About 68% within 1 SD, 95% within 2 SD, 99.7% within 3 SD of the mean.

Chebyshev's Theorem (Any Shape, k > 1)

  • At least $1 - 1/k^2$ of the data lie within k SDs of the mean (e.g., at least 75% for k = 2 and about 89% for k = 3).
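
A quick numerical check of Chebyshev's bound, $1 - 1/k^2$, on a deliberately skewed dataset. The data and helper names are hypothetical. The sketch uses the population standard deviation (`pstdev`); the sample standard deviation is slightly larger, so the bound would hold with it as well.

```python
from statistics import mean, pstdev


def chebyshev_bound(k):
    """Minimum fraction of data within k SDs of the mean (requires k > 1)."""
    if k <= 1:
        raise ValueError("Chebyshev's theorem is only informative for k > 1")
    return 1 - 1 / k ** 2


def fraction_within(data, k):
    """Observed fraction of values within k SDs of the mean."""
    center = mean(data)
    sd = pstdev(data)
    return sum(abs(x - center) <= k * sd for x in data) / len(data)


skewed = [1, 1, 1, 2, 2, 3, 4, 5, 9, 30]  # hypothetical, strongly right-skewed

for k in (2, 3):
    print(f"k={k}: guaranteed >= {chebyshev_bound(k):.3f}, "
          f"observed = {fraction_within(skewed, k):.3f}")
```

The bound is a guaranteed minimum, not an estimate: real data usually do better, and normal data follow the much tighter Empirical Rule.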

Range Rule of Thumb

  • SD estimate: $s \approx \text{range} / 4$

3.3 Relative Standing & Boxplots

Relative standing and boxplots help identify outliers and compare distributions.

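  • z-Score: $z = (x - \bar{x}) / s$ for a sample, or $z = (x - \mu) / \sigma$ for a population (how many SDs from the mean)

  • |z| > 2 is unusual; |z| > 3 is very unusual.

  • Quartiles: $Q_1$, $Q_2$, $Q_3$ (25th, 50th, 75th percentiles)

  • IQR: $Q_3 - Q_1$

  • Outliers: Values beyond the fences: below $Q_1 - 1.5 \cdot \text{IQR}$ or above $Q_3 + 1.5 \cdot \text{IQR}$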
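
The sketch below applies the z-score and the 1.5 × IQR fences to a small hypothetical dataset. It relies on `statistics.quantiles` with its default method; textbooks and calculators use several slightly different quartile conventions, so quartile values from other tools may not match exactly.

```python
from statistics import mean, quantiles, stdev

data = [12, 15, 17, 18, 20, 21, 23, 25, 60]  # hypothetical sample

x_bar = mean(data)
s = stdev(data)


def z_score(x):
    """Number of sample standard deviations x lies from the sample mean."""
    return (x - x_bar) / s


# Quartiles using Python's default ("exclusive") method; other conventions exist.
q1, q2, q3 = quantiles(data, n=4)
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in data if x < lower_fence or x > upper_fence]

print(f"Q1={q1}, Q2={q2}, Q3={q3}, IQR={iqr}")
print(f"fences: {lower_fence} to {upper_fence}")
print(f"outliers: {outliers}")
print(f"z-score of 60: {z_score(60):.2f}")
```

Here the value 60 is flagged by the fences, and its z-score falls between 2 and 3, so both rules agree that it is unusual.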

Chapter 4: Probability

4.1 Basic Concepts

Probability quantifies uncertainty and likelihood of events.

  • A probability is a number between 0 and 1 (0 = impossible, 1 = certain).

Three Ways to Get Probability of a Simple Event

  1. Relative frequency (empirical): From data.

  2. Classical (theoretical): All outcomes equally likely.

  3. Subjective: Expert/educated judgment when data or equal-likelihood assumptions are unavailable.
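
The first two approaches can be compared directly with a simulation. The sketch below assumes a fair six-sided die, a hypothetical setup chosen for illustration. It computes the classical probability of rolling a 4 and then estimates the same probability from simulated data.

```python
import random
from fractions import Fraction

random.seed(1)  # fixed seed so the sketch is repeatable

# Classical: six equally likely faces, one of which is a 4.
classical = Fraction(1, 6)

# Relative frequency: simulate many rolls and count how often a 4 appears.
trials = 60_000
hits = sum(random.randint(1, 6) == 4 for _ in range(trials))
empirical = hits / trials

print(f"classical = {float(classical):.4f}")
print(f"empirical = {empirical:.4f} from {trials} simulated rolls")
```

With more trials the empirical estimate tends to settle closer to the classical value, which is the law of large numbers at work.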

Basic Concepts & Notation

  • Complement: $P(A') = 1 - P(A)$

  • Intersection: $A \cap B$ (both happen)

  • Union: $A \cup B$ (at least one happens)

  • Conditional: $P(A \mid B)$ (probability of A given B)
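
Python sets map neatly onto this notation. The sketch below assumes one roll of a fair die, with the hypothetical events A = "even" and B = "greater than 3", and counts outcomes to get each probability.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}  # one roll of a fair die
A = {2, 4, 6}                      # event A: roll is even
B = {4, 5, 6}                      # event B: roll is greater than 3


def prob(event):
    """Classical probability: favorable outcomes over total outcomes."""
    return Fraction(len(event), len(sample_space))


p_not_a = prob(sample_space - A)              # complement, P(A')
p_a_and_b = prob(A & B)                       # intersection, P(A and B)
p_a_or_b = prob(A | B)                        # union, P(A or B)
p_a_given_b = Fraction(len(A & B), len(B))    # conditional: restrict to B

print(p_not_a, p_a_and_b, p_a_or_b, p_a_given_b)
```

Conditioning on B shrinks the sample space to the three outcomes in B, two of which are even, so P(A | B) is 2/3 even though P(A) is 1/2.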

Odds of an Event

  • Odds in favor: $P(A) / P(A')$, usually written as a ratio a:b

  • Odds against: $P(A') / P(A)$, usually written as a ratio b:a
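
A short sketch converting a probability into odds, using exact fractions. The function names and the example probability of 1/4 are hypothetical.

```python
from fractions import Fraction


def odds_in_favor(p):
    """Odds in favor of an event with probability p: P(A) / P(A')."""
    return p / (1 - p)


def odds_against(p):
    """Odds against an event with probability p: P(A') / P(A)."""
    return (1 - p) / p


p = Fraction(1, 4)
favor = odds_in_favor(p)
against = odds_against(p)

print(f"odds in favor: {favor.numerator}:{favor.denominator}")
print(f"odds against:  {against.numerator}:{against.denominator}")
```

A probability of 1/4 corresponds to odds of 1:3 in favor, not 1:4. Odds compare favorable outcomes to unfavorable ones, while probability compares favorable outcomes to all outcomes.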

4.2 Addition & Multiplication Rules

Addition Rule (A or B)

  • General (always): $P(A \cup B) = P(A) + P(B) - P(A \cap B)$

Multiplication Rule (A and B)

  • Independent events: $P(A \cap B) = P(A) \cdot P(B)$

  • Dependent events: $P(A \cap B) = P(A) \cdot P(B \mid A)$

Independence vs Dependence

  • Independent: Knowing one does not change the other (e.g., coin flips).

  • Dependent: Knowing one does change the other (e.g., draws without replacement).
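
The addition and multiplication rules can be checked with exact fractions. The examples below (a standard 52-card deck and fair coin flips) are illustrative choices, not taken from the notes.

```python
from fractions import Fraction

# Addition rule: P(king or heart) from one draw of a 52-card deck.
p_king = Fraction(4, 52)
p_heart = Fraction(13, 52)
p_king_and_heart = Fraction(1, 52)  # the king of hearts is counted in both
p_king_or_heart = p_king + p_heart - p_king_and_heart

# Multiplication rule, independent events: two heads in two fair coin flips.
p_two_heads = Fraction(1, 2) * Fraction(1, 2)

# Multiplication rule, dependent events: two aces drawn without replacement.
p_first_ace = Fraction(4, 52)
p_second_ace_given_first = Fraction(3, 51)  # one ace and one card are gone
p_two_aces = p_first_ace * p_second_ace_given_first

print(f"P(king or heart) = {p_king_or_heart}")
print(f"P(two heads)     = {p_two_heads}")
print(f"P(two aces)      = {p_two_aces}")
```

Subtracting the intersection in the addition rule prevents the king of hearts from being counted twice. In the dependent case, the second factor is a conditional probability because the first draw changes the deck.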

Tiny Worked Example

  • Given , ,

Key Statistical Symbols & Shortcuts

Symbol | Meaning
μ | Population mean
σ, σ² | Population SD and variance
x̄, s, s² | Sample mean, sample SD, sample variance
p, p-hat | Population/sample proportion
n, N | Sample size, population size
Σ | Sum (add them all up)
Q1, Q2, Q3 | Quartiles (25th, 50th, 75th percentiles)
IQR | Interquartile range (Q3 - Q1)
z | z-score ("How many SDs from the mean")
P(A) | Probability of event A
P(A') | Probability of the complement of A ("Not A")
A ∩ B | Intersection (A AND B)
A ∪ B | Union (A OR B)
P(A|B) | Conditional probability (A given B)

Ultra-Short "Press-the-Button" List

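  • Mean: $\bar{x} = \sum x / n$

  • Range: $\text{max} - \text{min}$

  • IQR: $Q_3 - Q_1$

  • Outliers: fences $Q_1 - 1.5 \cdot \text{IQR}$ or $Q_3 + 1.5 \cdot \text{IQR}$

  • z-Score: $z = (x - \bar{x}) / s$

  • Probability:

    • Relative frequency: $P(A) \approx (\text{number of times A occurred}) / (\text{number of trials})$

    • Complement: $P(A') = 1 - P(A)$

    • Addition: $P(A \cup B) = P(A) + P(B) - P(A \cap B)$

    • Multiplication: $P(A \cap B) = P(A) \cdot P(B)$ (independent)

    • Conditional: $P(A \mid B) = P(A \cap B) / P(B)$

  • Mean from frequency table: $\bar{x} = \sum (f \cdot x) / \sum f$ (use midpoints for grouped classes)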
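
As a sketch of the last item, the mean from a grouped frequency table weights each class midpoint by its frequency. The midpoints and frequencies below are hypothetical.

```python
# Hypothetical grouped data: class midpoints and their frequencies.
midpoints = [54.5, 64.5, 74.5, 84.5, 94.5]
frequencies = [2, 4, 4, 3, 2]

total_count = sum(frequencies)                                       # sum of f
weighted_sum = sum(f * x for f, x in zip(frequencies, midpoints))    # sum of f * x
grouped_mean = weighted_sum / total_count

print(f"sum(f) = {total_count}, sum(f*x) = {weighted_sum}")
print(f"grouped mean = {grouped_mean:.2f}")
```

Because every value in a class is replaced by the class midpoint, the result is an approximation of the mean of the raw data, not an exact match.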
