The Normal Distribution & Probability: Core Concepts for Statistics

Introduction

This study guide covers foundational concepts in statistics, focusing on the normal distribution, standard deviation, sampling distributions, z-scores, and probability. These topics are essential for understanding statistical inference and data analysis in college-level statistics courses.

Standard Deviation and Variance

Why N - 1 in Sample Variance?

The sample variance is used to estimate the population variance. Variance measures the average squared deviation of each data point from the mean.

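  • Population variance: $\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2$

  • Sample variance: $s^2 = \frac{1}{N-1}\sum_{i=1}^{N}(x_i - \bar{x})^2$

  • Reason for N-1: Deviations are measured from the sample mean, which is by construction closer to the sample's own data points than the population mean is. The squared deviations are therefore too small on average, and dividing by N-1 (instead of N) corrects this underestimation of the variance.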
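The effect of the divisor can be checked exhaustively on a tiny example. The sketch below uses a made-up population of three values (an assumption for illustration, not taken from the notes), lists every possible sample of size 2 drawn with replacement, and averages the two candidate estimates.

```python
from itertools import product
from statistics import mean

# Hypothetical population, chosen only to keep the arithmetic small.
population = [1, 2, 3]
mu = mean(population)
pop_var = sum((x - mu) ** 2 for x in population) / len(population)


def variance(sample, divisor):
    """Sum of squared deviations from the sample mean, divided by `divisor`."""
    m = mean(sample)
    return sum((x - m) ** 2 for x in sample) / divisor


n = 2
# Every ordered sample of size n drawn with replacement (9 samples in total).
samples = list(product(population, repeat=n))

avg_div_n = mean(variance(s, n) for s in samples)
avg_div_n_minus_1 = mean(variance(s, n - 1) for s in samples)

print(f"population variance:     {pop_var:.4f}")
print(f"average estimate, N:     {avg_div_n:.4f}")
print(f"average estimate, N - 1: {avg_div_n_minus_1:.4f}")
```

In this small case the N-1 estimates average out to the population variance, while the N estimates average to half of it.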

Standard Deviation (SD)

Standard deviation is the square root of variance and provides a measure of spread in the same units as the data.

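  • Formula (sample SD): $s = \sqrt{s^2} = \sqrt{\frac{1}{N-1}\sum_{i=1}^{N}(x_i - \bar{x})^2}$

  • Population SD: $\sigma = \sqrt{\sigma^2} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2}$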

  • Properties for normal distribution:

    • About 68% of values are within 1 SD of the mean

    • About 95% of values are within 2 SDs of the mean
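The two percentages above can be reproduced from the standard normal curve. This is a minimal sketch using Python's standard-library `statistics.NormalDist`; the variable names are illustrative.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, SD 1

# Area between -1 and +1 SD, and between -2 and +2 SDs.
within_1_sd = z.cdf(1) - z.cdf(-1)
within_2_sd = z.cdf(2) - z.cdf(-2)

print(f"within 1 SD:  {within_1_sd:.4f}")
print(f"within 2 SDs: {within_2_sd:.4f}")
```

The results are roughly 0.683 and 0.954, which is where the "68%" and "95%" rules of thumb come from.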

Sampling Distributions

Concepts

Data can be represented as distributions. Each variable forms its own distribution, which can be discrete (bar chart) or continuous (density curve).

  • Sampling distribution: The distribution of a statistic (e.g., mean) calculated from multiple samples drawn from the same population.

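  • Many variables are approximately normally distributed. The Central Limit Theorem offers one explanation: a quantity that is the sum or average of many small, independent influences tends toward a normal shape.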

The Normal Distribution

Characteristics

  • Unimodal: Single peak

  • Symmetrical: Not skewed

  • Defined by:

    • Mean ($\mu$)

    • Standard deviation ($\sigma$) or variance ($\sigma^2$)

  • Notation: $X \sim N(\mu, \sigma^2)$

  • Almost all values (about 99.7%) fall within 3 SDs of the mean

  • Area under the curve within a range gives the probability of observing values in that range
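To make the "area equals probability" idea concrete, here is a short sketch for a hypothetical set of test scores assumed to follow a normal distribution with mean 65 and SD 10 (values chosen for illustration only).

```python
from statistics import NormalDist

# Hypothetical example: scores that follow N(mean = 65, SD = 10).
scores = NormalDist(mu=65, sigma=10)

# P(60 <= X <= 80) is the area under the curve between 60 and 80,
# computed as (area to the left of 80) minus (area to the left of 60).
p_60_to_80 = scores.cdf(80) - scores.cdf(60)

# "Almost all" values: the area within 3 SDs of the mean (35 to 95).
p_within_3_sd = scores.cdf(95) - scores.cdf(35)

print(f"P(60 <= X <= 80):  {p_60_to_80:.4f}")
print(f"P(within 3 SDs):   {p_within_3_sd:.4f}")
```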

Standard Normal Distribution

Standardization and Z-Scores

  • Standardization: Transforming data so that the mean is 0 and SD is 1

  • Z-score formula: $z = \frac{x - \mu}{\sigma}$

  • Purpose: Allows comparison across different distributions and expresses values in terms of SDs from the mean

  • Important note: Standardizing does not make a non-normal distribution normal
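The last point can be checked numerically. The sketch below standardizes a small, made-up, right-skewed data set (an assumption for illustration) using the population SD, then compares a simple skewness measure before and after.

```python
from statistics import mean, pstdev

# Made-up, right-skewed data (one large value pulls the tail out).
data = [1, 1, 1, 2, 2, 3, 10]


def standardize(values):
    """Convert each value to a z-score using the data's own mean and SD."""
    m, sd = mean(values), pstdev(values)
    return [(x - m) / sd for x in values]


def skewness(values):
    """Average cubed z-score: 0 for symmetric data, positive for a right skew."""
    return mean(z ** 3 for z in standardize(values))


z_scores = standardize(data)

print(f"mean of z-scores: {mean(z_scores):.4f}")
print(f"SD of z-scores:   {pstdev(z_scores):.4f}")
print(f"skewness before:  {skewness(data):.4f}")
print(f"skewness after:   {skewness(z_scores):.4f}")
```

The z-scores have mean 0 and SD 1, but the skewness is unchanged: standardizing shifts and rescales the data without altering its shape.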

Z-Scores: Examples and Applications

  • Example:

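    • Jimmy: 75% on a test, class mean = 65%, SD = 10%, so z = (75 - 65) / 10 = 1 (1 SD above the mean)

    • Jane: 70% on a test, class mean = 60%, SD = 5%, so z = (70 - 60) / 5 = 2 (2 SD above the mean)

    • Relative to her own class, Jane performed better than Jimmy did relative to his, even though her raw score is lower.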

  • Translation table:

    Z-score   Translation
    1         1 SD above the mean
    0         At the mean
    -2        2 SD below the mean
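The Jimmy and Jane example can be worked through in a few lines. The helper name `z_score` is an illustrative choice, not something defined in the notes.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd


jimmy_z = z_score(75, mean=65, sd=10)
jane_z = z_score(70, mean=60, sd=5)

print(f"Jimmy: z = {jimmy_z}")  # 1.0
print(f"Jane:  z = {jane_z}")   # 2.0
```

Because both scores are now expressed in SD units, they can be compared directly even though the two classes have different means and spreads.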

Why is the Normal Distribution Important?

Central Limit Theorem

  • If we take many random samples of the same size from a population, the distribution of the sample means tends to be normal, even if the original data are not normal.

  • Central Limit Theorem: For sufficiently large sample sizes (a common rule of thumb is n ≥ 30), the sampling distribution of the mean is approximately normal, with mean $\mu$ and standard deviation $\sigma / \sqrt{n}$.
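A small simulation can illustrate the theorem. The sketch below assumes an exponential population (strongly right-skewed, with mean 1 and SD 1); the sample size, the number of samples, and the seed are arbitrary choices made for the example.

```python
import random
from statistics import mean, stdev

random.seed(0)  # fixed seed so the run is repeatable

SAMPLE_SIZE = 40    # observations per sample (assumed value)
NUM_SAMPLES = 2000  # number of repeated samples (assumed value)


def draw_sample_mean():
    """Mean of one random sample from an exponential population (rate 1)."""
    draws = [random.expovariate(1.0) for _ in range(SAMPLE_SIZE)]
    return sum(draws) / SAMPLE_SIZE


sample_means = [draw_sample_mean() for _ in range(NUM_SAMPLES)]

center = mean(sample_means)
spread = stdev(sample_means)

# Share of sample means within 1 SD of their center (about 0.68 for a normal shape).
within = sum(abs(m - center) <= spread for m in sample_means)
share_within_1_sd = within / NUM_SAMPLES

print(f"mean of sample means: {center:.3f}  (population mean is 1)")
print(f"SD of sample means:   {spread:.3f}  (sigma / sqrt(n) = {1 / SAMPLE_SIZE ** 0.5:.3f})")
print(f"share within 1 SD:    {share_within_1_sd:.3f}")
```

Although individual draws are heavily skewed, the sample means cluster around the population mean with a spread close to the population SD divided by the square root of the sample size.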

Statistical Tests and Assumptions

  • Many statistical tests assume normality of variables, errors, or sample means.

  • Parametric tests: Assume specific distributions (e.g., t-test, ANOVA)

  • Non-parametric tests: Do not assume specific distributions (e.g., Chi-squared)

Probability

Basic Concepts

  • Probability: Likelihood of an event occurring, ranging from 0 (impossible) to 1 (certain)

  • Total probability: For all possible outcomes (exhaustive), probabilities must sum to 1

  • Mutually exclusive events: Events that cannot occur together (e.g., heads and tails on a single coin flip)

Examples

  • Biased coin: heads (0.45), tails (0.53), edge (0.02)

  • TikTok videos: 70% dance challenges, 15% exploding packages, 10% dogs, 5% other

  • If you watch 50 videos, the expected number about dogs is $50 \times 0.10 = 5$
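The two examples can be checked with a few lines of code. This sketch stores each set of outcomes in a dictionary (an illustrative choice) and confirms that the probabilities sum to 1 before computing the expected count.

```python
from math import isclose

coin = {"heads": 0.45, "tails": 0.53, "edge": 0.02}
videos = {
    "dance challenges": 0.70,
    "exploding packages": 0.15,
    "dogs": 0.10,
    "other": 0.05,
}

# Exhaustive, mutually exclusive outcomes must have probabilities summing to 1.
assert isclose(sum(coin.values()), 1.0)
assert isclose(sum(videos.values()), 1.0)

# Expected count = number of trials x probability of the outcome.
videos_watched = 50
expected_dog_videos = videos_watched * videos["dogs"]

print(f"expected dog videos: {expected_dog_videos:.1f}")  # 5.0
```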

Combining Probabilities

Types of Probability

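  • Mutually exclusive: $P(A \text{ or } B) = P(A) + P(B)$

  • Joint probability: $P(A \text{ and } B) = P(A) \times P(B)$ (if independent)

  • Conditional probability: $P(A \mid B)$ is the probability of A given that B has occurred, with $P(A \mid B) = P(A \text{ and } B) / P(B)$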
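The three rules can be illustrated with a fair six-sided die, a hypothetical example that is not part of the original notes. Exact fractions are used to avoid rounding.

```python
from fractions import Fraction

# One roll of a fair six-sided die: each face has probability 1/6.
p_face = Fraction(1, 6)

# Mutually exclusive events: P(1 or 2) = P(1) + P(2)
p_one_or_two = p_face + p_face

# Independent events: P(6 on first roll and 6 on second roll) = P(6) * P(6)
p_two_sixes = p_face * p_face

# Conditional: P(roll is 2 | roll is even) = P(2 and even) / P(even)
p_even = Fraction(3, 6)
p_two_and_even = p_face  # a roll of 2 is always an even roll
p_two_given_even = p_two_and_even / p_even

print(f"P(1 or 2)     = {p_one_or_two}")      # 1/3
print(f"P(6 then 6)   = {p_two_sixes}")       # 1/36
print(f"P(2 | even)   = {p_two_given_even}")  # 1/3
```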

Probability and Distributions

Discrete vs. Continuous Variables

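  • Discrete variables: Probability is the sum of probabilities for each bar (e.g., $P(2 \le X \le 4) = P(X = 2) + P(X = 3) + P(X = 4)$)

  • Continuous variables: Probability is the area under the curve for a given range (e.g., $P(a \le X \le b) = \int_a^b f(x)\,dx$, where $f$ is the density curve)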

  • Often interested in the probability of extreme values (tails)
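The contrast between adding bars and measuring area can be sketched with two tail probabilities. Both examples are assumptions for illustration: the number of heads in 10 fair coin flips (discrete) and a standard normal variable (continuous).

```python
from math import comb
from statistics import NormalDist

# Discrete: X = number of heads in 10 flips of a fair coin.
n_flips = 10


def p_heads(k):
    """P(X = k): the height of one bar in the binomial distribution."""
    return comb(n_flips, k) / 2 ** n_flips


# Upper tail P(X >= 8): add the bars for 8, 9 and 10 heads.
p_discrete_tail = sum(p_heads(k) for k in range(8, n_flips + 1))

# Continuous: Z is standard normal. Upper tail P(Z > 2) is the area right of 2.
p_continuous_tail = 1 - NormalDist().cdf(2)

print(f"P(X >= 8 heads): {p_discrete_tail:.4f}")
print(f"P(Z > 2):        {p_continuous_tail:.4f}")
```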

Z-Tables

Purpose and Use

  • Tables provide pre-calculated areas under the normal curve for ranges of $z$ (standard normal distribution)

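  • Values from any other normal distribution can be looked up in the same table after converting them to z-scores; other distributions (e.g., t, chi-squared) have similar tables of their own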

  • Modern statistical software (e.g., JASP) can compute these values directly

Z       Area to left
0.0     0.5000
1.0     0.8413
2.0     0.9772
-1.0    0.1587
-2.0    0.0228

Additional info:

Values inferred for illustration; actual tables are more detailed.
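The areas in the table can be recomputed rather than looked up. This is a minimal sketch using Python's standard-library `statistics.NormalDist`, offered as one way to do it and not the method used by any particular statistics package.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

z_values = [0.0, 1.0, 2.0, -1.0, -2.0]
# Area to the left of each z, rounded to 4 decimals as in a printed table.
table = {z: round(standard_normal.cdf(z), 4) for z in z_values}

print(f"{'z':>5}  area to left")
for z, area in table.items():
    print(f"{z:>5.1f}  {area:.4f}")
```

An upper-tail area follows by subtraction, for example 1 - 0.9772 = 0.0228 for z greater than 2, which matches the area to the left of -2 because the curve is symmetrical.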

Summary

Understanding the normal distribution, standard deviation, z-scores, and probability is crucial for statistical analysis. These concepts underpin many statistical tests and methods used in research and data science.
