The Normal Distribution & Probability: Core Concepts for Statistics
Study Guide - Smart Notes
The Normal Distribution & Probability
Introduction
This study guide covers foundational concepts in statistics, focusing on the normal distribution, standard deviation, sampling distributions, z-scores, and probability. These topics are essential for understanding statistical inference and data analysis in college-level statistics courses.
Standard Deviation and Variance
Why N - 1 in Sample Variance?
The sample variance is used to estimate the population variance. Variance measures the average squared deviation of each data point from the mean.
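Population variance: $\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2$
Sample variance: $s^2 = \frac{1}{N-1}\sum_{i=1}^{N}(x_i - \bar{x})^2$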
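Reason for N - 1: The sample mean is calculated from the sample itself, so the data points sit closer to it than to the true population mean. Squared deviations from the sample mean are therefore too small on average, and dividing by N - 1 (instead of N) corrects this underestimation of the variance.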
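The two definitions of variance can be sketched in a few lines of Python. This is a minimal illustration, assuming a small made-up data set (`scores`) and hypothetical helper names; the results are compared against the `statistics` module from the standard library.

```python
import statistics

def population_variance(values):
    """Average squared deviation from the mean, dividing by N."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / n

def sample_variance(values):
    """Squared deviations from the sample mean, dividing by N - 1."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data, mean = 5

pop_var = population_variance(scores)   # 32 / 8
samp_var = sample_variance(scores)      # 32 / 7, slightly larger

print(pop_var)                          # 4.0
print(round(samp_var, 4))               # 4.5714
print(statistics.pvariance(scores))     # library version, divides by N
print(statistics.variance(scores))      # library version, divides by N - 1
```

The sample variance is always the larger of the two for the same data, and the gap shrinks as N grows.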
Standard Deviation (SD)
Standard deviation is the square root of variance and provides a measure of spread in the same units as the data.
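Formula: $s = \sqrt{s^2} = \sqrt{\frac{1}{N-1}\sum_{i=1}^{N}(x_i - \bar{x})^2}$
Population SD: $\sigma = \sqrt{\sigma^2} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2}$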
Properties for normal distribution:
About 68% of values are within 1 SD of the mean
About 95% of values are within 2 SDs of the mean
Sampling Distributions
Concepts
Data can be represented as distributions. Each variable forms its own distribution, which can be discrete (bar chart) or continuous (density curve).
Sampling distribution: The distribution of a statistic (e.g., mean) calculated from multiple samples drawn from the same population.
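Many variables are approximately normally distributed, often because they arise as the sum or average of many small independent influences, which is the situation described by the Central Limit Theorem.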
The Normal Distribution
Characteristics
Unimodal: Single peak
Symmetrical: Not skewed
Defined by:
Mean ($\mu$)
Standard deviation ($\sigma$) or variance ($\sigma^2$)
Notation: $X \sim N(\mu, \sigma^2)$
Almost all values (about 99.7%) fall within 3 SDs of the mean
Area under the curve within a range gives the probability of observing values in that range
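The link between area and probability can be checked numerically. The sketch below assumes a hypothetical normal variable with mean 100 and SD 15 and uses `statistics.NormalDist` from the Python standard library; the helper name `prob_within` is made up for illustration.

```python
from statistics import NormalDist

# Hypothetical normal distribution: mean 100, SD 15.
dist = NormalDist(mu=100, sigma=15)

def prob_within(dist, k):
    """Probability of a value within k SDs of the mean (area under the curve)."""
    lower = dist.mean - k * dist.stdev
    upper = dist.mean + k * dist.stdev
    return dist.cdf(upper) - dist.cdf(lower)

for k in (1, 2, 3):
    print(f"within {k} SD: {prob_within(dist, k):.4f}")
# within 1 SD: 0.6827
# within 2 SD: 0.9545
# within 3 SD: 0.9973
```

The same three areas come out for any choice of mean and SD, which is why the rule can be stated in terms of SDs alone.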
Standard Normal Distribution
Standardization and Z-Scores
Standardization: Transforming data so that the mean is 0 and SD is 1
Z-score formula: $z = \frac{x - \mu}{\sigma}$
Purpose: Allows comparison across different distributions and expresses values in terms of SDs from the mean
Important note: Standardizing does not make a non-normal distribution normal
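A small sketch can illustrate the last point. It assumes a made-up, right-skewed data set and hypothetical helpers (`standardize`, `skewness`); skewness is measured here as the average cubed z-score, one common definition. After standardizing, the mean is 0 and the SD is 1, but the skew is unchanged.

```python
import statistics

def standardize(values):
    """Convert each value to a z-score using the data's own mean and SD."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [(x - mean) / sd for x in values]

def skewness(values):
    """Average cubed z-score: 0 for symmetric data, positive for a right tail."""
    return statistics.fmean([z ** 3 for z in standardize(values)])

data = [1, 1, 1, 2, 2, 3, 10]   # hypothetical right-skewed data
z_scores = standardize(data)

print(round(statistics.fmean(z_scores), 6))    # approximately 0
print(round(statistics.pstdev(z_scores), 6))   # approximately 1
print(round(skewness(data), 3), round(skewness(z_scores), 3))  # same skew before and after
```

Standardizing only shifts and rescales the axis; the shape of the distribution stays the same.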
Z-Scores: Examples and Applications
Example:
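Jimmy: 75% on a test, class mean = 65%, SD = 10%, so $z = (75 - 65)/10 = 1$ (1 SD above the mean)
Jane: 70% on a test, class mean = 60%, SD = 5%, so $z = (70 - 60)/5 = 2$ (2 SD above the mean). Jane's raw score is lower, but she did better relative to her class.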
Translation table:
| Z-score | Translation |
|---|---|
| 1 | 1 SD above the mean |
| 0 | At the mean |
| -2 | 2 SD below the mean |
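The Jimmy and Jane comparison can be reproduced with a one-line function. This is a minimal sketch; the function name `z_score` is an illustrative choice.

```python
def z_score(value, mean, sd):
    """Number of SDs that `value` lies above (+) or below (-) the mean."""
    return (value - mean) / sd

jimmy_z = z_score(75, mean=65, sd=10)
jane_z = z_score(70, mean=60, sd=5)

print(jimmy_z)  # 1.0
print(jane_z)   # 2.0
print("Jane" if jane_z > jimmy_z else "Jimmy", "did better relative to the class")
```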
Why is the Normal Distribution Important?
Central Limit Theorem
If we take multiple random samples from a population, the distribution of sample means tends to be normal, even if the original data is not normal.
Central Limit Theorem: For large sample sizes, the sampling distribution of the mean is approximately normal.
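A quick simulation can make the theorem concrete. The sketch below is only an illustration under assumed settings: the population is exponential with mean 1 (strongly right-skewed), each sample has 50 observations, and 2000 samples are drawn with a fixed seed. None of these numbers come from the notes.

```python
import random
import statistics

rng = random.Random(42)   # fixed seed so the sketch is repeatable

SAMPLE_SIZE = 50          # observations per sample (assumed)
NUM_SAMPLES = 2000        # number of repeated samples (assumed)

def draw_sample_mean():
    """Mean of one random sample from a right-skewed exponential population."""
    sample = [rng.expovariate(1.0) for _ in range(SAMPLE_SIZE)]
    return statistics.fmean(sample)

sample_means = [draw_sample_mean() for _ in range(NUM_SAMPLES)]

centre = statistics.fmean(sample_means)
spread = statistics.stdev(sample_means)

# Share of sample means within 1 SD of their centre; near 0.68 if roughly normal.
within_one_sd = sum(abs(m - centre) <= spread for m in sample_means) / NUM_SAMPLES

print(f"mean of sample means: {centre:.3f}")   # population mean is 1.0
print(f"SD of sample means:   {spread:.3f}")   # theory: 1 / sqrt(50), about 0.141
print(f"share within 1 SD:    {within_one_sd:.3f}")
```

Even though individual observations are heavily skewed, the sample means cluster symmetrically around the population mean, with a spread close to the population SD divided by the square root of the sample size.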
Statistical Tests and Assumptions
Many statistical tests assume normality of variables, errors, or sample means.
Parametric tests: Assume specific distributions (e.g., t-test, ANOVA)
Non-parametric tests: Do not assume specific distributions (e.g., Chi-squared)
Probability
Basic Concepts
Probability: Likelihood of an event occurring, ranging from 0 (impossible) to 1 (certain)
Total probability: For all possible outcomes (exhaustive), probabilities must sum to 1
Mutually exclusive events: Events that cannot occur together (e.g., heads and tails in a coin flip)
Examples
Biased coin: heads (0.45), tails (0.53), edge (0.02)
TikTok videos: 70% dance challenges, 15% exploding packages, 10% dogs, 5% other
If you watch 50 videos, the expected number about dogs is $50 \times 0.10 = 5$
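Both examples can be checked in code: the probabilities of an exhaustive set of outcomes should sum to 1, and an expected count is the number of trials times the probability. This is a minimal sketch; the dictionary and function names are illustrative.

```python
import math

coin = {"heads": 0.45, "tails": 0.53, "edge": 0.02}
videos = {
    "dance challenges": 0.70,
    "exploding packages": 0.15,
    "dogs": 0.10,
    "other": 0.05,
}

def is_valid_distribution(probs):
    """True if every probability is in [0, 1] and they sum to 1."""
    in_range = all(0 <= p <= 1 for p in probs.values())
    # isclose guards against tiny floating-point rounding in the sum.
    return in_range and math.isclose(sum(probs.values()), 1.0)

n_watched = 50
expected_dogs = n_watched * videos["dogs"]

print(is_valid_distribution(coin))     # True
print(is_valid_distribution(videos))   # True
print(expected_dogs)                   # 5.0
```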
Combining Probabilities
Types of Probability
Mutually exclusive: $P(A \text{ or } B) = P(A) + P(B)$
Joint probability: $P(A \text{ and } B) = P(A) \times P(B)$ (if independent)
Conditional probability: $P(A \mid B)$ is the probability of A given that B has occurred
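The three rules can be illustrated with fair dice, which are an assumed example and not part of the notes. The sketch counts equally likely outcomes and uses exact fractions; the helper `prob` is a hypothetical name.

```python
from fractions import Fraction
from itertools import product

def prob(event, outcomes):
    """P(event) when every outcome in `outcomes` is equally likely."""
    outcomes = list(outcomes)
    hits = sum(1 for o in outcomes if event(o))
    return Fraction(hits, len(outcomes))

one_die = range(1, 7)
two_dice = list(product(range(1, 7), repeat=2))

# Mutually exclusive: a single roll cannot be both 1 and 2, so probabilities add.
p_one_or_two = prob(lambda r: r == 1, one_die) + prob(lambda r: r == 2, one_die)

# Joint probability of independent events: both dice show 6.
p_double_six = prob(lambda r: r == 6, one_die) * prob(lambda r: r == 6, one_die)

# Conditional probability: P(even | roll > 3) = P(even and > 3) / P(> 3).
p_even_and_high = prob(lambda r: r % 2 == 0 and r > 3, one_die)
p_high = prob(lambda r: r > 3, one_die)
p_even_given_high = p_even_and_high / p_high

print(p_one_or_two, p_double_six, p_even_given_high)  # 1/3 1/36 2/3
```

Counting the pair (6, 6) directly among the 36 outcomes in `two_dice` gives the same 1/36, which is what independence predicts.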
Probability and Distributions
Discrete vs. Continuous Variables
Discrete variables: Probability is the sum of probabilities for each bar (e.g., $P(X \le 2) = P(X = 0) + P(X = 1) + P(X = 2)$)
Continuous variables: Probability is the area under the curve for a given range (e.g., $P(a \le X \le b)$)
Often interested in the probability of extreme values (tails)
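The contrast between summing bars and measuring area can be sketched as follows. Both examples are assumptions chosen for illustration: the discrete variable is the number of heads in 4 fair coin flips, and the continuous variable is a standard normal.

```python
from math import comb
from statistics import NormalDist

# Discrete: number of heads in 4 fair coin flips (hypothetical example).
n_flips = 4
pmf = {k: comb(n_flips, k) / 2 ** n_flips for k in range(n_flips + 1)}
p_at_most_one_head = pmf[0] + pmf[1]   # sum of two bars

# Continuous: standard normal variable Z.
z = NormalDist()                       # mean 0, SD 1
p_between = z.cdf(1) - z.cdf(-1)       # area between -1 and 1
p_upper_tail = 1 - z.cdf(2)            # area to the right of 2 (a tail)

print(p_at_most_one_head)        # 0.3125
print(round(p_between, 4))       # 0.6827
print(round(p_upper_tail, 4))    # 0.0228
```

For a continuous variable the probability of any single exact value is zero, so only ranges such as an interval or a tail have a meaningful probability.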
Z Tables
Purpose and Use
Tables provide pre-calculated areas under the normal curve for ranges of $z$ (standard normal distribution)
Other distributions have similar tables, usually standardized to z-scores
Modern statistical software (e.g., JASP) can compute these values directly
| Z | Area to left |
|---|---|
| 0.0 | 0.5000 |
| 1.0 | 0.8413 |
| 2.0 | 0.9772 |
| -1.0 | 0.1587 |
| -2.0 | 0.0228 |

Additional info: Values shown for illustration; actual tables are more detailed.
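The table entries can be reproduced in software. The notes mention JASP; the sketch below instead uses Python's standard-library `statistics.NormalDist` as an assumed stand-in, where `cdf` gives the area to the left of a z value and `inv_cdf` does the reverse lookup.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

# Area to the left of each z value, as in a printed z table.
for z in (0.0, 1.0, 2.0, -1.0, -2.0):
    area_left = standard_normal.cdf(z)
    print(f"{z:5.1f} | {area_left:.4f}")

# Reverse lookup: the z value with 97.5% of the area to its left.
z_975 = standard_normal.inv_cdf(0.975)
print(round(z_975, 2))  # 1.96
```

Because the curve is symmetric, the area to the left of -z equals 1 minus the area to the left of z, which is why the rows for 1.0 and -1.0 sum to 1.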
Summary
Understanding the normal distribution, standard deviation, z-scores, and probability is crucial for statistical analysis. These concepts underpin many statistical tests and methods used in research and data science.