Module 3: Normal Distributions & Sampling Distributions – Study Notes
Overview of Module 3: Normal Distributions & Sampling Distributions
This module introduces students to continuous probability distributions, focusing on the normal distribution and sampling distributions. Students learn to compute probabilities using normal models and tables, understand the Central Limit Theorem, and apply statistical software to evaluate the normality of continuous random variables. The module is foundational for statistical inference, including estimation and hypothesis testing.
Learning Outcomes
Define continuous distributions and use them to calculate probabilities.
Explain the meaning and properties of a normal distribution.
Use sampling distributions to calculate probabilities for sample statistics.
Understand the Central Limit Theorem and its importance in statistics.
Evaluate normality using technology.
Relevance of the Normal Distribution
The normal distribution is one of the most common and important distributions in statistics. Mastery of its properties and applications is essential for understanding statistical inference, including estimation and hypothesis testing. This module provides the basis for much of biostatistics and further statistical coursework.
Section 1: The Normal Distribution
Definition of the Normal Distribution
A continuous random variable is a variable that can take on any value within a specified range. Examples include height and weight. A continuous random variable has a normal distribution if its density curve is symmetric and bell-shaped.
Key Properties:
Symmetric about its mean.
The mean, median, and mode are all equal.
The area under the curve is 1 (or 100%), representing total probability.
The distribution is completely determined by its mean ($\mu$) and standard deviation ($\sigma$).
Example: Distribution of infant birth weights.
Histogram, Density Curve, and Area Under the Curve
Histograms display the frequency of data values. A density curve is a smooth curve that approximates the histogram and represents the probability distribution of a continuous variable.
The area under the curve (AUC) for a range of values corresponds to the probability that a random observation falls within that range.
For example, if the area under the curve for birth weights less than 3500g is 0.37, then 37% of infants have birth weights below 3500g.
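To make the link between area and probability concrete, here is a minimal sketch that approximates the area under a normal density curve numerically and compares it with the cumulative probability. It assumes a normal model with the mean (26.92) and standard deviation (6.107) that these notes report for mother's age (MAGE); the helper name `area_under_curve` and the range 20 to 30 are illustrative choices, not part of the course material.

```python
from statistics import NormalDist

# Assumption: mother's age follows a normal model with the MAGE mean and SD.
mage = NormalDist(mu=26.92, sigma=6.107)

def area_under_curve(dist, a, b, steps=10_000):
    """Approximate the area under dist's density between a and b (trapezoid rule)."""
    width = (b - a) / steps
    total = 0.5 * (dist.pdf(a) + dist.pdf(b))
    for i in range(1, steps):
        total += dist.pdf(a + i * width)
    return total * width

approx = area_under_curve(mage, 20, 30)   # area computed slice by slice
exact = mage.cdf(30) - mage.cdf(20)       # same area from cumulative probabilities
print(f"trapezoid area: {approx:.4f}, cdf difference: {exact:.4f}")
```

The two numbers should agree to several decimal places, which is the sense in which "area under the curve" and "probability of falling in the range" are the same quantity.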
Empirical Rule (68-95-99.7 Rule)
The Empirical Rule describes the spread of data in a normal distribution:
Approximately 68% of data fall within 1 standard deviation of the mean.
Approximately 95% fall within 2 standard deviations.
Approximately 99.7% fall within 3 standard deviations.
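The three percentages can be checked directly from the standard normal curve. The sketch below uses Python's standard-library `statistics.NormalDist`; the function name `area_within` is a hypothetical helper.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

def area_within(k):
    """Probability that a normal value lies within k standard deviations of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} SD: {area_within(k):.4f}")
```

The printed areas are close to 0.68, 0.95, and 0.997, which is where the rule's name comes from.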
Identifying Unusual Values
Values more than 2 standard deviations from the mean are considered "unusual." This is based on the Empirical Rule, as only about 5% of values lie outside this range.
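As a small illustration of this rule, the sketch below flags a value as unusual when it lies more than 2 standard deviations from the mean. The function name `is_unusual` and the test ages are hypothetical; the mean and standard deviation are the MAGE values reported in these notes.

```python
def is_unusual(x, mean, sd, cutoff=2.0):
    """Return True when x lies more than `cutoff` standard deviations from the mean."""
    return abs(x - mean) / sd > cutoff

# MAGE values from these notes: mean 26.92, SD 6.107
print(is_unusual(41, 26.92, 6.107))  # (41 - 26.92) / 6.107 is about 2.31
print(is_unusual(30, 26.92, 6.107))  # (30 - 26.92) / 6.107 is about 0.50
```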
Standard Normal Distribution and Z-Scores
The standard normal distribution is a normal distribution with mean 0 and standard deviation 1. Any normal distribution can be transformed to the standard normal using z-scores:
Z-score formula: $z = \frac{x - \mu}{\sigma}$
The z-score indicates how many standard deviations an observation ($x$) is from the mean ($\mu$).
Example: If SAT scores have mean $\mu$ and standard deviation $\sigma$, and Ben scored 1350:
$z = \frac{1350 - \mu}{\sigma} = 1.1$, so Ben's score is 1.1 standard deviations above the mean.
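The same calculation can be sketched in code. Rather than the SAT example, this sketch uses the mother's age (MAGE) mean of 26.92 and standard deviation of 6.107 reported later in these notes; the age of 35 and the function name `z_score` are illustrative assumptions.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

# Hypothetical observation: a mother aged 35, with the MAGE mean and SD.
z = z_score(35, 26.92, 6.107)
print(f"z = {z:.2f}")
```

A positive result means the observation is above the mean; a negative result means it is below.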
Z-Score Table (Normal Probability Table)
Z-score tables provide the area under the standard normal curve to the left of a given z-score. This area represents the probability that a randomly selected value is less than the specified z-score.
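A table lookup can be reproduced in software. In this sketch, `NormalDist().cdf(z)` from Python's standard library plays the role of the z-table; the z-score 1.1 is taken from the example above, and the right-tail area is obtained by subtracting from 1.

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal: mean 0, SD 1

left = std_normal.cdf(1.1)   # area to the left of z = 1.1 (what the table reports)
right = 1 - left             # area to the right of z = 1.1
print(f"P(Z < 1.1) = {left:.4f}")
print(f"P(Z > 1.1) = {right:.4f}")
```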
Percentiles of the Normal Distribution
The percentile of a normal distribution is the value below which a given percentage of observations fall. To find the value corresponding to a percentile:
Find the z-score that matches the desired percentile in the z-table.
Convert the z-score to the original scale using: $x = \mu + z\sigma$
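The two steps above can be sketched as follows, with `NormalDist().inv_cdf` standing in for the reverse table lookup. The 90th percentile and the function name `percentile_value` are illustrative; the mean and standard deviation are the MAGE values from these notes, and the normal model is an assumption.

```python
from statistics import NormalDist

def percentile_value(p, mean, sd):
    """Value below which a proportion p of a normal(mean, sd) population falls."""
    z = NormalDist().inv_cdf(p)   # step 1: z-score for the desired percentile
    return mean + z * sd          # step 2: convert back to the original scale

x90 = percentile_value(0.90, 26.92, 6.107)
print(f"90th percentile of mother's age: {x90:.2f}")
```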
Section 2: Sampling Distributions
Concept & Definition
A sampling distribution is the probability distribution of a statistic (such as the sample mean or proportion) computed from all possible samples of a given size from the same population.
Sampling distributions are essential for making statistical inferences about populations.
Sampling Distribution of the Sample Mean
The mean of the sampling distribution of the sample mean ($\bar{x}$) is equal to the population mean ($\mu$).
The standard deviation of the sampling distribution (standard error) is: $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$
As sample size ($n$) increases, the standard error decreases.
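A short sketch shows how the standard error of the mean shrinks as the sample grows. The population standard deviation is borrowed from the MAGE example (6.107); the sample sizes and the function name `standard_error_mean` are hypothetical.

```python
import math

def standard_error_mean(sigma, n):
    """Standard error of the sample mean: sigma divided by the square root of n."""
    return sigma / math.sqrt(n)

for n in (25, 100, 400):
    print(f"n = {n:>3}: SE = {standard_error_mean(6.107, n):.4f}")
```

Because of the square root, quadrupling the sample size only halves the standard error.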
Central Limit Theorem (CLT)
The Central Limit Theorem states that, for a sufficiently large sample size, the sampling distribution of the sample mean will be approximately normal, regardless of the population's distribution.
This theorem is fundamental for statistical inference.
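The theorem can be illustrated with a small simulation. This sketch repeatedly draws samples from a strongly skewed population (an exponential distribution with mean 1 and standard deviation 1, chosen purely for illustration) and summarizes the resulting sample means. The sample size of 40, the 5,000 repetitions, and the seed are all arbitrary assumptions.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

def sample_means(n, reps=5000):
    """Means of `reps` samples of size n from an exponential population (mean 1, SD 1)."""
    return [
        statistics.fmean([random.expovariate(1.0) for _ in range(n)])
        for _ in range(reps)
    ]

means = sample_means(n=40)
center = statistics.fmean(means)   # should be near the population mean, 1
spread = statistics.stdev(means)   # should be near 1 / sqrt(40), about 0.158

# Share of sample means within 2 standard errors of the center
within_2 = sum(abs(m - center) <= 2 * spread for m in means) / len(means)
print(f"center = {center:.3f}, spread = {spread:.3f}, within 2 SE = {within_2:.3f}")
```

Even though the population is far from bell-shaped, roughly 95% of the sample means fall within 2 standard errors of the center, as a normal model would predict.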
Sampling Distribution of the Sample Proportion
The mean of the sampling distribution of the sample proportion ($\hat{p}$) is the population proportion ($p$).
The standard error is: $\sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}}$
Conditions: Both $np$ and $n(1-p)$ should be greater than 5 for normal approximation.
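The standard error and the condition check can be sketched together. The proportion 0.10 and the sample sizes are hypothetical values chosen for illustration, as are the function names.

```python
import math

def proportion_se(p, n):
    """Standard error of a sample proportion: sqrt(p(1 - p) / n)."""
    return math.sqrt(p * (1 - p) / n)

def normal_approx_ok(p, n, threshold=5):
    """Check that both np and n(1 - p) exceed the threshold used in these notes."""
    return n * p > threshold and n * (1 - p) > threshold

p = 0.10  # hypothetical population proportion
for n in (30, 200):
    print(f"n = {n}: SE = {proportion_se(p, n):.4f}, normal approx OK: {normal_approx_ok(p, n)}")
```

With n = 30 the expected count np is only 3, so the check fails; with n = 200 both counts are well above 5.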
Sampling Distribution of the Difference Between Two Sample Means
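For two independent samples, the mean of the difference $\bar{x}_1 - \bar{x}_2$ is $\mu_1 - \mu_2$.
The standard error is: $\sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$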
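A minimal sketch of this standard error follows, assuming independent samples. The standard deviations and sample sizes are hypothetical numbers picked so the arithmetic is easy to follow, and `diff_means_se` is an illustrative name.

```python
import math

def diff_means_se(sd1, n1, sd2, n2):
    """Standard error of the difference between two independent sample means."""
    return math.sqrt(sd1**2 / n1 + sd2**2 / n2)

# Hypothetical groups: SD 6 with n = 36, and SD 8 with n = 64.
# Each variance term is 36/36 = 1 and 64/64 = 1, so the SE is sqrt(2).
se = diff_means_se(6, 36, 8, 64)
print(f"SE of the difference in means: {se:.4f}")
```

The variances add, not the standard deviations, which is why each term is squared before dividing by its sample size.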
Sampling Distribution of the Difference Between Two Sample Proportions
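For two independent samples, the mean of the difference $\hat{p}_1 - \hat{p}_2$ is $p_1 - p_2$.
The standard error is: $\sigma_{\hat{p}_1 - \hat{p}_2} = \sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}}$
Conditions: $n_1 p_1$, $n_1(1-p_1)$, $n_2 p_2$, and $n_2(1-p_2)$ should all be greater than 5.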
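The formula and its four conditions can be sketched as below. The proportions (0.5 and 0.2) and sample sizes (100 each) are hypothetical, as are the function names.

```python
import math

def diff_props_se(p1, n1, p2, n2):
    """Standard error of the difference between two independent sample proportions."""
    return math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)

def diff_props_conditions_ok(p1, n1, p2, n2, threshold=5):
    """Check the four expected counts against the threshold used in these notes."""
    counts = (n1 * p1, n1 * (1 - p1), n2 * p2, n2 * (1 - p2))
    return all(count > threshold for count in counts)

se = diff_props_se(0.5, 100, 0.2, 100)
ok = diff_props_conditions_ok(0.5, 100, 0.2, 100)
print(f"SE = {se:.4f}, conditions met: {ok}")
```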
Section 3: Using Technology to Evaluate Normality
Evaluating Normality with Statistical Software
Statistical software (e.g., SAS) can be used to assess whether a variable is approximately normally distributed. Common methods include:
Constructing histograms with superimposed normal curves.
Creating Q-Q (quantile-quantile) plots: If data points fall along a straight line, the distribution is approximately normal.
Comparing mean, median, and mode for similarity.
Example: The variable MAGE (Mother's Age) in a dataset was found to be approximately normal with mean 26.92 and standard deviation 6.107.
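The notes use SAS for this step. As a rough Python stand-in, the sketch below builds the points of a Q-Q plot by hand and measures how closely they follow a straight line using their correlation. The actual MAGE data are not available here, so the sketch generates synthetic ages from a normal model with the reported mean and standard deviation; the sample size of 500, the seed, and all function names are assumptions.

```python
import math
import random
from statistics import NormalDist, fmean

random.seed(1)
# Synthetic stand-in for the MAGE variable (NOT the real dataset).
ages = [random.gauss(26.92, 6.107) for _ in range(500)]

def qq_points(data):
    """Pair each sorted observation with the matching standard normal quantile."""
    n = len(data)
    std_normal = NormalDist()
    theoretical = [std_normal.inv_cdf((i + 0.5) / n) for i in range(n)]
    return list(zip(theoretical, sorted(data)))

def correlation(pairs):
    """Pearson correlation of the (x, y) pairs; near 1 means nearly a straight line."""
    xs, ys = zip(*pairs)
    mx, my = fmean(xs), fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

r = correlation(qq_points(ages))
print(f"Q-Q correlation: {r:.4f}")
```

A correlation very close to 1 corresponds to points hugging the straight line in a Q-Q plot. It is only a numeric summary, so it complements the visual check and does not replace it.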
Summary Table: Key Formulas and Properties
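| Statistic | Mean | Standard Error | Conditions |
|---|---|---|---|
| Sample Mean ($\bar{x}$) | $\mu$ | $\frac{\sigma}{\sqrt{n}}$ | Any sample size; CLT applies for large $n$ |
| Sample Proportion ($\hat{p}$) | $p$ | $\sqrt{\frac{p(1-p)}{n}}$ | $np > 5$, $n(1-p) > 5$ |
| Difference of Means ($\bar{x}_1 - \bar{x}_2$) | $\mu_1 - \mu_2$ | $\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$ | Independent samples |
| Difference of Proportions ($\hat{p}_1 - \hat{p}_2$) | $p_1 - p_2$ | $\sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}}$ | $n_1 p_1 > 5$, $n_1(1-p_1) > 5$, $n_2 p_2 > 5$, $n_2(1-p_2) > 5$ |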
Conclusion
This module provides a comprehensive foundation for understanding normal and sampling distributions, which are essential for statistical inference. Mastery of these concepts enables students to estimate population parameters, test hypotheses, and apply statistical methods in real-world scenarios.