Probability Distributions: Concepts, Properties, and Applications
Study Guide - Smart Notes
Probability Distributions
Introduction to Probability Distributions
Probability distributions are fundamental tools in statistics for describing the likelihood of various outcomes of a random experiment. This section introduces the concept of a random variable and the probability distribution, and explains how to visualize and analyze these distributions using histograms and key parameters such as mean, variance, and standard deviation.
Random Variable: A variable (typically denoted by x) that takes a single numerical value, determined by chance, for each outcome of a procedure.
Probability Distribution: A description that assigns a probability to each possible value of the random variable. It can be represented as a table, formula, or graph.
Probability Histogram: A graphical representation of a probability distribution, where the vertical axis shows probabilities instead of relative frequencies.
Basic Concepts of Probability Distribution
Types of Random Variables
Random variables are classified based on the nature of their possible values:
Discrete Random Variable: Has a finite or countable set of values. For example, the number of coin tosses before getting heads.
Continuous Random Variable: Has infinitely many values, which are not countable. These values are measured on a continuous scale, such as body temperature.
Requirements for a Probability Distribution
Every probability distribution must satisfy the following three requirements:
There is a numerical (not categorical) random variable x, and its values are associated with corresponding probabilities.
The sum of all probabilities must be 1: $\sum P(x) = 1$ (Sums such as 0.999 or 1.001 are acceptable due to rounding errors.)
Each probability value must be between 0 and 1 inclusive: $0 \le P(x) \le 1$
Examples of Probability Distributions
Example: Births
Consider two births, where male and female births are equally likely. Let x be the number of females in two births. The probability distribution is:
| x: Number of Females in Two Births | P(x) |
|---|---|
| 0 | 0.25 |
| 1 | 0.50 |
| 2 | 0.25 |
Discrete random variable: x can be 0, 1, or 2 (finite number of values).
Probability distribution: Satisfies all three requirements above.
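The three requirements can be checked mechanically. The following is a minimal sketch, assuming the table is stored as a Python dictionary that maps each value of x to P(x); the function name `violated_requirements` and the rounding tolerance `tol` are illustrative choices, not part of the original notes.

```python
from numbers import Real


def violated_requirements(dist, tol=0.002):
    """Return a list of the requirements that the table `dist` violates.

    `dist` maps each value of x to P(x). `tol` is an assumed rounding
    tolerance, chosen so that sums such as 0.999 or 1.001 still pass.
    """
    problems = []
    # Requirement 1: the random variable must be numerical, not categorical.
    if not all(isinstance(x, Real) and not isinstance(x, bool) for x in dist):
        problems.append("values of x are not all numerical")
    # Requirement 2: the probabilities must sum to 1 (up to rounding).
    if abs(sum(dist.values()) - 1) >= tol:
        problems.append("probabilities do not sum to 1")
    # Requirement 3: each probability must lie between 0 and 1 inclusive.
    if not all(0 <= p <= 1 for p in dist.values()):
        problems.append("a probability lies outside [0, 1]")
    return problems


births = {0: 0.25, 1: 0.50, 2: 0.25}
print(violated_requirements(births))  # prints [] (no requirement is violated)
```

An empty list means the table qualifies as a probability distribution. A table keyed by category names, or one whose entries sum to more than 1, would return the corresponding messages.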
Probability Histogram
A probability histogram for the above example would have three bars corresponding to x = 0, 1, 2, with heights 0.25, 0.50, and 0.25, respectively. The vertical axis represents probability.
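As a rough illustration of the histogram described above, the sketch below draws one text bar per value of x, with bar length proportional to P(x). The helper name `text_histogram` and the `width` scaling are assumptions made for this example; a plotting library would normally be used instead.

```python
births = {0: 0.25, 1: 0.50, 2: 0.25}


def text_histogram(dist, width=40):
    """Return one text line per value of x; bar length is proportional to P(x)."""
    lines = []
    for x in sorted(dist):
        # A probability of 1 would fill `width` characters.
        bar = "#" * round(dist[x] * width)
        lines.append(f"x = {x} | {bar} {dist[x]:.2f}")
    return lines


for line in text_histogram(births):
    print(line)
```

The bar for x = 1 comes out twice as long as the bars for x = 0 and x = 2, matching the heights 0.50 and 0.25.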
Probability Formula
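Probability distributions can also be expressed as formulas. For the births example:
$P(x) = \frac{1}{2\,(2-x)!\,x!}$ (where x = 0, 1, or 2)
Calculating for each value: $P(0) = \frac{1}{2 \cdot 2! \cdot 0!} = 0.25$, $P(1) = \frac{1}{2 \cdot 1! \cdot 1!} = 0.50$, $P(2) = \frac{1}{2 \cdot 0! \cdot 2!} = 0.25$.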
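A formula of this kind can be evaluated directly. The sketch below assumes the births formula is $P(x) = \frac{1}{2\,(2-x)!\,x!}$, which reproduces the table values 0.25, 0.50, and 0.25; the function name `p_females` is hypothetical.

```python
from math import factorial


def p_females(x):
    """P(x) for x females in two births, assuming P(x) = 1 / (2 (2-x)! x!)."""
    if x not in (0, 1, 2):
        raise ValueError("x must be 0, 1, or 2")
    return 1 / (2 * factorial(2 - x) * factorial(x))


for x in (0, 1, 2):
    print(x, p_females(x))
```

The three values sum to 1, so the formula describes the same distribution as the table.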
Non-Example: Software Piracy
Consider the following table:
| Country | Proportion of Unlicensed Software |
|---|---|
| United States | 0.17 |
| China | 0.70 |
| India | 0.58 |
| Russia | 0.64 |
| Total | 2.09 |
This is not a probability distribution because:
x is categorical (country), not numerical.
The proportions sum to 2.09, not 1.
Parameters of a Probability Distribution
Population Parameters
For probability distributions, the mean, variance, and standard deviation are considered parameters (since they describe a population, not a sample).
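Mean (μ): $\mu = \sum [x \cdot P(x)]$
Variance (σ²): $\sigma^2 = \sum [(x - \mu)^2 \cdot P(x)]$ (easier to understand), or equivalently $\sigma^2 = \sum [x^2 \cdot P(x)] - \mu^2$ (easier for manual calculations)
Standard deviation (σ): $\sigma = \sqrt{\sum [x^2 \cdot P(x)] - \mu^2}$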
Expected Value
The expected value of a discrete random variable x, denoted by E, is the mean value of the outcomes: $E = \sum [x \cdot P(x)]$
Example: Calculating Mean, Variance, and Standard Deviation
Given the probability distribution for number of females in two births:
| x: Number of Females in Two Births | P(x) |
|---|---|
| 0 | 0.25 |
| 1 | 0.50 |
| 2 | 0.25 |
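Mean: $\mu = \sum [x \cdot P(x)] = 0(0.25) + 1(0.50) + 2(0.25) = 1.0$
Variance: $\sigma^2 = \sum [(x - \mu)^2 \cdot P(x)] = (0 - 1)^2(0.25) + (1 - 1)^2(0.50) + (2 - 1)^2(0.25) = 0.5$
Standard deviation: $\sigma = \sqrt{0.5} \approx 0.707$, which rounds to 0.7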
Interpretation: In two births, the mean number of females is 1.0, the variance is 0.5, and the standard deviation is 0.7. The expected value is also 1.0.
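The same calculation can be sketched in code. This example assumes the standard discrete formulas $\mu = \sum [x \cdot P(x)]$ and $\sigma^2 = \sum [(x - \mu)^2 \cdot P(x)]$, along with the shortcut $\sigma^2 = \sum [x^2 \cdot P(x)] - \mu^2$; the function names are illustrative.

```python
import math

births = {0: 0.25, 1: 0.50, 2: 0.25}


def mean(dist):
    """Mean (expected value): sum of x * P(x)."""
    return sum(x * p for x, p in dist.items())


def variance(dist):
    """Variance from the definition: sum of (x - mu)^2 * P(x)."""
    mu = mean(dist)
    return sum((x - mu) ** 2 * p for x, p in dist.items())


def variance_shortcut(dist):
    """Variance from the shortcut form: sum of x^2 * P(x), minus mu^2."""
    return sum(x ** 2 * p for x, p in dist.items()) - mean(dist) ** 2


def std_dev(dist):
    """Standard deviation: square root of the variance."""
    return math.sqrt(variance(dist))


print(mean(births))               # mean of the births distribution
print(variance(births))           # variance from the definition
print(variance_shortcut(births))  # variance from the shortcut form
print(round(std_dev(births), 1))  # standard deviation, rounded to one decimal
```

Both variance functions should agree up to floating-point error, which makes the shortcut a convenient cross-check on a hand calculation.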
Identifying Significant Results
Range Rule of Thumb
The range rule of thumb helps identify significantly low or high values in a probability distribution:
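Significantly low values: $\mu - 2\sigma$ or lower
Significantly high values: $\mu + 2\sigma$ or higher
Values not significant: Between $\mu - 2\sigma$ and $\mu + 2\sigma$
Example: For two births, $\mu = 1.0$ and $\sigma = 0.7$, so $\mu + 2\sigma = 1.0 + 2(0.7) = 2.4$. Values of 2.4 and above are significantly high. Since 2 is not greater than or equal to 2.4, it is not significantly high.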
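The range rule of thumb is easy to express as a small function. This is a sketch assuming the usual cutoffs of $\mu - 2\sigma$ and $\mu + 2\sigma$; the names `range_rule_limits` and `classify` are hypothetical.

```python
def range_rule_limits(mu, sigma):
    """Return the (low, high) cutoffs mu - 2*sigma and mu + 2*sigma."""
    return mu - 2 * sigma, mu + 2 * sigma


def classify(x, mu, sigma):
    """Label x using the range rule of thumb."""
    low, high = range_rule_limits(mu, sigma)
    if x <= low:
        return "significantly low"
    if x >= high:
        return "significantly high"
    return "not significant"


# Two births: mu = 1.0 and sigma = 0.7 (rounded), as in the example above.
print(range_rule_limits(1.0, 0.7))
print(classify(2, 1.0, 0.7))  # prints "not significant"
```

With these inputs the upper cutoff is about 2.4, so two females in two births falls inside the non-significant range.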
Significant Results with Probabilities
Significantly high number of successes: x successes among n trials is significantly high if $P(x \text{ or more}) \le 0.05$
Significantly low number of successes: x successes among n trials is significantly low if $P(x \text{ or fewer}) \le 0.05$
The threshold 0.05 is conventional but not absolute; other values (e.g., 0.01) may be used.
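The probability criteria can be sketched for any discrete distribution stored as a dictionary. This example assumes the criteria are $P(x \text{ or more}) \le 0.05$ for significantly high and $P(x \text{ or fewer}) \le 0.05$ for significantly low, with the threshold left adjustable; the function names are illustrative.

```python
births = {0: 0.25, 1: 0.50, 2: 0.25}


def prob_at_least(dist, x):
    """P(x or more): total probability of values greater than or equal to x."""
    return sum(p for value, p in dist.items() if value >= x)


def prob_at_most(dist, x):
    """P(x or fewer): total probability of values less than or equal to x."""
    return sum(p for value, p in dist.items() if value <= x)


def is_significantly_high(dist, x, threshold=0.05):
    return prob_at_least(dist, x) <= threshold


def is_significantly_low(dist, x, threshold=0.05):
    return prob_at_most(dist, x) <= threshold


print(prob_at_least(births, 2))          # prints 0.25
print(is_significantly_high(births, 2))  # prints False
```

For two births, P(2 or more) = 0.25, which is well above 0.05, so two females is not significantly high. This agrees with the range rule of thumb result.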
The Rare Event Rule for Inferential Statistics
If, under a given assumption, the probability of a particular outcome is very small and the outcome occurs significantly less than or significantly greater than what we expect with that assumption, we conclude that the assumption is probably not correct.
Summary Table: Probability Distribution Requirements
| Requirement | Description |
|---|---|
| Numerical Random Variable | Values must be numbers, not categories |
| Sum of Probabilities | $\sum P(x) = 1$ |
| Probability Range | Each $P(x)$ must be between 0 and 1: $0 \le P(x) \le 1$ |
Summary Table: Parameters of a Discrete Probability Distribution
| Parameter | Formula (LaTeX) | Description |
|---|---|---|
| Mean (μ) | $\mu = \sum [x \cdot P(x)]$ | Expected value of the random variable |
| Variance (σ²) | $\sigma^2 = \sum [(x - \mu)^2 \cdot P(x)]$ | Average squared deviation from the mean |
| Standard Deviation (σ) | $\sigma = \sqrt{\sigma^2}$ | Square root of the variance |
Key Takeaways
Probability distributions describe the likelihood of outcomes for random variables.
Discrete and continuous random variables differ in the nature of their possible values.
Probability distributions must meet specific requirements to be valid.
Parameters such as mean, variance, and standard deviation summarize the distribution.
Significant results can be identified using the range rule of thumb or probability thresholds.