Probability Distributions: Concepts, Properties, and Applications

Probability Distributions

Introduction to Probability Distributions

Probability distributions are fundamental tools in statistics for describing the likelihood of various outcomes of a random experiment. This section introduces the concept of a random variable and the probability distribution, and explains how to visualize and analyze these distributions using histograms and key parameters such as mean, variance, and standard deviation.

Random Variable: A variable (typically denoted by x) that takes a single numerical value, determined by chance, for each outcome of a procedure.
Probability Distribution: A description that assigns a probability to each possible value of the random variable. It can be represented as a table, formula, or graph.
Probability Histogram: A graphical representation of a probability distribution, where the vertical axis shows probabilities instead of relative frequencies.

Basic Concepts of Probability Distribution

Types of Random Variables

Random variables are classified based on the nature of their possible values:

Discrete Random Variable: Has a finite or countable set of values. For example, the number of coin tosses before getting heads.
Continuous Random Variable: Has infinitely many values, which are not countable. These values are measured on a continuous scale, such as body temperature.

Requirements for a Probability Distribution

Every probability distribution must satisfy the following three requirements:

There is a numerical (not categorical) random variable x, and its values are associated with corresponding probabilities.
The sum of all probabilities must be 1: $\sum P(x) = 1$. (Sums such as 0.999 or 1.001 are acceptable due to rounding errors.)
Each probability value must be between 0 and 1 inclusive: $0 \le P(x) \le 1$.

Examples of Probability Distributions

Example: Births

Consider two births, where male and female births are equally likely. Let x be the number of females in two births. The probability distribution is:

x: Number of Females in Two Births	P(x)
0	0.25
1	0.50
2	0.25

Discrete random variable: x can be 0, 1, or 2 (finite number of values).
Probability distribution: Satisfies all three requirements above.
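
The table above can be derived by listing every equally likely outcome of two births and counting females in each. The following is a minimal sketch of that counting argument; the variable names are illustrative and not part of the original notes.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Enumerate the four equally likely outcomes of two births: MM, MF, FM, FF.
outcomes = list(product("MF", repeat=2))

# x = number of females in each outcome.
counts = Counter(outcome.count("F") for outcome in outcomes)

# Each outcome has probability 1/4, so P(x) = (outcomes with x females) / 4.
distribution = {x: Fraction(n, len(outcomes)) for x, n in sorted(counts.items())}

for x, p in distribution.items():
    print(x, float(p))
```

Using exact fractions avoids rounding, so the three probabilities sum to exactly 1.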

Probability Histogram

A probability histogram for the above example would have three bars corresponding to x = 0, 1, 2, with heights 0.25, 0.50, and 0.25, respectively. The vertical axis represents probability.
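
As a rough illustration, the histogram can be sketched in plain text. This version is rotated, so the bars run horizontally and bar length stands in for probability; the function name and the `scale` parameter are assumptions made for this example.

```python
births = {0: 0.25, 1: 0.50, 2: 0.25}

def text_histogram(dist, scale=40):
    """Return one line per value of x, with bar length proportional to P(x)."""
    lines = []
    for x, p in sorted(dist.items()):
        bar = "#" * round(p * scale)  # scale=40 means P(x) = 1 would span 40 characters
        lines.append(f"{x} | {bar} {p:.2f}")
    return "\n".join(lines)

print(text_histogram(births))
```

The bar for x = 1 is twice as long as the bars for x = 0 and x = 2, matching the probabilities 0.50 and 0.25.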

Probability Formula

Probability distributions can also be expressed as formulas. For the births example:

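$P(x) = \frac{1}{2(2 - x)!\,x!}$ (where x = 0, 1, or 2)

Calculating for each value:

$P(0) = \frac{1}{2 \cdot 2! \cdot 0!} = \frac{1}{4} = 0.25$
$P(1) = \frac{1}{2 \cdot 1! \cdot 1!} = \frac{1}{2} = 0.50$
$P(2) = \frac{1}{2 \cdot 0! \cdot 2!} = \frac{1}{4} = 0.25$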
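
A formula of this kind can be checked against the table by evaluating it at each value of x. The sketch below assumes the closed form P(x) = 1 / [2 (2 - x)! x!], which is one formula consistent with the births table; the function name `births_probability` is hypothetical.

```python
from math import factorial

def births_probability(x):
    """Assumed formula: P(x) = 1 / (2 * (2 - x)! * x!) for x = 0, 1, or 2."""
    if x not in (0, 1, 2):
        raise ValueError("x must be 0, 1, or 2")
    return 1 / (2 * factorial(2 - x) * factorial(x))

# Evaluate the formula at every possible value of x.
probabilities = {x: births_probability(x) for x in (0, 1, 2)}
print(probabilities)
```

The results agree with the table entries 0.25, 0.50, and 0.25.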

Non-Example: Software Piracy

Consider the following table:

Country	Proportion of Unlicensed Software
United States	0.17
China	0.70
India	0.58
Russia	0.64
Total	2.09

This is not a probability distribution because:
The variable is categorical (country), not numerical.
The listed proportions sum to 2.09, not 1.
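
The three requirements can be checked mechanically for both tables. The sketch below is one possible check, not a standard library routine: the function name and the rounding tolerance `tol=0.005` are assumptions chosen so that sums such as 0.999 or 1.001 pass.

```python
def is_valid_distribution(dist, tol=0.005):
    """Check a mapping {value: probability} against the three requirements."""
    # Requirement 1: every value of the random variable is numerical.
    # bool is excluded because Python treats True/False as integers.
    numeric = all(
        isinstance(x, (int, float)) and not isinstance(x, bool) for x in dist
    )
    # Requirement 2: probabilities sum to 1, allowing for rounding error.
    sums_to_one = abs(sum(dist.values()) - 1) <= tol
    # Requirement 3: each probability lies between 0 and 1 inclusive.
    in_range = all(0 <= p <= 1 for p in dist.values())
    return numeric and sums_to_one and in_range

births = {0: 0.25, 1: 0.50, 2: 0.25}
piracy = {"United States": 0.17, "China": 0.70, "India": 0.58, "Russia": 0.64}

print(is_valid_distribution(births))  # births table meets all three requirements
print(is_valid_distribution(piracy))  # fails: categorical values, sum of 2.09
```

The piracy table fails two requirements at once, so correcting only one of them would still not make it a probability distribution.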

Parameters of a Probability Distribution

Population Parameters

For probability distributions, the mean, variance, and standard deviation are considered parameters (since they describe a population, not a sample).

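Mean (μ): $\mu = \sum [x \cdot P(x)]$
Variance (σ²): $\sigma^2 = \sum [(x - \mu)^2 \cdot P(x)]$ (easier to understand), or equivalently $\sigma^2 = \sum [x^2 \cdot P(x)] - \mu^2$ (easier for manual calculations)
Standard deviation (σ): $\sigma = \sqrt{\sum [x^2 \cdot P(x)] - \mu^2}$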

Expected Value

The expected value of a discrete random variable x, denoted by E, is the mean value of the outcomes: $E = \sum [x \cdot P(x)]$, so $E = \mu$.

Example: Calculating Mean, Variance, and Standard Deviation

Given the probability distribution for number of females in two births:

x: Number of Females in Two Births	P(x)
0	0.25
1	0.50
2	0.25

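Mean: $\mu = \sum [x \cdot P(x)] = 0(0.25) + 1(0.50) + 2(0.25) = 1.0$
Variance: $\sigma^2 = \sum [(x - \mu)^2 \cdot P(x)] = (0 - 1)^2(0.25) + (1 - 1)^2(0.50) + (2 - 1)^2(0.25) = 0.5$
Standard deviation: $\sigma = \sqrt{0.5} \approx 0.707$, which rounds to 0.7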

Interpretation: In two births, the mean number of females is 1.0, the variance is 0.5, and the standard deviation is 0.7. The expected value is also 1.0.
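
These values can be reproduced in a few lines. The sketch below uses the standard definitions for a discrete distribution, namely mean = Σ[x·P(x)] and variance = Σ[(x − μ)²·P(x)], and also computes the shortcut form Σ[x²·P(x)] − μ² to show that both give the same variance. Variable names are illustrative.

```python
from math import sqrt

births = {0: 0.25, 1: 0.50, 2: 0.25}

# Mean (also the expected value): sum of x * P(x).
mean = sum(x * p for x, p in births.items())

# Variance from the definition: sum of (x - mean)^2 * P(x).
variance = sum((x - mean) ** 2 * p for x, p in births.items())

# Variance from the shortcut: sum of x^2 * P(x), minus mean^2.
variance_shortcut = sum(x**2 * p for x, p in births.items()) - mean**2

# Standard deviation: square root of the variance.
std_dev = sqrt(variance)

print(mean, variance, variance_shortcut, round(std_dev, 1))
```

The unrounded standard deviation is about 0.707; the value 0.7 in the interpretation is that result rounded to one decimal place.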

Identifying Significant Results

Range Rule of Thumb

The range rule of thumb helps identify significantly low or high values in a probability distribution:

Significantly low values: $\mu - 2\sigma$ or lower
Significantly high values: $\mu + 2\sigma$ or higher
Values not significant: Between $\mu - 2\sigma$ and $\mu + 2\sigma$

Example: For two births, $\mu = 1.0$ and $\sigma = 0.7$, so $\mu + 2\sigma = 1.0 + 2(0.7) = 2.4$. Values of 2.4 and above are significantly high. Since 2 is not greater than or equal to 2.4, 2 females in two births is not significantly high.
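
The range rule of thumb can be written as a small classifier. The sketch below assumes the usual cutoffs of two standard deviations on either side of the mean (μ − 2σ and μ + 2σ); the function name is hypothetical.

```python
def classify_by_range_rule(x, mean, std_dev):
    """Classify x using cutoffs at mean - 2*std_dev and mean + 2*std_dev."""
    low_cutoff = mean - 2 * std_dev
    high_cutoff = mean + 2 * std_dev
    if x <= low_cutoff:
        return "significantly low"
    if x >= high_cutoff:
        return "significantly high"
    return "not significant"

# Two births: mean 1.0, standard deviation rounded to 0.7, so the cutoffs
# are approximately -0.4 and 2.4.
for x in (0, 1, 2):
    print(x, classify_by_range_rule(x, mean=1.0, std_dev=0.7))
```

None of the possible values 0, 1, or 2 falls outside the cutoffs, so no outcome of two births is significant under this rule.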

Significant Results with Probabilities

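Significantly high number of successes: x successes among n trials is significantly high if the probability of x or more successes is 0.05 or less: $P(x \text{ or more}) \le 0.05$
Significantly low number of successes: x successes among n trials is significantly low if the probability of x or fewer successes is 0.05 or less: $P(x \text{ or fewer}) \le 0.05$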
The threshold 0.05 is conventional but not absolute; other values (e.g., 0.01) may be used.
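
The probability-based criteria can be applied directly to a distribution table. The sketch below assumes the usual form of the test: a value is significantly high when P(x or more) is at most the threshold, and significantly low when P(x or fewer) is at most the threshold. The function names are hypothetical.

```python
def tail_probabilities(dist, x):
    """Return (P(x or more), P(x or fewer)) for a mapping {value: probability}."""
    p_or_more = sum(p for value, p in dist.items() if value >= x)
    p_or_fewer = sum(p for value, p in dist.items() if value <= x)
    return p_or_more, p_or_fewer

def is_significantly_high(dist, x, threshold=0.05):
    p_or_more, _ = tail_probabilities(dist, x)
    return p_or_more <= threshold

def is_significantly_low(dist, x, threshold=0.05):
    _, p_or_fewer = tail_probabilities(dist, x)
    return p_or_fewer <= threshold

births = {0: 0.25, 1: 0.50, 2: 0.25}

# P(2 or more females) = 0.25, which is greater than 0.05.
print(tail_probabilities(births, 2))
print(is_significantly_high(births, 2))
print(is_significantly_low(births, 0))
```

For two births, P(2 or more) = 0.25 and P(0 or fewer) = 0.25, so neither extreme is significant. This agrees with the range rule of thumb result for the same example.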

The Rare Event Rule for Inferential Statistics

If, under a given assumption, the probability of a particular outcome is very small, and the observed outcome is significantly less than or significantly greater than what we expect under that assumption, we conclude that the assumption is probably not correct.

Summary Table: Probability Distribution Requirements

Requirement	Description
Numerical Random Variable	Values must be numbers, not categories
Sum of Probabilities	$\sum P(x) = 1$ (allowing for small rounding errors)
Probability Range	Each $P(x)$ must be between 0 and 1 inclusive: $0 \le P(x) \le 1$

Summary Table: Parameters of a Discrete Probability Distribution

Parameter	Formula (LaTeX)	Description
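Mean (μ)	$\mu = \sum [x \cdot P(x)]$	Expected value of the random variable
Variance (σ²)	$\sigma^2 = \sum [(x - \mu)^2 \cdot P(x)]$	Average squared deviation from the mean
Standard Deviation (σ)	$\sigma = \sqrt{\sum [x^2 \cdot P(x)] - \mu^2}$	Square root of the variance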

Key Takeaways

Probability distributions describe the likelihood of outcomes for random variables.
Discrete and continuous random variables differ in the nature of their possible values.
Probability distributions must meet specific requirements to be valid.
Parameters such as mean, variance, and standard deviation summarize the distribution.
Significant results can be identified using the range rule of thumb or probability thresholds.