Probability Distributions: Discrete and Continuous Random Variables
Study Guide - Smart Notes
Probability Distributions
Introduction to Randomness and Random Variables
Probability distributions are fundamental concepts in statistics, describing how probabilities are assigned to possible outcomes of random phenomena. This section introduces the key ideas of randomness, random variables, and their probability distributions.
Randomness: In statistics, randomness arises from processes such as random sampling or randomized experiments. The outcome cannot be predicted with certainty in advance.
Random Variable: A random variable is a numerical measurement of the outcome of a random phenomenon. It is typically denoted by a capital letter (e.g., X), while a particular value it can take is denoted by a lowercase letter (e.g., x).
Example: If you flip a coin three times, the random variable X could represent the number of heads observed in the three flips. Possible values for X are 0, 1, 2, or 3.
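The coin-flip example can be turned into a full probability distribution by listing every sequence of three flips and counting heads. The sketch below assumes a fair coin (not stated in the notes), so that all eight sequences are equally likely; the names `outcomes` and `distribution` are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Assumption: a fair coin, so all 2**3 = 8 sequences are equally likely.
outcomes = list(product("HT", repeat=3))

# X = number of heads in each sequence of three flips.
head_counts = Counter(seq.count("H") for seq in outcomes)

# Probability of each value of X = (sequences with that many heads) / 8.
distribution = {
    x: Fraction(n, len(outcomes)) for x, n in sorted(head_counts.items())
}

for x, p in distribution.items():
    print(f"P(X = {x}) = {p}")
```

Under this assumption the values 0, 1, 2, 3 receive probabilities 1/8, 3/8, 3/8, 1/8.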
Probability Distributions of Random Variables
Definition and Importance
A probability distribution of a random variable specifies all possible values the variable can take and the probability associated with each value. This allows statisticians to predict the likelihood of different outcomes in the long run.
Without randomness, it would not be possible to assign probabilities to outcomes.
Probability distributions are essential for making inferences about populations and for understanding the behavior of random processes.
Discrete Random Variables
A discrete random variable is one that takes a countable set of separate values (such as 0, 1, 2, ...). Its probability distribution lists each possible value and the probability of that value occurring.
For each value x, the probability P(x) must satisfy $0 \le P(x) \le 1$.
The sum of the probabilities for all possible values of x must equal 1: $\sum_x P(x) = 1$.
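The two conditions (each probability lies between 0 and 1, and the probabilities sum to 1) are easy to check mechanically. The following is a minimal sketch; the function name `is_valid_distribution` and the tolerance are illustrative choices, and the first table assumes a fair coin flipped three times.

```python
from fractions import Fraction

def is_valid_distribution(dist, tol=1e-9):
    """Check that every probability is in [0, 1] and that they sum to 1."""
    probs = list(dist.values())
    each_in_range = all(0 <= p <= 1 for p in probs)
    # Compare as floats with a small tolerance to allow for rounding.
    sums_to_one = abs(float(sum(probs)) - 1.0) <= tol
    return each_in_range and sums_to_one

# Number of heads in three flips of a fair coin (assumed fair).
heads_in_three_flips = {
    0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8),
}
# A made-up table whose probabilities sum to 1.1, so it is not valid.
not_a_distribution = {0: 0.5, 1: 0.6}

print(is_valid_distribution(heads_in_three_flips))  # True
print(is_valid_distribution(not_a_distribution))    # False
```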
Example: Best of Seven Series
Consider a sports series where the winner is the first to win four games (best of seven). The random variable X is the number of games needed to determine a winner.
| Number of games, x | Probability, P(x) |
|---|---|
| 4 | 1/8 = 0.125 |
| 5 | 1/4 = 0.25 |
| 6 | 5/16 = 0.3125 |
| 7 | 5/16 = 0.3125 |
Question: What is the probability that the series lasts at least six games?
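Solution: The series lasts at least six games when X is 6 or 7, so $P(X \ge 6) = P(6) + P(7) = 5/16 + 5/16 = 10/16 = 0.625$.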
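The notes do not say how the table was obtained. Its values are consistent with one common assumption: the two teams are evenly matched and games are independent. The sketch below rebuilds the table under that assumption and then adds the probabilities for six and seven games; `series_length_distribution` is an illustrative name.

```python
from fractions import Fraction
from math import comb

def series_length_distribution(p=Fraction(1, 2), wins_needed=4):
    """P(X = n) for a first-to-`wins_needed` series of independent games.

    p is the assumed probability that one particular team wins any game.
    """
    q = 1 - p
    dist = {}
    for n in range(wins_needed, 2 * wins_needed):
        # The series ends at game n when the eventual winner takes game n
        # and exactly wins_needed - 1 of the previous n - 1 games.
        ways = comb(n - 1, wins_needed - 1)
        losses = n - wins_needed
        dist[n] = ways * (p**wins_needed * q**losses + q**wins_needed * p**losses)
    return dist

dist = series_length_distribution()
p_at_least_six = dist[6] + dist[7]

for n, prob in dist.items():
    print(n, prob)
print("P(X >= 6) =", p_at_least_six)  # 5/8, i.e. 0.625
```

With a different value of `p` the same function gives the distribution for unevenly matched teams, which would no longer match the table.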
Mean and Variability of Discrete Probability Distributions
To summarize a probability distribution, we use the mean (expected value) to describe its center and the standard deviation to describe its variability.
Parameter: A numerical summary of a probability distribution is called a parameter. Parameters are typically denoted by Greek letters.
The mean of a probability distribution is denoted by $\mu$ (mu).
The standard deviation is denoted by $\sigma$ (sigma).
Mean (Expected Value) of a Discrete Random Variable
The mean (expected value) of a discrete random variable X is calculated as:
$\mu = \sum_x x \, P(x)$
where the sum is taken over all possible values of x. The mean is a weighted average, where values that are more likely receive greater weight.
The expected value reflects the long-run average outcome, not necessarily a value that can be observed in a single trial.
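As a worked instance of the weighted average, the sketch below multiplies each value in the best-of-seven table by its probability and adds the results. The helper name `expected_value` is illustrative.

```python
from fractions import Fraction

def expected_value(dist):
    """Weighted average of the values, weighted by their probabilities."""
    return sum(x * p for x, p in dist.items())

# Best-of-seven table from the notes: number of games -> probability.
series = {
    4: Fraction(1, 8),
    5: Fraction(1, 4),
    6: Fraction(5, 16),
    7: Fraction(5, 16),
}

mu = expected_value(series)
print(mu, float(mu))  # 93/16 5.8125
```

The result, 5.8125 games, is not a possible series length, which illustrates that the expected value is a long-run average rather than an outcome of a single series.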
Example: Responding to Risk
Suppose you are given $1000 to invest and must choose between two strategies:
A sure gain of $500.
A 0.50 chance to gain $1000 and a 0.50 chance to gain nothing.
Expected gain for strategy 1: $\mu = 500(1) = 500$ (dollars).
Expected gain for strategy 2: $\mu = 1000(0.50) + 0(0.50) = 500$ (dollars).
Both strategies have the same expected gain, but the variability (risk) is different.
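A small simulation can make the "same expected gain, different risk" point concrete. This is only a sketch: the number of trials and the random seed are arbitrary choices, and the simulated average for strategy 2 will be close to, not exactly, 500.

```python
import random

rng = random.Random(0)  # fixed seed so the run is repeatable
n_trials = 100_000

# Strategy 1: a sure gain of 500 every time.
sure_gains = [500] * n_trials
# Strategy 2: gain 1000 with probability 0.50, otherwise gain nothing.
gamble_gains = [1000 if rng.random() < 0.5 else 0 for _ in range(n_trials)]

avg_sure = sum(sure_gains) / n_trials
avg_gamble = sum(gamble_gains) / n_trials

print("average gain, strategy 1:", avg_sure)
print("average gain, strategy 2:", avg_gamble)
# The long-run averages agree, but the individual outcomes do not:
print("outcomes seen, strategy 1:", sorted(set(sure_gains)))
print("outcomes seen, strategy 2:", sorted(set(gamble_gains)))
```

Strategy 1 always pays 500, while strategy 2 never pays 500 in any single trial; only its long-run average is near 500.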
Standard Deviation of a Probability Distribution
The standard deviation measures the variability of the distribution. Larger values of $\sigma$ indicate greater variability. Roughly, $\sigma$ describes how far values of the random variable fall, on average, from the expected value.
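The notes do not give a formula for $\sigma$. The standard definition for a discrete random variable is $\sigma = \sqrt{\sum_x (x - \mu)^2 P(x)}$, and the sketch below applies it to the two investment strategies from the earlier example. The helper names `mean` and `std_dev` are illustrative.

```python
import math

def mean(dist):
    """Expected value: sum of x * P(x)."""
    return sum(x * p for x, p in dist.items())

def std_dev(dist):
    """Square root of the probability-weighted squared distance from the mean."""
    mu = mean(dist)
    variance = sum((x - mu) ** 2 * p for x, p in dist.items())
    return math.sqrt(variance)

sure_gain = {500: 1.0}          # strategy 1: always gain 500
gamble = {1000: 0.5, 0: 0.5}    # strategy 2: gain 1000 or nothing

print(mean(sure_gain), std_dev(sure_gain))  # 500.0 0.0
print(mean(gamble), std_dev(gamble))        # 500.0 500.0
```

Both strategies have mean 500, but the sure gain has no variability while the gamble's outcomes sit 500 away from the mean.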
Probability Distributions of Categorical Variables
While most random variables are quantitative, categorical variables with two categories can also be represented numerically (e.g., 0 and 1). For such binary random variables, the mean is the probability of the outcome coded as 1.
Example: Success/failure, yes/no, or presence/absence outcomes.
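To see why the mean of a 0/1 variable equals the probability of the outcome coded as 1, apply the weighted-average formula: $\mu = 0 \cdot (1 - p) + 1 \cdot p = p$. The sketch below checks this for a hypothetical success probability of 3/10, a value chosen only for illustration.

```python
from fractions import Fraction

def expected_value(dist):
    """Weighted average of the values, weighted by their probabilities."""
    return sum(x * p for x, p in dist.items())

# Hypothetical success probability, chosen only for illustration.
p_success = Fraction(3, 10)
binary = {0: 1 - p_success, 1: p_success}  # 0 = failure, 1 = success

mu = expected_value(binary)
print(mu)  # 3/10, the same as p_success
```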
Continuous Random Variables
Definition and Examples
A continuous random variable can take any value within an interval. Examples include time, age, height, and weight. In practice, continuous variables are often measured discretely due to rounding.
Probability Distribution of a Continuous Random Variable
The probability distribution of a continuous random variable is specified by a curve (probability density function) that determines the probability the variable falls within a particular interval.
The probability for any specific value is zero; only intervals have nonzero probability.
The probability that the variable falls within an interval is given by the area under the curve above that interval.
The total area under the curve (representing all possible values) is 1.
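These three properties can be checked numerically for a simple curve. The sketch below uses a hypothetical density, f(x) = 2x on the interval from 0 to 1, and approximates areas with the midpoint rule; neither the density nor the function names come from the notes.

```python
def density(x):
    """Hypothetical pdf: f(x) = 2x on [0, 1], and 0 elsewhere."""
    return 2 * x if 0 <= x <= 1 else 0.0

def area(f, a, b, n=10_000):
    """Approximate the area under f between a and b with the midpoint rule."""
    width = (b - a) / n
    return sum(f(a + (i + 0.5) * width) for i in range(n)) * width

total = area(density, 0, 1)         # area over all possible values
p_interval = area(density, 0.5, 1)  # P(0.5 <= X <= 1)
p_point = area(density, 0.7, 0.7)   # "interval" of zero width

print(round(total, 6))       # about 1
print(round(p_interval, 6))  # about 0.75
print(p_point)               # 0.0
```

An interval of zero width has zero area, which is why a single value has probability zero for a continuous random variable.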
Example: Commuting Time
Suppose the curve describes commuting time in minutes. The probability that a commute takes more than 45 minutes is the area under the curve to the right of 45, which here is 0.15. Likewise, the probability that a commute takes less than 15 minutes is the area under the curve to the left of 15, which here is 0.29.
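Because the total area under the curve is 1, the two areas given in this example determine other probabilities as well. The short sketch below works out the probability of a commute between 15 and 45 minutes; the variable names are illustrative.

```python
# Areas under the commuting-time curve given in the example.
p_less_than_15 = 0.29
p_more_than_45 = 0.15

# The total area is 1, so the middle interval gets whatever is left.
p_between_15_and_45 = 1 - p_less_than_15 - p_more_than_45
# Complement of "more than 45 minutes".
p_at_most_45 = 1 - p_more_than_45

print(round(p_between_15_and_45, 2))  # 0.56
print(round(p_at_most_45, 2))         # 0.85
```

Since single points have probability zero, it makes no difference whether the endpoints 15 and 45 are included in the interval.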
Additional info: For continuous random variables, probabilities are always associated with intervals, not individual points. The probability density function (pdf) is used to describe the distribution, and the area under the pdf over an interval gives the probability for that interval.