Random Variables, Probability Models, Sampling Distributions, and Hypothesis Testing

Study Guide - Smart Notes


Random Variables and Probability Models

Definition and Types of Random Variables

A random variable is a numerical value determined by the outcome of a random event. Random variables are fundamental in statistics for modeling uncertainty and variability.

Discrete Random Variable: Can take on a countable number of distinct values. Example: Number of heads in 10 coin tosses.
Continuous Random Variable: Can take any value within a given range. Example: Height of students in a class.

Notation: Random variables are typically denoted by capital letters such as X, Y, or Z.

Probability Model

A probability model describes all possible values of a random variable and their associated probabilities.

For discrete random variables, the model lists each possible value and its probability.
For continuous random variables, the model specifies a probability density function.

Expected Value and Variance

The expected value (mean) and variance are key properties of random variables.

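Expected Value (Mean): $E(X) = \mu = \sum x \, P(X = x)$ for a discrete random variable, where the sum runs over all possible values $x$.

Variance: $\mathrm{Var}(X) = \sigma^2 = \sum (x - \mu)^2 \, P(X = x)$. The standard deviation is $\sigma = \sqrt{\mathrm{Var}(X)}$.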
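As a small illustration, the sketch below computes the mean and variance of a discrete probability model straight from the definitions E(X) = sum of x * P(X = x) and Var(X) = sum of (x - mu)^2 * P(X = x). The payout values and probabilities are hypothetical, chosen only to keep the arithmetic easy to follow.

```python
from fractions import Fraction

# Hypothetical probability model (not from the notes): payout -> probability.
model = {
    0: Fraction(7, 10),
    100: Fraction(2, 10),
    1000: Fraction(1, 10),
}

def expected_value(model):
    """E(X): each value weighted by its probability."""
    return sum(x * p for x, p in model.items())

def variance(model):
    """Var(X): probability-weighted squared deviation from the mean."""
    mu = expected_value(model)
    return sum((x - mu) ** 2 * p for x, p in model.items())

# A valid probability model must have probabilities that sum to 1.
assert sum(model.values()) == 1

mu = expected_value(model)
var = variance(model)
sd = float(var) ** 0.5
print(f"E(X) = {float(mu)}, Var(X) = {float(var)}, SD(X) = {sd:.2f}")
```

Exact fractions are used so that rounding error does not hide whether the probabilities really sum to 1.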

Bernoulli Trials

A Bernoulli trial is an experiment with only two possible outcomes: success or failure.

Probability of success: p
Probability of failure: q = 1 - p
Each trial is independent.
Examples: Tossing a coin, yes/no survey responses, basketball free throws.
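A quick simulation can make the definition concrete. The sketch below assumes a success probability of p = 0.3 (a made-up value) and checks that the long-run proportion of successes settles near p. The helper name `bernoulli_trial` is hypothetical.

```python
import random

def bernoulli_trial(p, rng):
    """Return 1 (success) with probability p, otherwise 0 (failure)."""
    return 1 if rng.random() < p else 0

rng = random.Random(42)  # fixed seed so the run is repeatable
p = 0.3                  # hypothetical success probability

# Each call draws a fresh random number, so the trials are independent.
trials = [bernoulli_trial(p, rng) for _ in range(10_000)]
observed = sum(trials) / len(trials)
print(f"observed proportion of successes: {observed:.3f} (p = {p})")
```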

Uniform Model

If a random variable X can take the values 1, 2, ..., n and each value is equally likely, so that $P(X = k) = 1/n$ for every k, then X has a discrete uniform distribution U[1,...,n].
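For a concrete case, take a fair six-sided die (n = 6, an illustrative choice). The sketch below computes the mean and variance from the definitions and checks them against the standard closed forms for U[1,...,n], namely (n + 1)/2 and (n^2 - 1)/12.

```python
from fractions import Fraction

n = 6                  # fair six-sided die, used only as an example
values = range(1, n + 1)
p = Fraction(1, n)     # every outcome has the same probability

mean = sum(x * p for x in values)
var = sum((x - mean) ** 2 * p for x in values)

# Closed forms for the discrete uniform distribution on 1..n.
assert mean == Fraction(n + 1, 2)
assert var == Fraction(n * n - 1, 12)

print(f"mean = {mean}, variance = {var}")
```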

Standardization (Z-score)

The Z-score measures how many standard deviations a value is from the mean: $z = \frac{x - \mu}{\sigma}$.

Example: To find a probability such as $P(X < x)$ for a normally distributed X, calculate the Z-score of x and look it up in the standard normal table.
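The table lookup can also be done in software. The sketch below standardizes a value with z = (x - mu) / sigma and uses Python's `statistics.NormalDist` in place of a printed normal table. The mean, standard deviation, and x are hypothetical numbers.

```python
from statistics import NormalDist

def z_score(x, mu, sigma):
    """Number of standard deviations that x lies from the mean."""
    return (x - mu) / sigma

mu, sigma = 100.0, 15.0   # hypothetical population mean and SD
x = 130.0                 # hypothetical observed value

z = z_score(x, mu, sigma)
p_below = NormalDist().cdf(z)   # P(Z < z) for the standard normal
p_above = 1 - p_below           # P(Z > z)

print(f"z = {z:.2f}, P(X < {x}) = {p_below:.4f}, P(X > {x}) = {p_above:.4f}")
```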

Sum of Random Variables

The expected value of a sum is always the sum of the expected values. When the random variables are also independent, their variances add as well:

Expected Value: $E(X + Y) = E(X) + E(Y)$

Example: If and , then .

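Variance: $\mathrm{Var}(X + Y) = \mathrm{Var}(X) + \mathrm{Var}(Y)$ for independent X and Y. Standard deviations do not add; take the square root of the summed variance.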

Example: If for both, then .
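The addition rules E(X + Y) = E(X) + E(Y) and, for independent variables, Var(X + Y) = Var(X) + Var(Y) can be checked by brute force. The sketch below uses two independent fair dice (an illustrative choice, not the example from the notes), builds the exact distribution of their sum, and compares both sides of each rule.

```python
from fractions import Fraction
from itertools import product

def mean(dist):
    return sum(x * p for x, p in dist.items())

def var(dist):
    mu = mean(dist)
    return sum((x - mu) ** 2 * p for x, p in dist.items())

# X and Y: two independent fair dice.
die = {face: Fraction(1, 6) for face in range(1, 7)}

# Under independence, P(X = x and Y = y) = P(X = x) * P(Y = y).
total = {}
for (x, px), (y, py) in product(die.items(), repeat=2):
    total[x + y] = total.get(x + y, 0) + px * py

assert mean(total) == mean(die) + mean(die)   # means add
assert var(total) == var(die) + var(die)      # variances add

sd_sum = float(var(total)) ** 0.5
sd_one = float(var(die)) ** 0.5
print(f"SD(X + Y) = {sd_sum:.3f}, but SD(X) + SD(Y) = {2 * sd_one:.3f}")
```

The last line shows why the rule is stated for variances: the standard deviation of the sum is smaller than the sum of the standard deviations.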

Distribution of Sample Proportions

Assumptions and Conditions

To use the sampling distribution of sample proportions, certain conditions must be met:

Independence Assumption: Sampled values must be independent.
Sample Size Assumption: Sample size n must be large enough.
Randomization Condition: Data should come from a randomized experiment or a simple random sample.
10% Condition: If sampling without replacement, n should be no more than 10% of the population.
Success/Failure Condition: Both $np$ and $nq$ should be at least 10.
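The two numeric conditions in this list are easy to check mechanically. The sketch below assumes a hypothetical helper, `check_proportion_conditions`, that tests the Success/Failure Condition (np and nq both at least 10, with q = 1 - p) and, when a population size is supplied, the 10% Condition. Independence and randomization depend on how the data were collected and cannot be verified from the numbers alone.

```python
def check_proportion_conditions(n, p, population_size=None):
    """Check the numeric conditions for modeling a sample proportion.

    n: sample size; p: assumed proportion of successes;
    population_size: optional, needed only for the 10% Condition.
    """
    q = 1 - p
    checks = {"success_failure": n * p >= 10 and n * q >= 10}
    if population_size is not None:
        checks["ten_percent"] = n <= 0.10 * population_size
    return checks

# Hypothetical scenarios, for illustration only.
ok = check_proportion_conditions(n=200, p=0.08, population_size=50_000)
too_few_successes = check_proportion_conditions(n=50, p=0.10)
too_large_a_sample = check_proportion_conditions(n=200, p=0.5, population_size=1_000)

print(ok)
print(too_few_successes)
print(too_large_a_sample)
```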

The Central Limit Theorem (CLT)

Statement and Implications

The Central Limit Theorem states that the sampling distribution of the mean of independent observations approaches a normal distribution as the sample size increases, regardless of the population's shape (provided the population has a finite variance).

For highly skewed populations, larger sample sizes (dozens or hundreds) may be needed before the sampling distribution of the mean is approximately normal.
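A simulation illustrates both points. The sketch below draws repeated samples from an exponential population with mean 1 (a strongly right-skewed distribution, chosen here as an assumption) and reports the skewness of the resulting sample means for several sample sizes. Skewness near 0 indicates a roughly symmetric, normal-looking shape.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the run is repeatable

def sample_mean(n):
    """Mean of n draws from an exponential population with mean 1."""
    return statistics.fmean([rng.expovariate(1.0) for _ in range(n)])

def skewness(data):
    """Average cubed z-score; about 0 for a symmetric distribution."""
    m = statistics.fmean(data)
    s = statistics.pstdev(data)
    return statistics.fmean([((x - m) / s) ** 3 for x in data])

results = {}
for n in (2, 30, 200):
    means = [sample_mean(n) for _ in range(5_000)]
    results[n] = (statistics.fmean(means), statistics.stdev(means), skewness(means))

for n, (center, spread, skew) in results.items():
    print(f"n = {n:>3}: mean of means = {center:.3f}, SD = {spread:.3f}, skewness = {skew:.3f}")
```

The sample means stay centered on the population mean at every n, while their spread and skewness both shrink as n grows.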

Application Example

If only 20 recent graduates' salaries are sampled, concerns include small sample size, unknown population size, and unknown standard deviation.
Confidence intervals should be based on t-models when the population standard deviation is unknown.
Increasing the sample size (e.g., to 60) narrows the confidence interval, because the standard error $s/\sqrt{n}$ shrinks as n grows.
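To see how much precision is gained, compare the standard error of the mean, s / sqrt(n), at n = 20 and n = 60. The sample standard deviation below is a hypothetical figure. The critical value t* is not computed here, but it also decreases slightly as the degrees of freedom rise, so the margin of error t* x SE shrinks by at least the factor shown.

```python
import math

def standard_error(s, n):
    """Standard error of the sample mean: s / sqrt(n)."""
    return s / math.sqrt(n)

s = 8000.0  # hypothetical sample standard deviation of salaries

se_20 = standard_error(s, 20)
se_60 = standard_error(s, 60)
ratio = se_60 / se_20  # equals sqrt(20 / 60), whatever the value of s

print(f"SE at n = 20: {se_20:.1f}")
print(f"SE at n = 60: {se_60:.1f}")
print(f"ratio: {ratio:.3f}")
```

Tripling the sample size cuts the standard error to about 58% of its original value, not to one third, because precision improves with the square root of n.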

Hypothesis Testing

Null and Alternative Hypotheses

Hypothesis testing begins with a null hypothesis ($H_0$), which assumes no effect or no change. The alternative hypothesis ($H_A$) represents all other possible values.

Null Hypothesis: $H_0: p = p_0$
Alternative Hypothesis: $H_A: p \neq p_0$ (two-sided), or $H_A: p > p_0$ or $H_A: p < p_0$ (one-sided)
Example: ,

Standard Error and Z-score for Proportions

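Standard Error of a Proportion: $SE(\hat{p}) = \sqrt{\frac{p_0 q_0}{n}}$, where $q_0 = 1 - p_0$ and $p_0$ is the proportion stated in the null hypothesis.

Z-score for Proportion: $z = \frac{\hat{p} - p_0}{SE(\hat{p})}$, where $\hat{p}$ is the observed sample proportion.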
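A worked calculation may help. The sketch below assumes the usual one-proportion test setup, in which the standard error is computed from the null value p0 as sqrt(p0 * (1 - p0) / n) and the test statistic is z = (p_hat - p0) / SE. The counts and the function name `proportion_z` are hypothetical.

```python
import math

def proportion_z(successes, n, p0):
    """z-statistic for testing H0: p = p0 against an observed proportion."""
    p_hat = successes / n                # observed sample proportion
    se = math.sqrt(p0 * (1 - p0) / n)    # standard error under H0
    return (p_hat - p0) / se

# Hypothetical data: 230 successes in 400 trials, testing H0: p = 0.5.
z = proportion_z(successes=230, n=400, p0=0.5)
print(f"z = {z:.2f}")
```

Here p_hat = 0.575 and SE = 0.025, so the observed proportion sits 3 standard errors above the null value.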

P-value

The p-value is the probability, under the null hypothesis, of obtaining a result at least as extreme as the observed result. It is not the probability that the null hypothesis is true.

The p-value is calculated from the sampling distribution of the test statistic (often a normal or t-distribution).
Excel function: T.DIST.RT(x, deg_freedom) returns the right-tailed p-value for a t-statistic x.
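For a z-statistic, the same idea can be sketched in Python with the standard normal distribution. The function name `p_value` and its `tail` argument are assumptions made for this example. The t-distribution case is not covered because Python's standard library has no t-distribution.

```python
from statistics import NormalDist

def p_value(z, tail="two-sided"):
    """P-value for a z-statistic under the standard normal model."""
    std_normal = NormalDist()
    if tail == "right":
        return 1 - std_normal.cdf(z)
    if tail == "left":
        return std_normal.cdf(z)
    if tail == "two-sided":
        # Both tails count as "at least as extreme".
        return 2 * (1 - std_normal.cdf(abs(z)))
    raise ValueError("tail must be 'left', 'right', or 'two-sided'")

print(f"two-sided, z = 1.96:  {p_value(1.96):.4f}")
print(f"right-tail, z = 1.645: {p_value(1.645, 'right'):.4f}")
```

A small p-value means that data this extreme would be unusual if the null hypothesis were true. It does not give the probability that the null hypothesis is true.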

Summary Table: Key Concepts

Concept	Definition	Formula	Example/Application
Random Variable	Numerical outcome of a random event	N/A	Policy payout, coin toss
Expected Value	Mean of random variable	$E(X) = \sum x \, P(X = x)$	Average payout
Variance	Spread of random variable	$\mathrm{Var}(X) = \sum (x - \mu)^2 \, P(X = x)$	Risk assessment
Bernoulli Trial	Experiment with two outcomes	N/A	Coin toss, yes/no survey
Uniform Distribution	All outcomes equally likely	$P(X = k) = 1/n$	Dice roll
Z-score	Standardized value	$z = \frac{x - \mu}{\sigma}$	Normal distribution analysis
Central Limit Theorem	Sampling distribution of the mean approaches normality	N/A	Mean salary estimation
Hypothesis Testing	Test claim about population	$z = \frac{\hat{p} - p_0}{SE(\hat{p})}$	Proportion test
P-value	Probability, under $H_0$, of a result at least as extreme as the one observed	N/A	Statistical significance