Random Variables, Probability Models, Sampling Distributions, and Hypothesis Testing
Study Guide - Smart Notes
Random Variables and Probability Models
Definition and Types of Random Variables
A random variable is a numerical value determined by the outcome of a random event. Random variables are fundamental in statistics for modeling uncertainty and variability.
Discrete Random Variable: Can take on a countable number of distinct values. Example: Number of heads in 10 coin tosses.
Continuous Random Variable: Can take any value within a given range. Example: Height of students in a class.
Notation: Random variables are typically denoted by capital letters such as X, Y, or Z.
Probability Model
A probability model describes all possible values of a random variable and their associated probabilities.
For discrete random variables, the model lists each possible value and its probability.
For continuous random variables, the model specifies a probability density function.
Expected Value and Variance
The expected value (mean) and variance are key properties of random variables.
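Expected Value (Mean): $E(X) = \mu = \sum x \, P(X = x)$ for a discrete random variable.
Variance: $Var(X) = \sigma^2 = \sum (x - \mu)^2 \, P(X = x)$; the standard deviation is $\sigma = \sqrt{Var(X)}$.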
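As a minimal sketch of these two definitions, the snippet below computes the expected value (the probability-weighted average of the values) and the variance (the probability-weighted average squared distance from the mean) of a discrete probability model. The fair six-sided die and the helper names `expected_value` and `variance` are illustrative assumptions, not part of the original notes.

```python
from fractions import Fraction


def expected_value(pmf):
    """Return E(X): the sum of x * P(X = x) over every value x in the model."""
    return sum(x * p for x, p in pmf.items())


def variance(pmf):
    """Return Var(X): the sum of (x - mu)^2 * P(X = x) over every value x."""
    mu = expected_value(pmf)
    return sum((x - mu) ** 2 * p for x, p in pmf.items())


# Hypothetical probability model: a fair six-sided die (each face has probability 1/6).
die = {face: Fraction(1, 6) for face in range(1, 7)}

mean_die = expected_value(die)
var_die = variance(die)
print(mean_die, var_die)  # prints: 7/2 35/12
```

Exact fractions are used here only so the results match hand calculation; ordinary floats work the same way.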
Bernoulli Trials
A Bernoulli trial is an experiment with only two possible outcomes: success or failure.
Probability of success: p
Probability of failure: q = 1 - p
Each trial is independent.
Examples: Tossing a coin, yes/no survey responses, basketball free throws.
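A short simulation can make the Bernoulli setup concrete. This is only a sketch: the success probability `p = 0.3`, the number of trials, and the seed are hypothetical values chosen for illustration. Each trial is coded as 1 (success) or 0 (failure) and is generated independently of the others.

```python
import random


def bernoulli_trials(p, n, rng):
    """Simulate n independent Bernoulli trials; 1 = success, 0 = failure."""
    return [1 if rng.random() < p else 0 for _ in range(n)]


rng = random.Random(42)   # fixed seed so the run is repeatable
p = 0.3                   # hypothetical probability of success
q = 1 - p                 # probability of failure

outcomes = bernoulli_trials(p, 10_000, rng)
success_rate = sum(outcomes) / len(outcomes)

# With many trials, the observed success rate should land close to p.
print(f"observed success rate: {success_rate:.3f} (p = {p}, q = {q})")
```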
Uniform Model
If a random variable X can take values 1, 2, ..., n, and each outcome is equally likely, X has a discrete uniform distribution U[1,...,n], so $P(X = x) = 1/n$ for each value.
Standardization (Z-score)
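The Z-score measures how many standard deviations a value is from the mean: $z = \frac{x - \mu}{\sigma}$.
Example: To find a probability such as $P(X > x)$ for a normally distributed X, calculate the Z-score of x and look it up in the standard normal table.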
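The table lookup can also be done in code. The sketch below standardizes a value by subtracting the mean and dividing by the standard deviation, then uses Python's standard-library `statistics.NormalDist` in place of a printed normal table. The mean, standard deviation, and value of interest are hypothetical numbers chosen for illustration.

```python
from statistics import NormalDist


def z_score(x, mu, sigma):
    """Number of standard deviations that x lies from the mean mu."""
    return (x - mu) / sigma


# Hypothetical normal model: mean 100, standard deviation 15, value of interest 130.
mu, sigma, x = 100.0, 15.0, 130.0

z = z_score(x, mu, sigma)          # (130 - 100) / 15 = 2.0
p_below = NormalDist().cdf(z)      # P(Z <= z) from the standard normal model
p_above = 1 - p_below              # P(Z > z)

print(f"z = {z}, P(X <= x) = {p_below:.4f}, P(X > x) = {p_above:.4f}")
```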
Sum of Random Variables
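When adding random variables, their expected values add; when the variables are independent, their variances add as well:
Expected Value: $E(X + Y) = E(X) + E(Y)$
Example: If $E(X) = \mu_X$ and $E(Y) = \mu_Y$, then $E(X + Y) = \mu_X + \mu_Y$.
Variance: $Var(X + Y) = Var(X) + Var(Y)$ (for independent X and Y)
Example: If $Var(X) = Var(Y) = \sigma^2$ for both, then $Var(X + Y) = 2\sigma^2$, so $SD(X + Y) = \sigma\sqrt{2}$.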
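The addition rules can be checked exactly on a small case. The sketch below assumes two independent fair dice (a hypothetical example, not from the original notes), builds the distribution of their sum by multiplying probabilities, and compares its mean and variance with the sums of the individual means and variances.

```python
from fractions import Fraction
from itertools import product


def mean(pmf):
    """E(X) for a discrete probability model given as {value: probability}."""
    return sum(x * p for x, p in pmf.items())


def var(pmf):
    """Var(X) for a discrete probability model given as {value: probability}."""
    mu = mean(pmf)
    return sum((x - mu) ** 2 * p for x, p in pmf.items())


# Hypothetical model: one fair six-sided die.
die = {face: Fraction(1, 6) for face in range(1, 7)}

# Distribution of X + Y for two independent dice:
# independence means P(X = x and Y = y) = P(X = x) * P(Y = y).
total = {}
for (x, px), (y, py) in product(die.items(), die.items()):
    total[x + y] = total.get(x + y, 0) + px * py

print(mean(total), mean(die) + mean(die))  # both equal 7
print(var(total), var(die) + var(die))     # both equal 35/6
```

Note that standard deviations do not add: here the standard deviation of the sum is the square root of 35/6, not twice the standard deviation of one die.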
Distribution of Sample Proportions
Assumptions and Conditions
To use the sampling distribution of sample proportions, certain conditions must be met:
Independence Assumption: Sampled values must be independent.
Sample Size Assumption: Sample size n must be large enough.
Randomization Condition: Data should come from a randomized experiment or a simple random sample.
10% Condition: If sampling without replacement, n should be no more than 10% of the population.
Success/Failure Condition: Both $np$ and $nq$ should be at least 10.
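The two numeric conditions in this list can be sketched as a small check. The function name `check_proportion_conditions` and the sample figures are hypothetical; independence and randomization depend on how the data were collected, so they cannot be verified by a calculation like this.

```python
def check_proportion_conditions(n, p, population_size=None):
    """Check the numeric conditions for modeling a sample proportion as normal.

    n: sample size; p: proportion of successes; population_size: optional,
    needed only for the 10% condition when sampling without replacement.
    """
    q = 1 - p
    checks = {
        # Success/Failure Condition: expected successes and failures both >= 10.
        "success_failure": n * p >= 10 and n * q >= 10,
    }
    if population_size is not None:
        # 10% Condition: the sample is no more than 10% of the population.
        checks["ten_percent"] = n <= 0.10 * population_size
    return checks


# Hypothetical survey: 200 respondents drawn from a population of 10,000.
print(check_proportion_conditions(200, 0.30, population_size=10_000))
print(check_proportion_conditions(200, 0.04, population_size=1_000))
```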
The Central Limit Theorem (CLT)
Statement and Implications
The Central Limit Theorem states that the sampling distribution of the sample mean approaches a normal distribution as the sample size increases, regardless of the population's shape, provided the observations are independent and the population has a finite variance.
For highly skewed distributions, larger sample sizes (dozens or hundreds) may be needed for normality.
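A simulation is one way to see this effect. The sketch below assumes a strongly right-skewed population (an exponential distribution with mean 1, chosen purely for illustration) and compares the skewness of sample means for a small and a larger sample size. The sample sizes, number of repetitions, and seed are hypothetical.

```python
import random
import statistics


def sample_means(n, reps, rng):
    """Draw `reps` samples of size n from an exponential population; return their means."""
    return [sum(rng.expovariate(1.0) for _ in range(n)) / n for _ in range(reps)]


def skewness(values):
    """Average cubed standardized deviation; values near 0 indicate symmetry."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return sum(((v - m) / s) ** 3 for v in values) / len(values)


rng = random.Random(0)  # fixed seed so the run is repeatable

means_n2 = sample_means(n=2, reps=2000, rng=rng)
means_n50 = sample_means(n=50, reps=2000, rng=rng)

skew_n2 = skewness(means_n2)
skew_n50 = skewness(means_n50)

# The sample means for n = 50 should be much closer to symmetric than for n = 2.
print(f"skewness of sample means: n=2 -> {skew_n2:.2f}, n=50 -> {skew_n50:.2f}")
```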
Application Example
If only 20 recent graduates' salaries are sampled, concerns include small sample size, unknown population size, and unknown standard deviation.
Confidence intervals should be based on t-models when the population standard deviation is unknown.
Increasing sample size (e.g., to 60) narrows the confidence interval, because the standard error shrinks in proportion to $1/\sqrt{n}$.
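The effect of moving from 20 to 60 observations can be sketched with the margin of error of a t-interval, $t^* \cdot s / \sqrt{n}$. Everything numeric here is an assumption: the sample standard deviation `s` is a made-up value held fixed for both sample sizes, and the 95% critical values are copied from a standard t-table for 19 and 59 degrees of freedom, because Python's standard library has no t-distribution.

```python
import math

# Two-sided 95% critical values t* taken from a standard t-table (assumption),
# keyed by degrees of freedom df = n - 1.
T_STAR_95 = {19: 2.093, 59: 2.001}


def margin_of_error(s, n):
    """Margin of error t* * s / sqrt(n) for a 95% t-interval for a mean."""
    return T_STAR_95[n - 1] * s / math.sqrt(n)


s = 8000.0  # hypothetical sample standard deviation of salaries

me_20 = margin_of_error(s, 20)
me_60 = margin_of_error(s, 60)

# The interval is (sample mean - margin, sample mean + margin), so a smaller
# margin means a narrower, more precise interval.
print(f"margin of error: n=20 -> {me_20:.0f}, n=60 -> {me_60:.0f}")
```

In practice the sample standard deviation would also change between samples, so this isolates only the effect of n.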
Hypothesis Testing
Null and Alternative Hypotheses
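Hypothesis testing begins with a null hypothesis ($H_0$), which assumes no effect or no change. The alternative hypothesis ($H_A$) contains the parameter values considered plausible if the null hypothesis is rejected.
Null Hypothesis: $H_0$: parameter = hypothesized value
Alternative Hypothesis: $H_A$: parameter $\neq$ hypothesized value (two-sided), or $>$ or $<$ the hypothesized value (one-sided)
Example (for a proportion): $H_0: p = p_0$, $H_A: p \neq p_0$, where $p_0$ is the hypothesized proportion.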
Standard Error and Z-score for Proportions
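Standard Error of a Proportion: $SE(\hat{p}) = \sqrt{\frac{\hat{p}\hat{q}}{n}}$; in a hypothesis test, the null value is used instead: $SD(\hat{p}) = \sqrt{\frac{p_0 q_0}{n}}$, where $q_0 = 1 - p_0$.
Z-score for Proportion: $z = \frac{\hat{p} - p_0}{SD(\hat{p})}$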
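As a rough sketch of this calculation, the snippet below computes the standard deviation of the sample proportion under the null hypothesis, the square root of p0 * q0 / n, and the resulting Z-score. The hypothesized proportion, sample size, and observed proportion are hypothetical values, and the function name `proportion_z` is illustrative.

```python
import math


def proportion_z(p_hat, p0, n):
    """Return (z, sd) for a one-proportion z-test.

    sd is the standard deviation of the sample proportion assuming the null
    hypothesis p = p0 is true; z is how many such deviations p_hat lies from p0.
    """
    q0 = 1 - p0
    sd = math.sqrt(p0 * q0 / n)
    z = (p_hat - p0) / sd
    return z, sd


# Hypothetical test: H0: p = 0.5, with 60 successes observed in n = 100 trials.
z, sd = proportion_z(p_hat=0.60, p0=0.50, n=100)
print(f"SD = {sd:.3f}, z = {z:.2f}")
```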
P-value
The p-value is the probability, under the null hypothesis, of obtaining a result at least as extreme as the observed result. It is not the probability that the null hypothesis is true.
The p-value is calculated from the sampling distribution of the test statistic (often a normal or t-distribution).
Excel function: T.DIST.RT(x, deg_freedom) returns the right-tailed p-value for a t-statistic.
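For a test statistic that follows the normal model, the p-value can be sketched with Python's standard-library `statistics.NormalDist`. The function name `p_value_from_z` and the example statistic are hypothetical. This covers only the normal case; t-distribution p-values need a tool that provides the t-distribution, such as the Excel function mentioned above.

```python
from statistics import NormalDist


def p_value_from_z(z, tail="two-sided"):
    """P-value for a z-statistic under the standard normal model.

    tail: "right" for H_A: parameter > null value,
          "left" for H_A: parameter < null value,
          "two-sided" for H_A: parameter != null value.
    """
    std_normal = NormalDist()
    if tail == "right":
        return 1 - std_normal.cdf(z)
    if tail == "left":
        return std_normal.cdf(z)
    if tail == "two-sided":
        return 2 * (1 - std_normal.cdf(abs(z)))
    raise ValueError("tail must be 'right', 'left', or 'two-sided'")


# Hypothetical observed statistic.
z_obs = 2.0
p_right = p_value_from_z(z_obs, "right")
p_two = p_value_from_z(z_obs, "two-sided")
print(f"right-tailed p = {p_right:.4f}, two-sided p = {p_two:.4f}")
```

A small p-value means the observed result would be unusual if the null hypothesis were true; it does not give the probability that the null hypothesis is true.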
Summary Table: Key Concepts
| Concept | Definition | Formula | Example/Application |
|---|---|---|---|
| Random Variable | Numerical outcome of a random event | N/A | Policy payout, coin toss |
| Expected Value | Mean of random variable | $E(X) = \sum x \, P(X = x)$ | Average payout |
| Variance | Spread of random variable | $Var(X) = \sum (x - \mu)^2 \, P(X = x)$ | Risk assessment |
| Bernoulli Trial | Experiment with two outcomes | N/A | Coin toss, yes/no survey |
| Uniform Distribution | All outcomes equally likely | $P(X = x) = 1/n$ | Dice roll |
| Z-score | Standardized value | $z = (x - \mu)/\sigma$ | Normal distribution analysis |
| Central Limit Theorem | Sampling distribution approaches normality | N/A | Mean salary estimation |
| Hypothesis Testing | Test claim about population | $z = (\hat{p} - p_0)/\sqrt{p_0 q_0 / n}$ | Proportion test |
| P-value | Probability of a result at least as extreme as the observed one, under $H_0$ | N/A | Statistical significance |