Study Notes: Random Variables, Sampling Distributions, Confidence Intervals, and Hypothesis Testing
Chapter 4: Random Variables and Probability Distributions
Random Variables
A random variable is a numeric value associated with the outcome of a probability experiment. Random variables are classified as either discrete or continuous based on the type of values they can assume.
Discrete Random Variable: Takes on countable values, often whole numbers, typically associated with counting (e.g., number of heads in coin tosses).
Continuous Random Variable: Takes on any value within an interval, typically associated with measurement (e.g., weight, time, volume).
Common Notation: Random variables are often denoted by capital letters such as X, Y, or Z.
Key Properties of Discrete Random Variables
Mean (Expected Value): The mean of a discrete random variable X is given by: \mu = E(X) = \sum x \cdot P(x)
Variance: The variance of a discrete random variable X is: \sigma^2 = \sum (x - \mu)^2 \cdot P(x)
Probability Distribution: Represented by a table or chart listing all possible values and their probabilities.
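The mean and variance formulas above can be computed directly from a probability table. A minimal sketch in Python; the distribution below (number of heads in two fair coin tosses) is an illustrative example, not from the notes:

```python
# Mean and variance of a discrete random variable from its probability table.
# Example distribution: X = number of heads in two fair coin tosses.
dist = {0: 0.25, 1: 0.50, 2: 0.25}

mean = sum(x * p for x, p in dist.items())               # E(X) = sum of x * P(x)
var = sum((x - mean) ** 2 * p for x, p in dist.items())  # sum of (x - mu)^2 * P(x)

print(mean)  # 1.0
print(var)   # 0.5
```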
Key Properties of Continuous Random Variables
Values are measured, not counted, and can take any value within an interval.
Probability distribution is represented by a probability density function (PDF), a smooth curve where the total area under the curve equals 1.
The probability that the variable falls within a certain interval is the area under the curve over that interval.
Key Discrete Random Variables
Binomial Random Variable:
Fixed number of independent trials (n).
Each trial has two possible outcomes: success or failure.
Probability of success (p) is constant for each trial.
Counts the number of successes in n trials.
Probability mass function: P(X = x) = \binom{n}{x} p^x (1-p)^{n-x}, for x = 0, 1, \ldots, n
Calculator commands: BINOMPDF (exactly X successes), BINOMCDF (X or fewer successes).
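The BINOMPDF and BINOMCDF calculator commands can be reproduced from the probability mass function above; a stdlib-only sketch:

```python
from math import comb

def binom_pdf(n, p, x):
    """P(X = x): analogue of the BINOMPDF calculator command."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

def binom_cdf(n, p, x):
    """P(X <= x): analogue of BINOMCDF (sum of pdf values up to x)."""
    return sum(binom_pdf(n, p, k) for k in range(x + 1))

print(round(binom_pdf(10, 0.5, 6), 4))  # 0.2051
print(round(binom_cdf(10, 0.5, 6), 4))  # 0.8281
```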
Poisson Random Variable:
Counts the number of events (successes) in a fixed interval of time or space.
Events occur independently and at a constant average rate (λ, lambda).
Probability mass function: P(X = x) = \frac{e^{-\lambda}\lambda^x}{x!}, for x = 0, 1, 2, \ldots
Calculator commands: POISSONPDF (exactly X successes), POISSONCDF (X or fewer successes).
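Likewise, POISSONPDF and POISSONCDF follow directly from the Poisson mass function; a sketch (the rate λ = 3 is an illustrative choice):

```python
from math import exp, factorial

def poisson_pdf(lam, x):
    """P(X = x): analogue of the POISSONPDF calculator command."""
    return exp(-lam) * lam**x / factorial(x)

def poisson_cdf(lam, x):
    """P(X <= x): analogue of POISSONCDF."""
    return sum(poisson_pdf(lam, k) for k in range(x + 1))

print(round(poisson_pdf(3, 2), 4))  # P(X = 2) when lambda = 3 -> 0.224
```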
Hypergeometric Random Variable:
Used when sampling without replacement from a finite population with known composition.
Example: Drawing marbles of different colors from a jar without replacement.
Normal Random Variable
The normal distribution is the most important continuous distribution, with a bell-shaped probability density function centered at the mean (μ); its spread is determined by the standard deviation (σ).
Probabilities are computed using the NormalCDF command; percentiles are found using INVNORM.
The standard normal random variable (Z) has mean 0 and standard deviation 1. Z-scores indicate the number of standard deviations from the mean.
Useful for comparing values from different normal distributions (e.g., SAT vs. ACT scores).
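The z-score comparison can be sketched with Python's built-in `statistics.NormalDist` (which also provides NormalCDF- and INVNORM-style calculations). The SAT and ACT parameters below are hypothetical, chosen only for illustration:

```python
from statistics import NormalDist

# Hypothetical score distributions: SAT ~ N(1050, 200), ACT ~ N(21, 5).
sat = NormalDist(mu=1050, sigma=200)
act = NormalDist(mu=21, sigma=5)

z_sat = sat.zscore(1250)   # (1250 - 1050) / 200 = 1.0
z_act = act.zscore(28)     # (28 - 21) / 5 = 1.4

print(z_sat, z_act)             # the ACT score is relatively higher
print(round(sat.cdf(1250), 4))  # NormalCDF analogue: P(X <= 1250) = 0.8413
```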
Example
Suppose X is the number of heads in 10 coin tosses (binomial, n=10, p=0.5). The probability of exactly 6 heads is: P(X = 6) = \binom{10}{6}(0.5)^6(0.5)^4 = \frac{210}{1024} \approx 0.2051
Chapter 5: Sampling Distributions
Sampling Distributions
A sampling distribution is the probability distribution of a statistic (such as the sample mean or sample proportion) computed from a random sample. It describes how the statistic varies from sample to sample.
Sample averages (\bar{x}) and sample proportions (\hat{p}) are continuous random variables.
Their probability distributions are called sampling distributions.
Key Properties
For the sample mean:
Expected value: E(\bar{x}) = \mu
Standard deviation (standard error): \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} (approximated by \frac{s}{\sqrt{n}} if the population σ is unknown)
For the sample proportion:
Expected value: E(\hat{p}) = p
Standard deviation: \sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}}
Central Limit Theorem (CLT): For large sample sizes (n ≥ 30), the sampling distribution of the sample mean is approximately normal, regardless of the population's distribution.
If the population is normal, the sampling distribution of the sample mean is normal for any sample size.
Larger samples yield tighter (less variable) sampling distributions; standard error decreases as n increases.
Unbiased Estimator: An estimator whose expected value equals the population parameter (e.g., sample mean for population mean).
Minimum Variance: Among unbiased estimators, the one with the smallest variance is preferred.
Example
If the population mean is 100 and standard deviation is 15, for samples of size 36: E(\bar{x}) = 100 and \sigma_{\bar{x}} = \frac{15}{\sqrt{36}} = 2.5. Since n ≥ 30, the CLT says the sampling distribution of \bar{x} is approximately normal with mean 100 and standard error 2.5.
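This setup (μ = 100, σ = 15, n = 36) can be checked by simulation — draw many samples, compute each sample mean, and verify the spread of those means matches the standard error formula. A sketch:

```python
import random
from statistics import mean, stdev

random.seed(0)  # fixed seed so the simulation is reproducible

# Sampling distribution of the mean: population N(100, 15), samples of n = 36.
# The standard error should be 15 / sqrt(36) = 2.5.
sample_means = [
    mean(random.gauss(100, 15) for _ in range(36))
    for _ in range(5000)
]

print(round(mean(sample_means), 1))   # close to 100
print(round(stdev(sample_means), 1))  # close to 2.5
```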
Chapter 6: Confidence Intervals for μ and p
Confidence Intervals
A confidence interval (CI) is a range of values, derived from sample statistics, that is likely to contain the population parameter with a specified level of confidence (e.g., 95%).
Confidence Level (CL): The proportion of intervals, over repeated samples, that would contain the parameter (e.g., 0.95, 0.90, 0.99).
Margin of Error: The half-width of the confidence interval; reflects sampling variability.
Large Sample Confidence Interval for μ
Relies on the Central Limit Theorem; use when n is large (n ≥ 30).
Formula: \bar{x} \pm z_{\alpha/2} \cdot \frac{s}{\sqrt{n}} (use σ in place of s if it is known)
z_{\alpha/2} is found using the invNorm calculator command, with α = 1 – CL; invNorm(1 – α/2) gives the critical value.
Small Sample Confidence Interval for μ
Use when n is small and the population is approximately normal.
Formula: \bar{x} \pm t_{\alpha/2,\,df} \cdot \frac{s}{\sqrt{n}}
tα/2, df is found using the invt calculator command, with degrees of freedom (df) = n – 1.
Confidence Interval for p (Proportion)
Large sample: At least 15 successes and 15 failures in the sample.
Formula: \hat{p} \pm z_{\alpha/2} \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}
Calculator command: 1prop-ZINT.
For small samples, special methods are required (no calculator command).
Sample Size Determination
To achieve a desired margin of error E, solve for n in the margin of error formula; for a proportion, n = \frac{z_{\alpha/2}^2 \, p(1-p)}{E^2}, rounded up.
If p is unknown, use p = 0.5 for maximum variability.
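A short sketch of the sample-size calculation for a proportion, using p = 0.5 for maximum variability (the 3% margin and 95% confidence in the example are illustrative):

```python
from math import ceil
from statistics import NormalDist

def sample_size_for_proportion(margin, confidence, p=0.5):
    """Smallest n so the CI for p has at most the given margin of error.

    Defaults to p = 0.5 (maximum variability) when p is unknown.
    """
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)   # z_(alpha/2)
    return ceil(z**2 * p * (1 - p) / margin**2)

print(sample_size_for_proportion(0.03, 0.95))  # 1068
```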
Alpha (α)
α = 1 – confidence level; determines the critical z or t values for the CI.
The area in each tail is α/2.
Example
For a sample mean of 50, s = 10, n = 100, and 95% confidence: 50 \pm 1.96 \cdot \frac{10}{\sqrt{100}} = 50 \pm 1.96, giving CI: (48.04, 51.96)
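This interval can be reproduced in a few lines, using `NormalDist.inv_cdf` as the invNorm analogue:

```python
from math import sqrt
from statistics import NormalDist

xbar, s, n, cl = 50, 10, 100, 0.95

alpha = 1 - cl
z = NormalDist().inv_cdf(1 - alpha / 2)   # invNorm(0.975), about 1.96
margin = z * s / sqrt(n)                  # margin of error: z * s / sqrt(n)

print(round(xbar - margin, 2), round(xbar + margin, 2))  # 48.04 51.96
```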
Chapter 7: Hypothesis Testing for μ and p
Hypothesis Testing
A hypothesis test is a statistical procedure to test claims about population parameters using sample data.
Key Components
Null Hypothesis (H0): The status quo or default claim; always contains an equality (e.g., μ = μ0).
Alternative Hypothesis (Ha): The claim we seek evidence for; uses >, <, or ≠.
Type I Error (α): Rejecting H0 when it is true.
Type II Error (β): Failing to reject H0 when it is false.
Significance Level (α): Probability of a Type I error; chosen by the researcher (e.g., 0.05).
Test Statistic: The sample statistic converted to a z or t value.
Critical Value: The z or t value that marks the boundary of the rejection region.
Rejection Region: The set of values for which H0 is rejected.
P-value: The probability, under H0, of observing a result as extreme as the sample result.
Types of Tests
One-tailed Test (Left): Ha: parameter < value
One-tailed Test (Right): Ha: parameter > value
Two-tailed Test: Ha: parameter ≠ value
Decision Rules
If p-value < α, reject H0; sufficient evidence for Ha.
If p-value > α, fail to reject H0; insufficient evidence for Ha.
If test statistic falls in the rejection region, reject H0.
Reporting Decisions
"There is sufficient evidence at α = xx to reject the null hypothesis and accept the alternative hypothesis."
"There is insufficient evidence at α = xx to reject the null hypothesis. Therefore, the null hypothesis is plausible."
Test Types and Calculator Commands
Large Sample Test for μ: Use normal distribution (ZTEST).
Small Sample Test for μ: Use t distribution (TTEST), if population is normal.
Large Sample Test for p: Use normal distribution (1propZtest), requires at least 15 successes and 15 failures (n*p0 ≥ 15, n*(1–p0) ≥ 15).
Example
Suppose H0: μ = 100, Ha: μ > 100, sample mean = 105, s = 10, n = 25, α = 0.05. Test statistic: t = \frac{105 - 100}{10/\sqrt{25}} = 2.5. Compare t to the critical value from a t-table with df = 24 (t_{0.05,\,24} \approx 1.711); since 2.5 > 1.711, reject H0.
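The test statistic for this example can be checked numerically; note the critical value must be hardcoded from a t-table, since Python's standard library has no t distribution:

```python
from math import sqrt

# H0: mu = 100 vs Ha: mu > 100, with sample mean 105, s = 10, n = 25.
xbar, mu0, s, n = 105, 100, 10, 25

t = (xbar - mu0) / (s / sqrt(n))  # (105 - 100) / (10 / 5)
print(t)  # 2.5

# t_(0.05, 24) from a t-table; stdlib has no t distribution.
t_crit = 1.711
print(t > t_crit)  # True -> reject H0
```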
Summary Table: Key Random Variables
| Random Variable | Context | Parameters | Mean | Variance | Calculator Command |
|---|---|---|---|---|---|
| Binomial | Fixed number of trials, success/failure | n, p | np | np(1-p) | BINOMPDF, BINOMCDF |
| Poisson | Events in interval (time/space) | λ | λ | λ | POISSONPDF, POISSONCDF |
| Hypergeometric | Sampling without replacement | N, K, n | n\frac{K}{N} | n\frac{K}{N}\left(1-\frac{K}{N}\right)\frac{N-n}{N-1} | -- |
| Normal | Measurement, continuous | μ, σ | μ | σ² | NormalCDF, INVNORM |
Additional info: Table entries for mean and variance of the hypergeometric distribution are standard formulas inferred for completeness.