Study Notes: Random Variables, Sampling Distributions, Confidence Intervals, and Hypothesis Testing
Chapter 4: Random Variables and Probability Distributions
Random Variables
A random variable is a numeric value associated with the outcome of a probability experiment. Random variables are classified as either discrete or continuous based on the type of values they can assume.
Discrete Random Variable: Takes on countable values, often whole numbers, typically associated with counting (e.g., number of heads in coin tosses).
Continuous Random Variable: Takes on any value within an interval, typically associated with measurement (e.g., weight, time, volume).
Common Notation: Random variables are often denoted by capital letters such as X, Y, or Z.
Key Properties of Discrete Random Variables
Mean (Expected Value): The mean of a discrete random variable X is given by: \mu = E(X) = \sum x \cdot P(x)
Variance: The variance of a discrete random variable X is: \sigma^2 = \sum (x - \mu)^2 \cdot P(x)
Probability Distribution: Represented by a table or chart listing all possible values and their probabilities.
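The mean and variance formulas above can be computed directly from a probability table. A minimal sketch in Python; the distribution below (number of heads in two fair coin tosses) is an illustrative example, not from the notes:

```python
# Mean and variance of a discrete random variable from its probability table.
# Example distribution: X = number of heads in two fair coin tosses.
dist = {0: 0.25, 1: 0.50, 2: 0.25}

mean = sum(x * p for x, p in dist.items())               # E(X) = sum of x * P(x)
var = sum((x - mean) ** 2 * p for x, p in dist.items())  # sum of (x - mu)^2 * P(x)

print(mean)  # 1.0
print(var)   # 0.5
```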
Key Properties of Continuous Random Variables
Values are measured, not counted, and can take any value within an interval.
Probability distribution is represented by a probability density function (PDF), a smooth curve where the total area under the curve equals 1.
The probability that the variable falls within a certain interval is the area under the curve over that interval.
Key Discrete Random Variables
Binomial Random Variable:
Fixed number of independent trials (n).
Each trial has two possible outcomes: success or failure.
Probability of success (p) is constant for each trial.
Counts the number of successes in n trials.
Probability mass function: P(X = x) = \binom{n}{x} p^x (1-p)^{n-x}, for x = 0, 1, \ldots, n
Calculator commands: BINOMPDF (exactly X successes), BINOMCDF (X or fewer successes).
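The BINOMPDF and BINOMCDF calculator commands can be reproduced from the probability mass function above; a stdlib-only sketch:

```python
from math import comb

def binom_pdf(n, p, x):
    """P(X = x): analogue of the BINOMPDF calculator command."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

def binom_cdf(n, p, x):
    """P(X <= x): analogue of BINOMCDF (sum of pdf values up to x)."""
    return sum(binom_pdf(n, p, k) for k in range(x + 1))

print(round(binom_pdf(10, 0.5, 6), 4))  # 0.2051
print(round(binom_cdf(10, 0.5, 6), 4))  # 0.8281
```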
Poisson Random Variable:
Counts the number of events (successes) in a fixed interval of time or space.
Events occur independently and at a constant average rate (λ, lambda).
Probability mass function: P(X = x) = \frac{e^{-\lambda}\lambda^x}{x!}, for x = 0, 1, 2, \ldots
Calculator commands: POISSONPDF (exactly X successes), POISSONCDF (X or fewer successes).
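Likewise, POISSONPDF and POISSONCDF follow directly from the Poisson mass function; a sketch (the rate λ = 3 is an illustrative choice):

```python
from math import exp, factorial

def poisson_pdf(lam, x):
    """P(X = x): analogue of the POISSONPDF calculator command."""
    return exp(-lam) * lam**x / factorial(x)

def poisson_cdf(lam, x):
    """P(X <= x): analogue of POISSONCDF."""
    return sum(poisson_pdf(lam, k) for k in range(x + 1))

print(round(poisson_pdf(3, 2), 4))  # P(X = 2) when lambda = 3 -> 0.224
```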
Hypergeometric Random Variable:
Used when sampling without replacement from a finite population with known composition.
Example: Drawing marbles of different colors from a jar without replacement.
Normal Random Variable
The normal distribution is the most important continuous distribution, with a bell-shaped probability density function centered at the mean (μ); its spread is determined by the standard deviation (σ).
Probabilities are computed using the NormalCDF command; percentiles are found using INVNORM.
The standard normal random variable (Z) has mean 0 and standard deviation 1. Z-scores indicate the number of standard deviations from the mean.
Useful for comparing values from different normal distributions (e.g., SAT vs. ACT scores).
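The z-score comparison can be sketched with Python's built-in `statistics.NormalDist` (which also provides NormalCDF- and INVNORM-style calculations). The SAT and ACT parameters below are hypothetical, chosen only for illustration:

```python
from statistics import NormalDist

# Hypothetical score distributions: SAT ~ N(1050, 200), ACT ~ N(21, 5).
sat = NormalDist(mu=1050, sigma=200)
act = NormalDist(mu=21, sigma=5)

z_sat = sat.zscore(1250)   # (1250 - 1050) / 200 = 1.0
z_act = act.zscore(28)     # (28 - 21) / 5 = 1.4

print(z_sat, z_act)             # the ACT score is relatively higher
print(round(sat.cdf(1250), 4))  # NormalCDF analogue: P(X <= 1250) = 0.8413
```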
Example
Suppose X is the number of heads in 10 coin tosses (binomial, n=10, p=0.5). The probability of exactly 6 heads is: P(X = 6) = \binom{10}{6}(0.5)^6(0.5)^4 = \frac{210}{1024} \approx 0.2051
Chapter 5: Sampling Distributions
Sampling Distributions
A sampling distribution is the probability distribution of a statistic (such as the sample mean or sample proportion) computed from a random sample. It describes how the statistic varies from sample to sample.
Sample averages (\bar{x}) and sample proportions (\hat{p}) are continuous random variables.
Their probability distributions are called sampling distributions.
Key Properties
For the sample mean:
Expected value: E(\bar{x}) = \mu
Standard deviation (standard error): \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} (approximated by \frac{s}{\sqrt{n}} if the population σ is unknown)
For the sample proportion:
Expected value: E(\hat{p}) = p
Standard deviation: \sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}}
Central Limit Theorem (CLT): For large sample sizes (n ≥ 30), the sampling distribution of the sample mean is approximately normal, regardless of the population's distribution.
If the population is normal, the sampling distribution of the sample mean is normal for any sample size.
Larger samples yield tighter (less variable) sampling distributions; standard error decreases as n increases.
Unbiased Estimator: An estimator whose expected value equals the population parameter (e.g., sample mean for population mean).
Minimum Variance: Among unbiased estimators, the one with the smallest variance is preferred.
Example
If the population mean is 100 and standard deviation is 15, for samples of size 36: E(\bar{x}) = 100 and \sigma_{\bar{x}} = \frac{15}{\sqrt{36}} = 2.5. Since n ≥ 30, the CLT says the sampling distribution of \bar{x} is approximately normal with mean 100 and standard error 2.5.
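This setup (μ = 100, σ = 15, n = 36) can be checked by simulation — draw many samples, compute each sample mean, and verify the spread of those means matches the standard error formula. A sketch:

```python
import random
from statistics import mean, stdev

random.seed(0)  # fixed seed so the simulation is reproducible

# Sampling distribution of the mean: population N(100, 15), samples of n = 36.
# The standard error should be 15 / sqrt(36) = 2.5.
sample_means = [
    mean(random.gauss(100, 15) for _ in range(36))
    for _ in range(5000)
]

print(round(mean(sample_means), 1))   # close to 100
print(round(stdev(sample_means), 1))  # close to 2.5
```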
Chapter 6: Confidence Intervals for μ and p
Confidence Intervals
A confidence interval (CI) is a range of values, derived from sample statistics, that is likely to contain the population parameter with a specified level of confidence (e.g., 95%).
Confidence Level (CL): The proportion of intervals, over repeated samples, that would contain the parameter (e.g., 0.95, 0.90, 0.99).
Margin of Error: The half-width of the confidence interval; reflects sampling variability.
Large Sample Confidence Interval for μ
Relies on the Central Limit Theorem; use when n is large (n ≥ 30).
Formula: \bar{x} \pm z_{\alpha/2} \cdot \frac{s}{\sqrt{n}} (use σ in place of s if it is known)
z_{\alpha/2} is found using the invNorm calculator command, with α = 1 – CL; invNorm(1 – α/2) gives the critical value.
Small Sample Confidence Interval for μ
Use when n is small and the population is approximately normal.
Formula: \bar{x} \pm t_{\alpha/2,\,df} \cdot \frac{s}{\sqrt{n}}
tα/2, df is found using the invt calculator command, with degrees of freedom (df) = n – 1.
Confidence Interval for p (Proportion)
Large sample: At least 15 successes and 15 failures in the sample.
Formula: \hat{p} \pm z_{\alpha/2} \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}
Calculator command: 1prop-ZINT.
For small samples, special methods are required (no calculator command).
Sample Size Determination
To achieve a desired margin of error E, solve for n in the margin of error formula; for a proportion, n = \frac{z_{\alpha/2}^2 \, p(1-p)}{E^2}, rounded up.
If p is unknown, use p = 0.5 for maximum variability.
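A short sketch of the sample-size calculation for a proportion, using p = 0.5 for maximum variability (the 3% margin and 95% confidence in the example are illustrative):

```python
from math import ceil
from statistics import NormalDist

def sample_size_for_proportion(margin, confidence, p=0.5):
    """Smallest n so the CI for p has at most the given margin of error.

    Defaults to p = 0.5 (maximum variability) when p is unknown.
    """
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)   # z_(alpha/2)
    return ceil(z**2 * p * (1 - p) / margin**2)

print(sample_size_for_proportion(0.03, 0.95))  # 1068
```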
Alpha (α)
α = 1 – confidence level; determines the critical z or t values for the CI.
The area in each tail is α/2.
Example
For a sample mean of 50, s = 10, n = 100, and 95% confidence: 50 \pm 1.96 \cdot \frac{10}{\sqrt{100}} = 50 \pm 1.96, giving CI: (48.04, 51.96)
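This interval can be reproduced in a few lines, using `NormalDist.inv_cdf` as the invNorm analogue:

```python
from math import sqrt
from statistics import NormalDist

xbar, s, n, cl = 50, 10, 100, 0.95

alpha = 1 - cl
z = NormalDist().inv_cdf(1 - alpha / 2)   # invNorm(0.975), about 1.96
margin = z * s / sqrt(n)                  # margin of error: z * s / sqrt(n)

print(round(xbar - margin, 2), round(xbar + margin, 2))  # 48.04 51.96
```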
Chapter 7: Hypothesis Testing for μ and p
Hypothesis Testing
A hypothesis test is a statistical procedure to test claims about population parameters using sample data.
Key Components
Null Hypothesis (H0): The status quo or default claim; always contains an equality (e.g., μ = μ0).
Alternative Hypothesis (Ha): The claim we seek evidence for; uses >, <, or ≠.
Type I Error (α): Rejecting H0 when it is true.
Type II Error (β): Failing to reject H0 when it is false.
Significance Level (α): Probability of a Type I error; chosen by the researcher (e.g., 0.05).
Test Statistic: The sample statistic converted to a z or t value.
Critical Value: The z or t value that marks the boundary of the rejection region.
Rejection Region: The set of values for which H0 is rejected.
P-value: The probability, under H0, of observing a result as extreme as the sample result.
Types of Tests
One-tailed Test (Left): Ha: parameter < value
One-tailed Test (Right): Ha: parameter > value
Two-tailed Test: Ha: parameter ≠ value
Decision Rules
If p-value < α, reject H0; sufficient evidence for Ha.
If p-value > α, fail to reject H0; insufficient evidence for Ha.
If test statistic falls in the rejection region, reject H0.
Reporting Decisions
"There is sufficient evidence at α = xx to reject the null hypothesis and accept the alternative hypothesis."
"There is insufficient evidence at α = xx to reject the null hypothesis. Therefore, the null hypothesis is plausible."
Test Types and Calculator Commands
Large Sample Test for μ: Use normal distribution (ZTEST).
Small Sample Test for μ: Use t distribution (TTEST), if population is normal.
Large Sample Test for p: Use normal distribution (1propZtest), requires at least 15 successes and 15 failures (n*p0 ≥ 15, n*(1–p0) ≥ 15).
Example
Suppose H0: μ = 100, Ha: μ > 100, sample mean = 105, s = 10, n = 25, α = 0.05. Test statistic: t = \frac{105 - 100}{10/\sqrt{25}} = 2.5. Compare t to the critical value from a t-table with df = 24 (t_{0.05,\,24} \approx 1.711); since 2.5 > 1.711, reject H0.
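The test statistic for this example can be checked numerically; note the critical value must be hardcoded from a t-table, since Python's standard library has no t distribution:

```python
from math import sqrt

# H0: mu = 100 vs Ha: mu > 100, with sample mean 105, s = 10, n = 25.
xbar, mu0, s, n = 105, 100, 10, 25

t = (xbar - mu0) / (s / sqrt(n))  # (105 - 100) / (10 / 5)
print(t)  # 2.5

# t_(0.05, 24) from a t-table; stdlib has no t distribution.
t_crit = 1.711
print(t > t_crit)  # True -> reject H0
```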
Summary Table: Key Random Variables
| Random Variable | Context | Parameters | Mean | Variance | Calculator Command |
|---|---|---|---|---|---|
| Binomial | Fixed number of trials, success/failure | n, p | np | np(1-p) | BINOMPDF, BINOMCDF |
| Poisson | Events in interval (time/space) | λ | λ | λ | POISSONPDF, POISSONCDF |
| Hypergeometric | Sampling without replacement | N, K, n | n\frac{K}{N} | n\frac{K}{N}\left(1-\frac{K}{N}\right)\frac{N-n}{N-1} | -- |
| Normal | Measurement, continuous | μ, σ | μ | σ² | NormalCDF, INVNORM |
Additional info: Table entries for mean and variance of the hypergeometric distribution are standard formulas inferred for completeness.