Random Variables, Sampling Distributions, Confidence Intervals, and Hypothesis Testing: Study Guide

Random Variables and Probability Distributions

Definition and Types of Random Variables

A random variable is a numeric value associated with the outcome of a probability experiment. Random variables are classified as either discrete or continuous based on the nature of their possible values.

Discrete Random Variable: Takes on a countable set of values, often whole numbers. Typically associated with counting events (e.g., number of successes).
Continuous Random Variable: Can take on any value within an interval, typically associated with measurements (e.g., weight, time, volume).

Probability Distribution: Describes how probabilities are distributed over the values of the random variable.

For discrete random variables, probability distributions are often represented by tables or charts.
For continuous random variables, probability distributions are represented by smooth curves called Probability Density Functions (PDF), where the total area under the curve equals 1.

Common Notation: Random variables are often denoted by capital letters such as X, Y, Z.

Key Properties of Discrete Random Variables

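Mean (Expected Value): The long-run average value of the random variable, \mu = E(X) = \sum x P(x).
Variance: Measures the spread of the random variable's values around the mean, \sigma^2 = \sum (x - \mu)^2 P(x). The standard deviation \sigma is the square root of the variance.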
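
As a minimal sketch of these two properties, the snippet below computes the mean and variance directly from a probability table. The function name and the coin-flip distribution are hypothetical illustrations, not taken from the course materials.

```python
import math

def mean_and_variance(dist):
    """dist maps each possible value x to its probability P(x)."""
    if not math.isclose(sum(dist.values()), 1.0):
        raise ValueError("probabilities must sum to 1")
    # Expected value: each value weighted by its probability.
    mean = sum(x * p for x, p in dist.items())
    # Variance: probability-weighted squared distance from the mean.
    variance = sum((x - mean) ** 2 * p for x, p in dist.items())
    return mean, variance

# Hypothetical example: number of heads in two fair coin flips.
heads = {0: 0.25, 1: 0.50, 2: 0.25}
mu, var = mean_and_variance(heads)
print(mu, var, math.sqrt(var))
```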

Key Discrete Random Variables

Binomial Random Variable:
- Fixed number of independent trials.
- Each trial has a constant probability of success (p).
- Two possible outcomes: success or failure.
- Counts the number of successes in n trials.
- Calculator commands: BINOMPDF (probability of exactly X successes), BINOMCDF (probability of X or fewer successes).
Poisson Random Variable:
- Counts the number of events in a fixed unit of time or space.
- Events occur independently and at a constant average rate (lambda).
- Calculator commands: POISSONPDF (probability of exactly X events), POISSONCDF (probability of X or fewer events).
Hypergeometric Random Variable:
- Used when sampling without replacement from a finite population with known composition.
- Example: Drawing marbles from a jar with known color distribution.
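
The calculator commands listed above can be mirrored with the textbook probability formulas. The sketch below is one possible way to do that in Python; the function names are hypothetical stand-ins for BINOMPDF, BINOMCDF, POISSONPDF, and POISSONCDF, and the hypergeometric function assumes a population of N items of which K count as successes.

```python
from math import comb, exp, factorial

def binom_pdf(n, p, x):
    """P(exactly x successes in n independent trials)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

def binom_cdf(n, p, x):
    """P(x or fewer successes in n independent trials)."""
    return sum(binom_pdf(n, p, k) for k in range(x + 1))

def poisson_pdf(lam, x):
    """P(exactly x events) when events occur at average rate lam."""
    return exp(-lam) * lam**x / factorial(x)

def poisson_cdf(lam, x):
    """P(x or fewer events) when events occur at average rate lam."""
    return sum(poisson_pdf(lam, k) for k in range(x + 1))

def hypergeom_pdf(N, K, n, x):
    """P(x successes) when drawing n items without replacement
    from N items, K of which are successes."""
    return comb(K, x) * comb(N - K, n - x) / comb(N, n)

# Hypothetical marble jar: 10 marbles, 4 red; draw 3, ask for exactly 1 red.
print(hypergeom_pdf(10, 4, 3, 1))
```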

Continuous Random Variables

Result from measurements; can take any value within an interval.
Probability distributions are described by Probability Density Functions (PDFs).
The area under the PDF curve represents probability.

Normal Random Variable

Bell-shaped probability density function, centered at the population mean.
The points of inflection lie one standard deviation on either side of the mean (at \mu \pm \sigma).
Calculator commands: NormalCDF (probabilities), INVNORM (find values for given percentiles).
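
For readers working away from the calculator, the following sketch approximates what NormalCDF and INVNORM do, using Python's standard-library NormalDist class. The wrapper names and the example mean of 100 and standard deviation of 15 are hypothetical.

```python
from statistics import NormalDist

def normal_cdf(lower, upper, mu=0.0, sigma=1.0):
    """Area under the normal curve between lower and upper."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(upper) - dist.cdf(lower)

def inv_norm(area, mu=0.0, sigma=1.0):
    """Value with the given area to its left (a percentile)."""
    return NormalDist(mu, sigma).inv_cdf(area)

# Hypothetical variable with mean 100 and standard deviation 15.
print(normal_cdf(85, 115, mu=100, sigma=15))  # area within one SD of the mean
print(inv_norm(0.90, mu=100, sigma=15))       # 90th percentile
```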

Standard Normal Random Variable

Designated as 'Z', with mean 0 and standard deviation 1.
Z-scores indicate the number of standard deviations from the mean.
Useful for comparing different normal distributions (e.g., SAT vs. ACT scores).
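
To illustrate the SAT vs. ACT comparison, the sketch below converts two scores to z-scores. The means, standard deviations, and student scores are hypothetical values chosen for round numbers, not official test statistics.

```python
def z_score(x, mu, sigma):
    """Number of standard deviations x lies from the mean."""
    return (x - mu) / sigma

# Hypothetical parameters (assumptions for illustration only).
sat_z = z_score(1250, mu=1050, sigma=200)
act_z = z_score(27, mu=21, sigma=5)

# The larger z-score is the better performance relative to its own test.
better = "ACT" if act_z > sat_z else "SAT"
print(sat_z, act_z, better)
```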

Sampling Distributions

Definition and Importance

Sampling distributions describe the probability distribution of a statistic (such as sample mean or sample proportion) computed from a random sample. They are essential for making inferences about population parameters.

Sample averages (\bar{x}) and sample proportions (\hat{p}) vary from sample to sample, so they are themselves random variables, treated here as continuous.
Each has a probability density function, called a sampling distribution.

Key Properties

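Mean of Sampling Distribution of Sample Mean: \mu_{\bar{x}} = \mu
Standard Deviation of Sampling Distribution of Sample Mean: \sigma_{\bar{x}} = \sigma / \sqrt{n}
Mean of Sampling Distribution of Sample Proportion: \mu_{\hat{p}} = p
Standard Deviation of Sampling Distribution of Sample Proportion: \sigma_{\hat{p}} = \sqrt{p(1 - p) / n}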
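
The standard deviations of these sampling distributions (the standard errors) follow the usual formulas \sigma / \sqrt{n} and \sqrt{p(1 - p) / n}. A short sketch, with hypothetical function names and inputs:

```python
from math import sqrt

def se_mean(sigma, n):
    """Standard deviation of the sampling distribution of the sample mean."""
    return sigma / sqrt(n)

def se_proportion(p, n):
    """Standard deviation of the sampling distribution of the sample proportion."""
    return sqrt(p * (1 - p) / n)

# Hypothetical inputs: sigma = 10 with n = 25, and p = 0.5 with n = 100.
print(se_mean(10, 25))
print(se_proportion(0.5, 100))
```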

Central Limit Theorem (CLT)

As a rule of thumb, if the sample size is n ≥ 30, the sampling distribution of the sample mean is approximately normal, regardless of the shape of the population's distribution.
If the population is normal, the sampling distribution of the sample mean is normal for any sample size.
Larger samples yield tighter sampling distributions (smaller standard error).
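
A small simulation can make the theorem concrete. The sketch below repeatedly samples from a skewed (exponential) population with mean 1 and standard deviation 1, then checks that the sample means center on 1 with spread close to 1 / \sqrt{n}. The choice of population, seed, and repetition count are assumptions made for illustration.

```python
import random
from math import sqrt
from statistics import mean, stdev

def simulate_sample_means(n, reps, seed=0):
    """Draw `reps` samples of size n from an Exponential(1) population
    and return the list of sample means."""
    rng = random.Random(seed)
    return [mean(rng.expovariate(1.0) for _ in range(n)) for _ in range(reps)]

means_10 = simulate_sample_means(n=10, reps=2000)
means_40 = simulate_sample_means(n=40, reps=2000)

# Center stays near the population mean; spread shrinks as n grows.
print(mean(means_40), stdev(means_40), 1 / sqrt(40))
print(stdev(means_10), 1 / sqrt(10))
```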

Unbiased Estimators and Minimum Variance

An estimator is unbiased if its expected value equals the population parameter.
The sample mean (\bar{x}) is an unbiased estimator for the population mean (\mu).
The sample proportion (\hat{p}) is an unbiased estimator for the population proportion (p).
An estimator has minimum variance if it has the smallest variance among all unbiased estimators.

Confidence Intervals for Mean (\mu) and Proportion (p)

Definition and Construction

A confidence interval (CI) is a range of values, derived from sample statistics, that is likely to contain the population parameter. The confidence level (e.g., 95%) is the proportion of intervals that would contain the parameter if the sampling procedure were repeated many times.

Margin of Error: The half-width of the confidence interval; reflects sampling error.

Large Sample Confidence Interval for Mean (\mu)

Relies on the Central Limit Theorem; uses sample standard deviation (s) as an estimate for population standard deviation (\sigma).
Calculator command: Zinterval.
Critical value (Z): Area in each tail = \alpha / 2; use INVNORM to find Z.
Formula: \bar{x} \pm z_{\alpha/2} \cdot s / \sqrt{n}
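
The large-sample interval \bar{x} \pm z_{\alpha/2} \cdot s / \sqrt{n} can be sketched as follows. This is an illustration of what a Zinterval-style calculation does, not the calculator's actual implementation; the function name and sample values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(xbar, s, n, confidence=0.95):
    """Large-sample confidence interval for a population mean."""
    alpha = 1 - confidence
    # Critical value with area alpha/2 in the upper tail.
    z = NormalDist().inv_cdf(1 - alpha / 2)
    margin = z * s / sqrt(n)
    return xbar - margin, xbar + margin

# Hypothetical sample: mean 98, standard deviation 5, n = 40.
low, high = z_interval(98, 5, 40)
print(round(low, 2), round(high, 2))
```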

Small Sample Confidence Interval for Mean (\mu)

Requires evidence that the population is approximately normal.
Uses t-distribution; calculator command: tinterval.
Degrees of freedom (DF): n - 1.
Critical value (t): Area in each tail = \alpha / 2; use INVT with DF.
Formula: \bar{x} \pm t_{\alpha/2} \cdot s / \sqrt{n}
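
The small-sample interval has the same shape, with a t critical value in place of z. Python's standard library has no inverse t function, so this sketch assumes the critical value is looked up separately (from INVT or a t-table) and passed in. The function name and sample values are hypothetical; 2.262 is the standard table value for 95% confidence with DF = 9.

```python
from math import sqrt

def t_interval(xbar, s, n, t_crit):
    """Small-sample confidence interval for a population mean.
    t_crit is the critical value for DF = n - 1, supplied by the caller."""
    margin = t_crit * s / sqrt(n)
    return xbar - margin, xbar + margin

# Hypothetical sample: mean 50, standard deviation 4, n = 10 (DF = 9).
low, high = t_interval(50, 4, 10, t_crit=2.262)
print(round(low, 3), round(high, 3))
```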

T Random Variable and Degrees of Freedom

The t-distribution is a family of curves resembling the standard normal curve.
Each curve is defined by its degrees of freedom (DF = n - 1).
As sample size increases, the t-distribution approaches the normal distribution.

Large Sample Confidence Interval for Proportion (p)

Requires at least 15 successes and 15 failures in the sample.
Calculator command: 1prop-ZINT.
Formula: \hat{p} \pm z_{\alpha/2} \sqrt{\hat{p}(1 - \hat{p}) / n}
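
A sketch of the large-sample interval for a proportion, \hat{p} \pm z_{\alpha/2} \sqrt{\hat{p}(1 - \hat{p}) / n}, including the 15-successes-and-15-failures check stated above. The function name and the counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def prop_z_interval(successes, n, confidence=0.95):
    """Large-sample confidence interval for a population proportion."""
    failures = n - successes
    if successes < 15 or failures < 15:
        raise ValueError("need at least 15 successes and 15 failures")
    p_hat = successes / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

# Hypothetical sample: 60 successes in 100 trials.
low, high = prop_z_interval(60, 100)
print(round(low, 3), round(high, 3))
```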

Small Sample Confidence Interval for Proportion (p)

Used when there are fewer than 15 successes or failures; no standard calculator command.

Sample Size Determination

To achieve a desired margin of error for a proportion, use \hat{p} = 0.5 if no prior estimate is available; this gives the largest (most conservative) required sample size.
Formula provided on formula sheet (not specified here).
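
The formula sheet is not reproduced in these notes. As an assumption, the sketch below uses the commonly taught sample-size formula for a proportion, n = z_{\alpha/2}^2 \cdot p(1 - p) / E^2, rounded up to the next whole number; check it against the actual formula sheet before relying on it. The function name and the 3% margin of error are hypothetical.

```python
from math import ceil
from statistics import NormalDist

def sample_size_for_proportion(margin, confidence=0.95, p=0.5):
    """Smallest n giving the desired margin of error for a proportion.
    p = 0.5 is the conservative choice when no prior estimate exists."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil(z**2 * p * (1 - p) / margin**2)

# Hypothetical target: margin of error 0.03 at 95% confidence.
n_needed = sample_size_for_proportion(0.03)
print(n_needed)
```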

Alpha and Confidence Level

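Alpha (\alpha) is 1 minus the confidence level (e.g., 95% confidence means \alpha = 0.05). Z and t scores are associated with an area of \alpha / 2 in each tail.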

Hypothesis Testing for Mean (\mu) and Proportion (p)

Key Concepts and Steps

Null Hypothesis (H0): Represents the status quo; must contain an equal sign.
Alternative Hypothesis (Ha): Represents the claim to be tested; uses >, <, or ≠.
Type I Error: Rejecting a true null hypothesis.
Type II Error: Failing to reject a false null hypothesis.
Significance Level (Alpha): Probability of Type I error; controls the risk.
Test Statistic: Converts sample statistic to a Z or t value.
Critical Value: Z or t score marking the start of the rejection region.
Rejection Region: Area(s) in the tails of the distribution where H0 is rejected.
P-value: Probability of observing the sample result (or more extreme) if H0 is true; quantifies how unusual the result is.

Types of Tests

One-Tailed Test (Left): Ha uses <.
One-Tailed Test (Right): Ha uses >.
Two-Tailed Test: Ha uses ≠.

Decision Rules

If p-value < alpha: Reject H0 and accept Ha.
If p-value > alpha: Insufficient evidence to reject H0; H0 is plausible.
If test statistic falls in rejection region: Reject H0 and accept Ha.
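
The decision rules above can be sketched for a Z test statistic as follows. The function names and the tail labels are hypothetical; a t-based test would follow the same logic with a t-distribution in place of the standard normal.

```python
from statistics import NormalDist

def z_p_value(z, tail):
    """p-value for a Z test statistic.
    tail: 'left' (Ha uses <), 'right' (Ha uses >), or 'two' (Ha uses ≠)."""
    std = NormalDist()
    if tail == "left":
        return std.cdf(z)
    if tail == "right":
        return 1 - std.cdf(z)
    if tail == "two":
        return 2 * (1 - std.cdf(abs(z)))
    raise ValueError("tail must be 'left', 'right', or 'two'")

def decide(p_value, alpha):
    """Apply the p-value decision rule."""
    return "reject H0" if p_value < alpha else "fail to reject H0"

# Hypothetical test statistic for a right-tailed test at alpha = 0.05.
p = z_p_value(2.0, "right")
print(round(p, 4), decide(p, 0.05))
```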

Reporting Decisions

"There is sufficient evidence at an alpha level of xx to reject the null hypothesis and accept the alternative hypothesis."
"There is insufficient evidence at an alpha level of xx to reject the null hypothesis. Therefore, the null hypothesis is plausible."

Calculator Commands

ZTEST: Large sample hypothesis test for mean (\mu).
TTEST: Small sample hypothesis test for mean (\mu), if population is normal.
1propZtest: Large sample hypothesis test for proportion (p), requires at least 15 successes and 15 failures.

Summary Table: Hypothesis Test Types

Test Type	Sample Size	Distribution Used	Calculator Command
Mean (\mu), Large Sample	n ≥ 30	Normal	ZTEST
Mean (\mu), Small Sample	n < 30	t-distribution	TTEST
Proportion (p), Large Sample	At least 15 successes and 15 failures	Normal	1propZtest

Example: Hypothesis Test for Mean

Suppose a manufacturer claims the mean lifetime of a battery is 100 hours. A sample of 40 batteries yields a mean of 98 hours and a standard deviation of 5 hours. Test the claim at alpha = 0.05.
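Null hypothesis: H_0: \mu = 100
Alternative hypothesis: H_a: \mu \neq 100 (two-tailed, testing the claim against a difference in either direction)
Test statistic: z = (\bar{x} - \mu_0) / (s / \sqrt{n}) = (98 - 100) / (5 / \sqrt{40}) \approx -2.53
Decision: Compare z to the critical values (\pm 1.96 for a two-tailed test at alpha = 0.05) or the p-value to alpha. Here |z| = 2.53 > 1.96 and the p-value is about 0.011 < 0.05, so reject H0.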
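
The battery example can be checked numerically. This sketch assumes the two-tailed alternative H_a: \mu \neq 100; a one-tailed reading of the claim would change the critical value and p-value. Variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# Values from the example: claimed mean, sample mean, sample SD, n, alpha.
mu0, xbar, s, n, alpha = 100, 98, 5, 40, 0.05

# Test statistic: how many standard errors xbar lies from the claimed mean.
z = (xbar - mu0) / (s / sqrt(n))

# Two-tailed critical value and p-value (assumed alternative: mu != 100).
z_crit = NormalDist().inv_cdf(1 - alpha / 2)
p_value = 2 * NormalDist().cdf(-abs(z))

reject = abs(z) > z_crit
print(f"z = {z:.2f}, critical = +/-{z_crit:.2f}, "
      f"p-value = {p_value:.4f}, reject H0: {reject}")
```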
