Statistics Unit 3: Confidence Intervals and Hypothesis Testing (Chapters 9-11) – Study Guide

Vocabulary and Notation

Key Terms

Point Estimate: A single value used to estimate a population parameter (e.g., sample mean \( \bar{x} \) estimates population mean \( \mu \)).
Confidence Interval: An interval estimate, calculated from the sample data, that is likely to contain the population parameter with a specified level of confidence.
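Level of Confidence: The long-run proportion of intervals, constructed by the same method from repeated samples, that contain the true parameter (e.g., 95%). It is not the probability that one particular computed interval contains the parameter.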
Margin of Error: The maximum expected difference between the point estimate and the true parameter value.
Critical Value: The value from the reference distribution (e.g., \( z_{\alpha/2} \) or \( t_{\alpha/2} \)) that, multiplied by the standard error, gives the margin of error for the desired confidence level.
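Student’s t-Distribution: A probability distribution used when estimating the mean of a normally distributed population when the population standard deviation is unknown and is estimated by the sample standard deviation \( s \). It matters most for small samples and approaches the standard normal distribution as the degrees of freedom grow.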
Bootstrapping: A resampling method used to estimate the sampling distribution of a statistic by repeatedly sampling with replacement from the observed data.
Percentile Method Confidence Interval: A confidence interval constructed from the percentiles of the bootstrap distribution.
Hypothesis: A statement about a population parameter. Includes the null hypothesis (\( H_0 \)) and alternative hypothesis (\( H_1 \)).
Hypothesis Testing: A statistical method for testing a claim about a population parameter using sample data.
Type I Error: Rejecting the null hypothesis when it is true (false positive).
Type II Error: Failing to reject the null hypothesis when it is false (false negative).
Level of Significance (\( \alpha \)): The probability of making a Type I error.
P-value: The probability, under the null hypothesis, of obtaining a result equal to or more extreme than what was actually observed.
Statistical Significance: When the observed effect is unlikely to have occurred by chance, as determined by the p-value.
Practical Significance: When the observed effect is large enough to be meaningful in real-world terms.
Independent Samples: Samples in which the selection of one sample does not influence the selection of the other.
Dependent Samples (Matched-Pairs): Samples in which each observation in one sample can be paired with an observation in the other sample.
Robust Test: A statistical test that is valid even when certain assumptions are violated.
Randomization Test: A nonparametric hypothesis test that builds the null distribution of a statistic by repeatedly reshuffling the observed data (e.g., group labels) at random.

One Sample Confidence Intervals

Confidence Interval for One Sample Proportion

Used to estimate the true population proportion based on a sample.

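Formula: \( \hat{p} \pm z_{\alpha/2}\sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}} \), where \( \hat{p} = x/n \) is the sample proportion.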

Assumptions:
- Sample obtained by simple random sampling or randomized experiment.
- \( n\hat{p}(1-\hat{p}) \geq 10 \)
- Sampled values are independent (sample size < 5% of population).
Example: If \( \hat{p} = 0.6 \), \( n = 100 \), and 95% confidence (\( z_{0.025} = 1.96 \)), the interval is \( 0.6 \pm 1.96\sqrt{\dfrac{0.6(0.4)}{100}} \approx 0.6 \pm 0.096 \), or about (0.504, 0.696).
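
The arithmetic in this example can be checked with a short Python sketch. The function name `proportion_ci` is made up for illustration, and the critical value comes from the standard library's `NormalDist` rather than a z table, so it may differ from 1.96 in later decimals.

```python
from math import sqrt
from statistics import NormalDist

def proportion_ci(p_hat, n, confidence=0.95):
    """Return (lower, upper) for a one-sample proportion z interval."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)      # critical value z_{alpha/2}
    margin = z * sqrt(p_hat * (1 - p_hat) / n)   # margin of error
    return p_hat - margin, p_hat + margin

# Values from the example: p-hat = 0.6, n = 100, 95% confidence
lower, upper = proportion_ci(0.6, 100)
print(round(lower, 3), round(upper, 3))
```

Here \( n\hat{p}(1-\hat{p}) = 24 \geq 10 \), so the normal approximation condition listed in the assumptions is met.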

t Confidence Interval for Mean

Used when estimating the population mean and the population standard deviation is unknown.

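Formula: \( \bar{x} \pm t_{\alpha/2}\dfrac{s}{\sqrt{n}} \), with \( df = n - 1 \), where \( s \) is the sample standard deviation.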

Assumptions:
- Sample obtained by simple random sampling or randomized experiment.
- No outliers; population is normal or sample size \( n \geq 30 \).
- Sampled values are independent.
Example: \( \bar{x} = 50 \), \( s = 10 \), \( n = 25 \), 95% confidence (\( t_{0.025,24} \approx 2.064 \)): \( 50 \pm 2.064 \cdot \dfrac{10}{\sqrt{25}} = 50 \pm 4.128 \), or (45.872, 54.128).
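
A minimal sketch of the same calculation follows. Python's standard library has no t distribution, so this version assumes the critical value is looked up in a t table (or software) and passed in; `t_interval` is an illustrative name.

```python
from math import sqrt

def t_interval(x_bar, s, n, t_crit):
    """Return (lower, upper) for a one-sample t interval.

    t_crit is t_{alpha/2} with n - 1 degrees of freedom, taken from a table.
    """
    margin = t_crit * s / sqrt(n)   # margin of error
    return x_bar - margin, x_bar + margin

# Values from the example: x-bar = 50, s = 10, n = 25, t_{0.025,24} ~ 2.064
lower, upper = t_interval(50, 10, 25, 2.064)
print(round(lower, 3), round(upper, 3))
```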

One Sample Hypothesis Tests

z Test for One Sample Proportion

Tests whether the population proportion equals a specified value.

Test Statistic: \( z = \dfrac{\hat{p} - p_0}{\sqrt{\dfrac{p_0(1-p_0)}{n}}} \)

Hypotheses:
- Two-tailed: \( H_0: p = p_0 \), \( H_1: p \neq p_0 \)
- Left-tailed: \( H_0: p = p_0 \), \( H_1: p < p_0 \)
- Right-tailed: \( H_0: p = p_0 \), \( H_1: p > p_0 \)
Assumptions:
- Simple random sample or randomized experiment.
- \( n p_0 (1-p_0) \geq 10 \)
- Sampled values are independent (sample size < 5% of population).
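
As a rough illustration of this test, the sketch below computes the z statistic and a p-value from the standard normal distribution. The function name, the `tail` argument, and the data (60 successes in 100 trials, \( p_0 = 0.5 \)) are made up for the example.

```python
from math import sqrt
from statistics import NormalDist

def one_proportion_z_test(x, n, p0, tail="two"):
    """Return (z, p_value) for H0: p = p0; tail is 'two', 'left', or 'right'."""
    p_hat = x / n
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)   # standard error uses p0, not p-hat
    std_normal = NormalDist()
    if tail == "left":
        p_value = std_normal.cdf(z)
    elif tail == "right":
        p_value = 1 - std_normal.cdf(z)
    else:
        p_value = 2 * (1 - std_normal.cdf(abs(z)))
    return z, p_value

# Hypothetical data: 60 successes out of 100, testing H0: p = 0.5
z, p_value = one_proportion_z_test(x=60, n=100, p0=0.5)
print(round(z, 2), round(p_value, 4))
```

With these numbers z = 2.0 and the two-tailed p-value is about 0.046, so \( H_0 \) would be rejected at \( \alpha = 0.05 \) but not at \( \alpha = 0.01 \).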

t Test for Mean

Tests whether the population mean equals a specified value.

Test Statistic: \( t = \dfrac{\bar{x} - \mu_0}{s/\sqrt{n}} \)

Degrees of Freedom: \( df = n - 1 \)
Hypotheses:
- Two-tailed: \( H_0: \mu = \mu_0 \), \( H_1: \mu \neq \mu_0 \)
- Left-tailed: \( H_0: \mu = \mu_0 \), \( H_1: \mu < \mu_0 \)
- Right-tailed: \( H_0: \mu = \mu_0 \), \( H_1: \mu > \mu_0 \)
Assumptions:
- Simple random sample or randomized experiment.
- No outliers; population is normal or \( n \geq 30 \).
- Sampled values are independent.
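
The test statistic can be computed from raw data as sketched below. The sample values are invented, and the decision step assumes the critical value is read from a t table, since the standard library does not provide the t distribution.

```python
from math import sqrt
from statistics import mean, stdev

def one_sample_t(data, mu0):
    """Return (t, df) for H0: mu = mu0 using the sample mean and sample sd."""
    n = len(data)
    t = (mean(data) - mu0) / (stdev(data) / sqrt(n))
    return t, n - 1

# Hypothetical sample, testing H0: mu = 50
sample = [52, 48, 55, 51, 49, 53, 50, 54]
t_stat, df = one_sample_t(sample, mu0=50)
print(round(t_stat, 3), df)

# Two-tailed decision at alpha = 0.05 with a table value t_{0.025,7} ~ 2.365
reject = abs(t_stat) > 2.365
print(reject)
```

Here t is about 1.732 with 7 degrees of freedom, which does not exceed 2.365, so \( H_0 \) is not rejected for this made-up sample.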

Two Sample Hypothesis Tests

Two Sample z Test for Proportions

Compares the proportions of two independent groups.

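Test Statistic: \( z = \dfrac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1-\hat{p})\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} \)

where \( \hat{p} = \dfrac{x_1 + x_2}{n_1 + n_2} \) is the pooled sample proportion and \( \hat{p}_1 = x_1/n_1 \), \( \hat{p}_2 = x_2/n_2 \).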

Hypotheses:
- Two-tailed: \( H_0: p_1 = p_2 \), \( H_1: p_1 \neq p_2 \)
- Left-tailed: \( H_0: p_1 = p_2 \), \( H_1: p_1 < p_2 \)
- Right-tailed: \( H_0: p_1 = p_2 \), \( H_1: p_1 > p_2 \)
Assumptions:
- Independent samples from simple random sampling or randomized experiment.
- \( n_1\hat{p}_1(1-\hat{p}_1) \geq 10 \) and \( n_2\hat{p}_2(1-\hat{p}_2) \geq 10 \)
- Sample sizes < 5% of respective populations.
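
A small sketch of the pooled two-proportion test is given below. The counts (60 of 100 versus 45 of 100) are hypothetical and the function name is illustrative; only the two-tailed p-value is computed.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z_test(x1, n1, x2, n2):
    """Return (z, two-tailed p_value) for H0: p1 = p2 using the pooled proportion."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    p_pool = (x1 + x2) / (n1 + n2)                        # pooled estimate under H0
    se = sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))  # pooled standard error
    z = (p1_hat - p2_hat) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical counts: 60/100 successes in group 1, 45/100 in group 2
z, p_value = two_proportion_z_test(60, 100, 45, 100)
print(round(z, 3), round(p_value, 4))
```

The pooled proportion is used here because the null hypothesis asserts a single common proportion for both groups.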

Two Sample t Test for Dependent Means (Matched Pairs)

Compares means from paired or matched samples.

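Test Statistic: \( t = \dfrac{\bar{d}}{s_d/\sqrt{n}} \), where \( \bar{d} \) and \( s_d \) are the mean and standard deviation of the \( n \) paired differences.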

Degrees of Freedom: \( df = n - 1 \)
Hypotheses:
- Two-tailed: \( H_0: \mu_d = 0 \), \( H_1: \mu_d \neq 0 \)
- Left-tailed: \( H_0: \mu_d = 0 \), \( H_1: \mu_d < 0 \)
- Right-tailed: \( H_0: \mu_d = 0 \), \( H_1: \mu_d > 0 \)
Assumptions: Same as one sample mean, but applied to the differences.
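
The paired test reduces to a one-sample t test on the differences, as the sketch below shows. The before/after measurements are invented, and differences are taken as after minus before (an assumption; the opposite order only flips the sign of t).

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(before, after):
    """Return (t, df) for H0: mu_d = 0, with d = after - before for each pair."""
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1

# Hypothetical matched pairs
before = [10, 12, 9, 11, 13]
after = [12, 14, 10, 14, 15]
t_stat, df = paired_t(before, after)
print(round(t_stat, 3), df)
```

For these made-up pairs t is about 6.325 with 4 degrees of freedom, well beyond the two-tailed table value \( t_{0.025,4} \approx 2.776 \).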

Two Sample t Test for Independent Means (Unequal Variances)

Compares means from two independent samples, not assuming equal variances.

Test Statistic: \( t = \dfrac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} \)

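Degrees of Freedom (Welch-Satterthwaite approximation): \( df = \dfrac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}{\dfrac{(s_1^2/n_1)^2}{n_1 - 1} + \dfrac{(s_2^2/n_2)^2}{n_2 - 1}} \)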

Hypotheses:
- Two-tailed: \( H_0: \mu_1 = \mu_2 \), \( H_1: \mu_1 \neq \mu_2 \)
- Left-tailed: \( H_0: \mu_1 = \mu_2 \), \( H_1: \mu_1 < \mu_2 \)
- Right-tailed: \( H_0: \mu_1 = \mu_2 \), \( H_1: \mu_1 > \mu_2 \)
Assumptions:
- Independent samples from simple random sampling or randomized experiment.
- Populations are normal or sample sizes \( n_1, n_2 \geq 30 \).
- Sample sizes < 5% of respective populations.
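
Because the degrees-of-freedom approximation is tedious by hand, a sketch from summary statistics may help. The function name and the summary values are hypothetical.

```python
from math import sqrt

def welch_t(x1_bar, s1, n1, x2_bar, s2, n2):
    """Return (t, df) for the unequal-variances two-sample t test."""
    v1, v2 = s1**2 / n1, s2**2 / n2          # per-sample variance of the mean
    t = (x1_bar - x2_bar) / sqrt(v1 + v2)
    # Welch-Satterthwaite approximation; usually not a whole number
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t, df

# Hypothetical summary statistics for two independent samples
t_stat, df = welch_t(x1_bar=52, s1=8, n1=40, x2_bar=48, s2=10, n2=50)
print(round(t_stat, 3), round(df, 1))
```

The approximate df always falls between \( \min(n_1, n_2) - 1 \) and \( n_1 + n_2 - 2 \), which is a quick sanity check on the result.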

Two Sample Confidence Intervals

Confidence Interval for Difference Between Two Proportions

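Formula: \( (\hat{p}_1 - \hat{p}_2) \pm z_{\alpha/2}\sqrt{\dfrac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \dfrac{\hat{p}_2(1-\hat{p}_2)}{n_2}} \)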

Assumptions:
- Independent samples from simple random sampling or randomized experiment.
- \( n_1\hat{p}_1(1-\hat{p}_1) \geq 10 \) and \( n_2\hat{p}_2(1-\hat{p}_2) \geq 10 \)
- Sample sizes < 5% of respective populations.
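
Unlike the hypothesis test, the interval uses each sample's own proportion in the standard error rather than a pooled value. A minimal sketch with hypothetical counts (60 of 100 versus 45 of 100):

```python
from math import sqrt
from statistics import NormalDist

def diff_proportions_ci(x1, n1, x2, n2, confidence=0.95):
    """Return (lower, upper) for p1 - p2 using the unpooled standard error."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # z_{alpha/2}
    se = sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    diff = p1_hat - p2_hat
    return diff - z * se, diff + z * se

lower, upper = diff_proportions_ci(60, 100, 45, 100)
print(round(lower, 4), round(upper, 4))
```

For these made-up counts the interval is roughly (0.013, 0.287). It excludes 0, which agrees with rejecting \( H_0: p_1 = p_2 \) in a two-tailed test at \( \alpha = 0.05 \).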

Confidence Interval for Mean of Differences (Paired Data)

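Formula: \( \bar{d} \pm t_{\alpha/2}\dfrac{s_d}{\sqrt{n}} \), where \( \bar{d} \) and \( s_d \) are the mean and standard deviation of the \( n \) paired differences.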

Degrees of Freedom: \( df = n - 1 \)
Assumptions: Same as one sample mean, but applied to the differences.

Confidence Interval for Difference Between Two Independent Means (Unequal Variances)

Formula: \( (\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2}\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}} \)

Degrees of Freedom: Use the Welch-Satterthwaite approximation given for the two sample t test for independent means.
Assumptions:
- Independent samples from simple random sampling or randomized experiment.
- Populations are normal or sample sizes \( n_1, n_2 \geq 30 \).
- Sample sizes < 5% of respective populations.

Sample Size Calculations

Sample Size Needed for Proportions

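With Prior Estimate \( \hat{p} \): \( n = \hat{p}(1-\hat{p})\left(\dfrac{z_{\alpha/2}}{E}\right)^2 \)

Without Prior Estimate: \( n = 0.25\left(\dfrac{z_{\alpha/2}}{E}\right)^2 \)

where E is the desired margin of error (as a decimal). Always round n up to the next whole number. The value 0.25 is the largest possible value of \( \hat{p}(1-\hat{p}) \), reached at \( \hat{p} = 0.5 \), so the second formula gives a conservative sample size.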
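
Both cases can be handled by one small helper, sketched below. The function name and the inputs (E = 0.03, prior estimate 0.6) are illustrative only.

```python
from math import ceil
from statistics import NormalDist

def sample_size_proportion(E, confidence=0.95, p_prior=None):
    """Return the minimum n for margin of error E when estimating a proportion."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # z_{alpha/2}
    p = 0.5 if p_prior is None else p_prior              # 0.5 gives p(1-p) = 0.25
    return ceil(p * (1 - p) * (z / E) ** 2)              # always round up

print(sample_size_proportion(0.03))               # no prior estimate
print(sample_size_proportion(0.03, p_prior=0.6))  # with a prior estimate
```

For a 3% margin of error at 95% confidence this gives 1068 with no prior estimate and 1025 with a prior estimate of 0.6.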

Sample Size Needed for Means

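Formula: \( n = \left(\dfrac{z_{\alpha/2}\,\sigma}{E}\right)^2 \)

where E is the desired margin of error and \( \sigma \) is the population standard deviation (in practice, a prior estimate such as \( s \) from an earlier study). Always round n up to the next whole number.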
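
A matching sketch for means follows. The value used for the standard deviation (15) and the margin of error (2) are made-up inputs.

```python
from math import ceil
from statistics import NormalDist

def sample_size_mean(sigma, E, confidence=0.95):
    """Return the minimum n for margin of error E when estimating a mean."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # z_{alpha/2}
    return ceil((z * sigma / E) ** 2)                    # always round up

# Hypothetical inputs: sigma estimated as 15, desired margin of error 2
print(sample_size_mean(sigma=15, E=2))
```

With these inputs the result is 217. Halving E roughly quadruples the required n, because E enters the formula squared.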

Summary Table: Hypothesis Tests and Confidence Intervals

Test/Interval	Parameter	Assumptions
One-sample z for proportion	p	Random sample, independence, \( n p_0 (1-p_0) \geq 10 \)
One-sample t for mean	\( \mu \)	Random sample, normality or large n, independence
Two-sample z for proportions	\( p_1 - p_2 \)	Random, independent samples, \( n_i\hat{p}_i(1-\hat{p}_i) \geq 10 \) for each sample
Two-sample t for means (independent, unequal variances)	\( \mu_1 - \mu_2 \)	Random, independent samples, normality or large n
Paired t for means	\( \mu_d \)	Random sample of pairs, normality or large n

Additional Info

Statistical software such as StatCrunch can be used to perform these calculations and simulations.
Bootstrapping and randomization tests provide nonparametric alternatives when assumptions are questionable.
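
To make these two resampling ideas concrete, here is a rough sketch using only the standard library. It is not how StatCrunch implements them. The function names, the number of resamples, the fixed seeds, and the data are all assumptions chosen for illustration.

```python
import random
from statistics import fmean

def bootstrap_percentile_ci(data, confidence=0.95, reps=2000, seed=0):
    """Percentile-method interval for the mean from bootstrap resamples."""
    rng = random.Random(seed)
    n = len(data)
    # Resample with replacement, same size as the data, and record each mean
    boot_means = sorted(fmean(rng.choices(data, k=n)) for _ in range(reps))
    alpha = 1 - confidence
    lower = boot_means[int((alpha / 2) * reps)]
    upper = boot_means[int((1 - alpha / 2) * reps) - 1]
    return lower, upper

def randomization_test(group1, group2, reps=2000, seed=0):
    """Two-tailed randomization test for a difference in means."""
    rng = random.Random(seed)
    observed = fmean(group1) - fmean(group2)
    pooled = list(group1) + list(group2)
    n1 = len(group1)
    extreme = 0
    for _ in range(reps):
        rng.shuffle(pooled)                 # reassign group labels at random
        diff = fmean(pooled[:n1]) - fmean(pooled[n1:])
        if abs(diff) >= abs(observed):
            extreme += 1
    return observed, extreme / reps         # estimated p-value

# Hypothetical data
group1 = [20, 22, 19, 24, 25, 21, 23]
group2 = [10, 12, 11, 9, 13, 12, 10]
print(bootstrap_percentile_ci(group1))
print(randomization_test(group1, group2))
```

The bootstrap function takes the middle 95% of the resampled means as the interval. The randomization function estimates the p-value as the share of shuffles whose difference in means is at least as extreme as the observed one.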