Statistics Unit 3: Confidence Intervals and Hypothesis Testing (Chapters 9-11) – Study Guide
Vocabulary and Notation
Key Terms
Point Estimate: A single value used to estimate a population parameter (e.g., sample mean \( \bar{x} \) estimates population mean \( \mu \)).
Confidence Interval: An interval estimate, calculated from the sample data, that is likely to contain the population parameter with a specified level of confidence.
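Level of Confidence: The long-run proportion of intervals that would contain the true parameter if the same procedure were repeated on many samples (e.g., 95%). It does not describe the probability that one particular computed interval contains the parameter.
Margin of Error: The maximum likely difference between the point estimate and the true parameter value at the stated level of confidence; it is half the width of the confidence interval.
Critical Value: The value that, multiplied by the standard error, gives the margin of error for the desired confidence level (e.g., \( z_{\alpha/2} \) or \( t_{\alpha/2} \)).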
Student’s t-Distribution: A probability distribution used when estimating the mean of a normally distributed population in situations where the sample size is small and population standard deviation is unknown.
Bootstrapping: A resampling method used to estimate the sampling distribution of a statistic by repeatedly sampling with replacement from the observed data.
Percentile Method Confidence Interval: A confidence interval constructed from the percentiles of the bootstrap distribution.
Hypothesis: A statement about a population parameter. Includes the null hypothesis (\( H_0 \)) and alternative hypothesis (\( H_1 \)).
Hypothesis Testing: A statistical method for testing a claim about a population parameter using sample data.
Type I Error: Rejecting the null hypothesis when it is true (false positive).
Type II Error: Failing to reject the null hypothesis when it is false (false negative).
Level of Significance (\( \alpha \)): The probability of making a Type I error.
P-value: The probability, under the null hypothesis, of obtaining a result equal to or more extreme than what was actually observed.
Statistical Significance: When the observed result would be unlikely to occur by chance alone if the null hypothesis were true, i.e., the p-value is less than \( \alpha \).
Practical Significance: When the observed effect is large enough to be meaningful in real-world terms.
Independent Samples: Samples in which the selection of one sample does not influence the selection of the other.
Dependent Samples (Matched-Pairs): Samples in which each observation in one sample can be paired with an observation in the other sample.
Robust Test: A statistical test whose results remain approximately valid under minor departures from its assumptions (e.g., a t test under mild non-normality).
Randomization Test: A nonparametric method for hypothesis testing using random resampling.
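The bootstrapping and percentile-method definitions above can be made concrete with a short sketch. This is only an illustration: the data values, the function name `bootstrap_percentile_ci`, the number of resamples, and the seed are all assumptions chosen for the example, not part of the course materials.

```python
import random
import statistics

def bootstrap_percentile_ci(data, stat=statistics.mean, level=0.95, reps=5000, seed=0):
    """Percentile-method bootstrap confidence interval for a statistic."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(data)
    # Resample n values with replacement and recompute the statistic each time.
    boot = sorted(stat(rng.choices(data, k=n)) for _ in range(reps))
    alpha = 1 - level
    lo_idx = int((alpha / 2) * reps)          # lower percentile position
    hi_idx = int((1 - alpha / 2) * reps) - 1  # upper percentile position
    return boot[lo_idx], boot[hi_idx]

sample = [12, 15, 9, 20, 14, 17, 11, 16, 13, 18]  # hypothetical data
low, high = bootstrap_percentile_ci(sample)
print(f"95% bootstrap CI for the mean: ({low:.2f}, {high:.2f})")
```

The endpoints are simply the 2.5th and 97.5th percentiles of the sorted bootstrap means, so no normality assumption or critical value is needed.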
One Sample Confidence Intervals
Confidence Interval for One Sample Proportion
Used to estimate the true population proportion based on a sample.
Formula: \( \hat{p} \pm z_{\alpha/2} \sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}} \)
Assumptions:
Sample obtained by simple random sampling or randomized experiment.
\( n\hat{p}(1-\hat{p}) \geq 10 \)
Sampled values are independent (sample size < 5% of population).
Example: If \( \hat{p} = 0.6 \), \( n = 100 \), and 95% confidence (\( z_{0.025} = 1.96 \)), the interval is: \( 0.6 \pm 1.96\sqrt{\dfrac{0.6(0.4)}{100}} \approx 0.6 \pm 0.096 \), or about (0.504, 0.696).
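As a numerical check on the one-proportion interval, here is a minimal sketch using only the Python standard library. The function name `proportion_ci` is an assumption for illustration; the inputs are the values from the example (\( \hat{p} = 0.6 \), \( n = 100 \), 95% confidence).

```python
from math import sqrt
from statistics import NormalDist

def proportion_ci(p_hat, n, level=0.95):
    """z confidence interval for one population proportion."""
    # Critical value z_{alpha/2} from the standard normal distribution.
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

low, high = proportion_ci(0.6, 100)
print(f"({low:.3f}, {high:.3f})")  # prints (0.504, 0.696)
```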
t Confidence Interval for Mean
Used when estimating the population mean and the population standard deviation is unknown.
Formula: \( \bar{x} \pm t_{\alpha/2} \dfrac{s}{\sqrt{n}} \), with \( df = n - 1 \)
Assumptions:
Sample obtained by simple random sampling or randomized experiment.
No outliers; population is normal or sample size \( n \geq 30 \).
Sampled values are independent.
Example: \( \bar{x} = 50 \), \( s = 10 \), \( n = 25 \), 95% confidence (\( t_{0.025,24} \approx 2.064 \)): \( 50 \pm 2.064 \cdot \dfrac{10}{\sqrt{25}} = 50 \pm 4.128 \), or (45.872, 54.128).
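The t interval arithmetic can be sketched the same way. The Python standard library has no t distribution, so this sketch assumes the critical value is looked up separately (from a t table or software) and passed in; the function name `t_interval` is hypothetical.

```python
from math import sqrt

def t_interval(x_bar, s, n, t_crit):
    """t confidence interval for a mean, given a critical value with n - 1 df."""
    margin = t_crit * s / sqrt(n)  # margin of error
    return x_bar - margin, x_bar + margin

# Values from the example: x_bar = 50, s = 10, n = 25, t_{0.025,24} ~ 2.064
low, high = t_interval(50, 10, 25, t_crit=2.064)
print(f"({low:.3f}, {high:.3f})")  # prints (45.872, 54.128)
```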
One Sample Hypothesis Tests
z Test for One Sample Proportion
Tests whether the population proportion equals a specified value.
Test Statistic: \( z_0 = \dfrac{\hat{p} - p_0}{\sqrt{\dfrac{p_0(1-p_0)}{n}}} \)
Hypotheses:
Two-tailed: \( H_0: p = p_0 \), \( H_1: p \neq p_0 \)
Left-tailed: \( H_0: p = p_0 \), \( H_1: p < p_0 \)
Right-tailed: \( H_0: p = p_0 \), \( H_1: p > p_0 \)
Assumptions:
Simple random sample or randomized experiment.
\( n p_0 (1-p_0) \geq 10 \)
Sampled values are independent (sample size < 5% of population).
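A minimal sketch of the one-proportion z test, covering all three forms of the alternative hypothesis. The counts (60 successes in 100 trials, \( p_0 = 0.5 \)) and the function name `one_prop_z_test` are made up for illustration.

```python
from math import sqrt
from statistics import NormalDist

def one_prop_z_test(x, n, p0, tail="two"):
    """z test for one proportion; tail is 'two', 'left', or 'right'."""
    p_hat = x / n
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
    cdf = NormalDist().cdf(z)
    if tail == "left":
        p_value = cdf                    # area to the left of z
    elif tail == "right":
        p_value = 1 - cdf                # area to the right of z
    else:
        p_value = 2 * min(cdf, 1 - cdf)  # both tails
    return z, p_value

# Hypothetical data: 60 successes in 100 trials, H0: p = 0.5
z, p_value = one_prop_z_test(x=60, n=100, p0=0.5)
print(f"z = {z:.2f}, p-value = {p_value:.4f}")
```

Here \( n p_0 (1-p_0) = 25 \geq 10 \), so the normal approximation condition is met for this hypothetical sample.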
t Test for Mean
Tests whether the population mean equals a specified value.
Test Statistic: \( t_0 = \dfrac{\bar{x} - \mu_0}{s/\sqrt{n}} \)
Degrees of Freedom: \( df = n - 1 \)
Hypotheses:
Two-tailed: \( H_0: \mu = \mu_0 \), \( H_1: \mu \neq \mu_0 \)
Left-tailed: \( H_0: \mu = \mu_0 \), \( H_1: \mu < \mu_0 \)
Right-tailed: \( H_0: \mu = \mu_0 \), \( H_1: \mu > \mu_0 \)
Assumptions:
Simple random sample or randomized experiment.
No outliers; population is normal or \( n \geq 30 \).
Sampled values are independent.
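The one-sample t statistic and its degrees of freedom can be computed from raw data as sketched below. The data set and the function name `one_sample_t` are hypothetical; the p-value or critical value would still come from a t table or software such as StatCrunch.

```python
from math import sqrt
from statistics import mean, stdev

def one_sample_t(data, mu0):
    """Return the t statistic and degrees of freedom for H0: mu = mu0."""
    n = len(data)
    t = (mean(data) - mu0) / (stdev(data) / sqrt(n))  # stdev uses n - 1
    return t, n - 1

data = [52, 48, 55, 51, 49, 53, 50, 54]  # hypothetical sample
t, df = one_sample_t(data, mu0=50)
print(f"t = {t:.3f}, df = {df}")
```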
Two Sample Hypothesis Tests
Two Sample z Test for Proportions
Compares the proportions of two independent groups.
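Test Statistic: \( z_0 = \dfrac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1-\hat{p})\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} \)
where \( \hat{p} = \dfrac{x_1 + x_2}{n_1 + n_2} \) is the pooled sample proportion.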
Hypotheses:
Two-tailed: \( H_0: p_1 = p_2 \), \( H_1: p_1 \neq p_2 \)
Left-tailed: \( H_0: p_1 = p_2 \), \( H_1: p_1 < p_2 \)
Right-tailed: \( H_0: p_1 = p_2 \), \( H_1: p_1 > p_2 \)
Assumptions:
Independent samples from simple random sampling or randomized experiment.
\( n_1\hat{p}_1(1-\hat{p}_1) \geq 10 \) and \( n_2\hat{p}_2(1-\hat{p}_2) \geq 10 \)
Sample sizes < 5% of respective populations.
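The pooled two-proportion z test can be sketched as follows for the two-tailed case. The counts and the function name `two_prop_z_test` are assumptions for illustration only.

```python
from math import sqrt
from statistics import NormalDist

def two_prop_z_test(x1, n1, x2, n2):
    """Two-tailed z test for H0: p1 = p2 using the pooled proportion."""
    p1, p2 = x1 / n1, x2 / n2
    p_pool = (x1 + x2) / (n1 + n2)  # pooled estimate under H0
    se = sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical data: 60 of 100 successes in group 1, 45 of 100 in group 2
z, p_value = two_prop_z_test(60, 100, 45, 100)
print(f"z = {z:.3f}, p-value = {p_value:.4f}")
```

The pooled proportion is used in the standard error because the null hypothesis states that both groups share a single proportion.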
Two Sample t Test for Dependent Means (Matched Pairs)
Compares means from paired or matched samples.
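Test Statistic: \( t_0 = \dfrac{\bar{d}}{s_d/\sqrt{n}} \), where \( \bar{d} \) and \( s_d \) are the mean and standard deviation of the paired differences and \( n \) is the number of pairs.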
Degrees of Freedom: \( df = n - 1 \)
Hypotheses:
Two-tailed: \( H_0: \mu_d = 0 \), \( H_1: \mu_d \neq 0 \)
Left-tailed: \( H_0: \mu_d = 0 \), \( H_1: \mu_d < 0 \)
Right-tailed: \( H_0: \mu_d = 0 \), \( H_1: \mu_d > 0 \)
Assumptions: Same as one sample mean, but applied to the differences.
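A matched-pairs test reduces to a one-sample t test on the differences, as this sketch shows. The before/after values and the function name `paired_t` are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(first, second):
    """t statistic and df for H0: mu_d = 0, with d = second - first."""
    d = [b - a for a, b in zip(first, second)]  # one difference per pair
    n = len(d)
    t = mean(d) / (stdev(d) / sqrt(n))
    return t, n - 1

before = [80, 75, 90, 68, 85, 77]  # hypothetical paired measurements
after = [84, 78, 91, 73, 88, 80]
t, df = paired_t(before, after)
print(f"t = {t:.3f}, df = {df}")
```

The direction of subtraction (here, after minus before) determines the sign of \( t_0 \) and therefore which one-tailed alternative applies.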
Two Sample t Test for Independent Means (Unequal Variances)
Compares means from two independent samples, not assuming equal variances.
Test Statistic: \( t_0 = \dfrac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} \)
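Degrees of Freedom (Welch-Satterthwaite approximation): \( df = \dfrac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}{\dfrac{(s_1^2/n_1)^2}{n_1 - 1} + \dfrac{(s_2^2/n_2)^2}{n_2 - 1}} \)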
Hypotheses:
Two-tailed: \( H_0: \mu_1 = \mu_2 \), \( H_1: \mu_1 \neq \mu_2 \)
Left-tailed: \( H_0: \mu_1 = \mu_2 \), \( H_1: \mu_1 < \mu_2 \)
Right-tailed: \( H_0: \mu_1 = \mu_2 \), \( H_1: \mu_1 > \mu_2 \)
Assumptions:
Independent samples from simple random sampling or randomized experiment.
Populations are normal or sample sizes \( n_1, n_2 \geq 30 \).
Sample sizes < 5% of respective populations.
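Because the Welch-Satterthwaite degrees of freedom are tedious to compute by hand, a short sketch may help. The two samples and the function name `welch_t` are made up for illustration; the result is generally not a whole number.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(sample1, sample2):
    """Welch t statistic and approximate df (unequal variances)."""
    n1, n2 = len(sample1), len(sample2)
    v1 = variance(sample1) / n1  # s1^2 / n1
    v2 = variance(sample2) / n2  # s2^2 / n2
    t = (mean(sample1) - mean(sample2)) / sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

group1 = [23, 25, 28, 30, 26, 27]  # hypothetical independent samples
group2 = [20, 22, 19, 24, 21]
t, df = welch_t(group1, group2)
print(f"t = {t:.3f}, df = {df:.2f}")
```

The approximate df always falls between the smaller of \( n_1 - 1 \) and \( n_2 - 1 \) and \( n_1 + n_2 - 2 \), which gives a quick sanity check on the calculation.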
Two Sample Confidence Intervals
Confidence Interval for Difference Between Two Proportions
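Formula: \( (\hat{p}_1 - \hat{p}_2) \pm z_{\alpha/2} \sqrt{\dfrac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \dfrac{\hat{p}_2(1-\hat{p}_2)}{n_2}} \)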
Assumptions:
Independent samples from simple random sampling or randomized experiment.
\( n_1\hat{p}_1(1-\hat{p}_1) \geq 10 \) and \( n_2\hat{p}_2(1-\hat{p}_2) \geq 10 \)
Sample sizes < 5% of respective populations.
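A sketch of the interval for a difference of two proportions follows. Unlike the hypothesis test, the interval uses each sample's own proportion in the standard error rather than a pooled estimate. The counts and the function name `two_prop_ci` are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def two_prop_ci(x1, n1, x2, n2, level=0.95):
    """z confidence interval for p1 - p2 (unpooled standard error)."""
    p1, p2 = x1 / n1, x2 / n2
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    margin = z * sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff - margin, diff + margin

# Hypothetical data: 60 of 100 successes in group 1, 45 of 100 in group 2
low, high = two_prop_ci(60, 100, 45, 100)
print(f"({low:.4f}, {high:.4f})")
```

If the interval excludes 0, as it does for these made-up counts, the data suggest the two population proportions differ at the chosen confidence level.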
Confidence Interval for Mean of Differences (Paired Data)
Formula: \( \bar{d} \pm t_{\alpha/2} \dfrac{s_d}{\sqrt{n}} \)
Degrees of Freedom: \( df = n - 1 \)
Assumptions: Same as one sample mean, but applied to the differences.
Confidence Interval for Difference Between Two Independent Means (Unequal Variances)
Formula: \( (\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2} \sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}} \)
Degrees of Freedom: Welch-Satterthwaite approximation, the same as for the two-sample t test for independent means.
Assumptions:
Independent samples from simple random sampling or randomized experiment.
Populations are normal or sample sizes \( n_1, n_2 \geq 30 \).
Sample sizes < 5% of respective populations.
Sample Size Calculations
Sample Size Needed for Proportions
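With Prior Estimate \( \hat{p} \): \( n = \hat{p}(1-\hat{p})\left( \dfrac{z_{\alpha/2}}{E} \right)^2 \)
Without Prior Estimate: \( n = 0.25\left( \dfrac{z_{\alpha/2}}{E} \right)^2 \)
where E is the desired margin of error (as a decimal). Round n up to the next whole number.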
Sample Size Needed for Means
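Formula: \( n = \left( \dfrac{z_{\alpha/2}\, s}{E} \right)^2 \)
where E is the desired margin of error and \( s \) is a prior estimate of the population standard deviation. Round n up to the next whole number.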
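The standard z-based sample size calculations can be sketched as below, always rounding up. The function names and the example inputs (a 3% margin of error, a prior estimate of 0.6, a standard deviation estimate of 10) are assumptions for illustration.

```python
from math import ceil
from statistics import NormalDist

def z_crit(level):
    """Critical value z_{alpha/2} for a given confidence level."""
    return NormalDist().inv_cdf(1 - (1 - level) / 2)

def n_for_proportion(E, level=0.95, p_prior=None):
    """Sample size for a proportion; uses 0.25 when no prior estimate exists."""
    pq = 0.25 if p_prior is None else p_prior * (1 - p_prior)
    return ceil(pq * (z_crit(level) / E) ** 2)

def n_for_mean(s, E, level=0.95):
    """Sample size for a mean, given an estimate s of the standard deviation."""
    return ceil((z_crit(level) * s / E) ** 2)

print(n_for_proportion(0.03))               # prints 1068
print(n_for_proportion(0.03, p_prior=0.6))  # prints 1025
print(n_for_mean(s=10, E=2))                # prints 97
```

Using 0.25 without a prior estimate is the conservative choice, since \( \hat{p}(1-\hat{p}) \) is largest at \( \hat{p} = 0.5 \).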
Summary Table: Hypothesis Tests and Confidence Intervals
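| Test/Interval | Parameter | Formula | Assumptions |
|---|---|---|---|
| One-sample z for proportion | \( p \) | \( z_0 = \dfrac{\hat{p} - p_0}{\sqrt{p_0(1-p_0)/n}} \) | Random sample, independence, \( n p_0 (1-p_0) \geq 10 \) |
| One-sample t for mean | \( \mu \) | \( t_0 = \dfrac{\bar{x} - \mu_0}{s/\sqrt{n}} \) | Random sample, normality or large n, independence |
| Two-sample z for proportions | \( p_1 - p_2 \) | \( z_0 = \dfrac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1-\hat{p})(1/n_1 + 1/n_2)}} \) | Random, independent samples, \( n_1\hat{p}_1(1-\hat{p}_1) \geq 10 \) and \( n_2\hat{p}_2(1-\hat{p}_2) \geq 10 \) |
| Two-sample t for means (independent, unequal variances) | \( \mu_1 - \mu_2 \) | \( t_0 = \dfrac{\bar{x}_1 - \bar{x}_2}{\sqrt{s_1^2/n_1 + s_2^2/n_2}} \) | Random, independent samples, normality or large n |
| Paired t for means | \( \mu_d \) | \( t_0 = \dfrac{\bar{d}}{s_d/\sqrt{n}} \) | Random sample of pairs, normality or large n |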
Additional Info
Statistical software such as StatCrunch can be used to perform these calculations and simulations.
Bootstrapping and randomization tests provide nonparametric alternatives when assumptions are questionable.
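As an illustration of a randomization test for the difference between two independent means, here is a minimal sketch. The data, the number of shuffles, the seed, and the function name `randomization_test` are all assumptions; software such as StatCrunch performs the same idea through its own applets.

```python
import random
from statistics import mean

def randomization_test(group1, group2, reps=5000, seed=0):
    """Two-tailed randomization test for a difference in means."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = mean(group1) - mean(group2)
    pooled = list(group1) + list(group2)
    n1 = len(group1)
    extreme = 0
    for _ in range(reps):
        # Under H0 the group labels are arbitrary, so reassign them at random.
        rng.shuffle(pooled)
        diff = mean(pooled[:n1]) - mean(pooled[n1:])
        if abs(diff) >= abs(observed):
            extreme += 1
    return observed, extreme / reps

group1 = [23, 25, 28, 30, 26, 27]  # hypothetical independent samples
group2 = [20, 22, 19, 24, 21]
observed, p_value = randomization_test(group1, group2)
print(f"observed difference = {observed:.2f}, p-value = {p_value:.4f}")
```

The p-value is the share of shuffled differences at least as extreme as the observed one, so it needs no normality assumption.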