Statistics Unit 3: Estimation and Hypothesis Testing (Chapters 9-11) Study Guide

Estimation and Hypothesis Testing

Vocabulary and Notation

This section introduces key terms and notation used in statistical inference, including estimation and hypothesis testing.

Point Estimate: A single value used to estimate a population parameter (e.g., sample mean \( \bar{x} \) estimates population mean \( \mu \)).
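Confidence Interval: A range of plausible values for a population parameter, computed from sample data as point estimate ± margin of error.
Level of Confidence: The expected proportion of intervals that would contain the true parameter if many samples were drawn and an interval were built from each, expressed as \( (1-\alpha) \cdot 100\% \). It is not the probability that one particular computed interval contains the parameter.
Margin of Error: Half the width of the confidence interval; the largest likely distance between the point estimate and the true parameter at the stated level of confidence.
Critical Value: The cutoff from the sampling distribution that corresponds to the desired confidence level (e.g., \( z_{\alpha/2} \) or \( t_{\alpha/2} \)).
Student’s t-Distribution: A probability distribution used for inference about a population mean when the population standard deviation is unknown and is estimated by \( s \). Its shape depends on the degrees of freedom and approaches the standard normal distribution as the sample size grows.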
Bootstrapping: A resampling method for estimating the distribution of a statistic.
Percentile Method Confidence Interval: Uses percentiles from bootstrap samples to construct confidence intervals.
Hypothesis: A claim about a population parameter. Includes Null Hypothesis (H0) and Alternative Hypothesis (H1).
Hypothesis Testing: Procedure to assess evidence against H0 in favor of H1.
Type I Error: Rejecting H0 when it is true.
Type II Error: Failing to reject H0 when H1 is true.
P-value: The probability of observing a result at least as extreme as the one in the sample, assuming H0 is true.
Statistical Significance: When the P-value is less than the significance level (\( \alpha \)), indicating evidence against H0.
Practical Significance: Whether the result has real-world importance.
Independent Samples: Samples in which the individuals selected for one sample have no influence on which individuals are selected for the other.
Dependent Samples (Matched Pairs): Samples where observations are paired or related.
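Robust Test: A test that remains approximately valid under minor violations of its assumptions.
Randomization Test: Assesses significance by repeatedly reshuffling the observed data in a way consistent with H0 and comparing the observed statistic with the reshuffled results.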

Quantity	Symbol
Population Proportion	p
Sample Proportion	\( \hat{p} \)
Population Mean	\( \mu \)
Sample Mean	\( \bar{x} \)
Sample Standard Deviation	s

One Sample Confidence Intervals

Confidence Interval for One Sample Proportion

Used to estimate the population proportion based on a sample.

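Formula: \( \hat{p} \pm z_{\alpha/2} \sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}} \), where \( \hat{p} = x/n \)
- Lower bound: \( \hat{p} - z_{\alpha/2} \sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}} \)
- Upper bound: \( \hat{p} + z_{\alpha/2} \sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}} \)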
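Assumptions:
- Sample obtained by simple random sampling or randomized experiment.
- Sampled values are independent; sample size is no more than 5% of the population size.
- The sample is large enough for the normal approximation: \( n\hat{p}(1-\hat{p}) \ge 10 \).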
Example: Given \( \hat{p} \), \( n \), and \( z_{\alpha/2} = 1.96 \) for 95% confidence, calculate the lower and upper bounds.
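To make the calculation concrete, here is a minimal Python sketch of the z interval for a proportion. The function name `proportion_ci` and the data (120 successes in 400 trials) are illustrative assumptions, not values from the course materials.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(x, n, confidence=0.95):
    """Return (lower, upper) bounds of the z interval for a proportion."""
    p_hat = x / n                                   # point estimate
    alpha = 1 - confidence
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)    # critical value z_{alpha/2}
    margin = z_crit * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


# Hypothetical data: 120 successes in 400 trials (p_hat = 0.30)
lower, upper = proportion_ci(120, 400)
print(f"95% CI: ({lower:.3f}, {upper:.3f})")  # roughly (0.255, 0.345)
```

Here \( n\hat{p}(1-\hat{p}) = 84 \), so the normal approximation is reasonable for this made-up sample.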

t Confidence Interval for Mean

Used to estimate the population mean when the population standard deviation is unknown.

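Formula: \( \bar{x} \pm t_{\alpha/2} \cdot \dfrac{s}{\sqrt{n}} \), with \( n - 1 \) degrees of freedom
- Lower bound: \( \bar{x} - t_{\alpha/2} \cdot \dfrac{s}{\sqrt{n}} \)
- Upper bound: \( \bar{x} + t_{\alpha/2} \cdot \dfrac{s}{\sqrt{n}} \)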
Assumptions:
- Sample obtained by simple random sampling or randomized experiment.
- Population is normally distributed with no outliers in the sample, or the sample is large (\( n \ge 30 \)).
- Sampled values are independent.
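Example: Given \( \bar{x} \), \( s \), \( n \), and the critical value \( t_{\alpha/2} \) with \( n - 1 \) degrees of freedom for 95% confidence, calculate the lower and upper bounds.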
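As an illustration, the sketch below computes a t interval from summary statistics. Python's standard library has no t-distribution, so the critical value is passed in as an argument, looked up from a t-table or software. The function name and the numbers (\( \bar{x} = 50 \), \( s = 10 \), \( n = 25 \)) are assumptions made for this example; 2.064 is the standard table value of \( t_{0.025} \) for 24 degrees of freedom.

```python
from math import sqrt


def t_confidence_interval(x_bar, s, n, t_crit):
    """Return (lower, upper) bounds of the t interval for a mean.

    t_crit is t_{alpha/2} with n - 1 degrees of freedom, taken from
    a t-table or statistical software.
    """
    margin = t_crit * s / sqrt(n)
    return x_bar - margin, x_bar + margin


# Hypothetical summary statistics; df = 24, so t_{0.025} is about 2.064
lower, upper = t_confidence_interval(x_bar=50.0, s=10.0, n=25, t_crit=2.064)
print(f"95% CI: ({lower:.3f}, {upper:.3f})")  # (45.872, 54.128)
```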

One Sample Hypothesis Tests

z Test for One Sample Proportion

Tests whether the population proportion \( p \) differs from a hypothesized value \( p_0 \), using the sample proportion as evidence.

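Hypotheses:
- Two-tailed: \( H_0: p = p_0 \), \( H_1: p \ne p_0 \)
- Left-tailed: \( H_0: p = p_0 \), \( H_1: p < p_0 \)
- Right-tailed: \( H_0: p = p_0 \), \( H_1: p > p_0 \)
Test Statistic: \( z_0 = \dfrac{\hat{p} - p_0}{\sqrt{p_0(1-p_0)/n}} \)
Assumptions: Same as the confidence interval for a proportion, with the large-sample condition checked at the hypothesized value: \( n p_0(1-p_0) \ge 10 \).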
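The test statistic and P-value can be sketched in Python as follows. The function name, the `tail` argument, and the data (56 successes in 100 trials, tested against \( p_0 = 0.5 \)) are illustrative assumptions.

```python
from math import sqrt
from statistics import NormalDist


def one_proportion_z_test(x, n, p0, tail="two"):
    """Return (z0, p_value) for a one-sample z test of a proportion."""
    p_hat = x / n
    z0 = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)  # standard error uses p0, not p_hat
    std_normal = NormalDist()
    if tail == "left":
        p_value = std_normal.cdf(z0)
    elif tail == "right":
        p_value = 1 - std_normal.cdf(z0)
    else:  # two-tailed
        p_value = 2 * (1 - std_normal.cdf(abs(z0)))
    return z0, p_value


# Hypothetical data: 56 successes in 100 trials, H0: p = 0.5
z0, p_value = one_proportion_z_test(x=56, n=100, p0=0.5)
print(f"z0 = {z0:.2f}, P-value = {p_value:.4f}")  # z0 = 1.20, P-value about 0.23
```

With \( \alpha = 0.05 \), this made-up sample would not lead to rejecting \( H_0 \), since the P-value exceeds \( \alpha \).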

t Test for Mean

Tests whether the population mean \( \mu \) differs from a hypothesized value \( \mu_0 \), using the sample mean as evidence.

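Hypotheses:
- Two-tailed: \( H_0: \mu = \mu_0 \), \( H_1: \mu \ne \mu_0 \)
- Left-tailed: \( H_0: \mu = \mu_0 \), \( H_1: \mu < \mu_0 \)
- Right-tailed: \( H_0: \mu = \mu_0 \), \( H_1: \mu > \mu_0 \)
Test Statistic: \( t_0 = \dfrac{\bar{x} - \mu_0}{s/\sqrt{n}} \)
Degrees of Freedom: \( df = n - 1 \)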
Assumptions: Same as t confidence interval for mean.
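A short sketch of the test statistic computed from raw data is shown below. The sample values and the hypothesized mean of 12.0 are invented for illustration; the P-value would still come from a t-table or software using the returned degrees of freedom.

```python
from math import sqrt
from statistics import mean, stdev


def one_sample_t(data, mu0):
    """Return (t0, df) for a one-sample t test of the mean."""
    n = len(data)
    x_bar = mean(data)
    s = stdev(data)                      # sample standard deviation (n - 1 divisor)
    t0 = (x_bar - mu0) / (s / sqrt(n))
    return t0, n - 1


# Hypothetical measurements, H0: mu = 12.0
sample = [12.1, 11.8, 12.4, 12.0, 12.6, 11.9, 12.3, 12.2]
t0, df = one_sample_t(sample, mu0=12.0)
print(f"t0 = {t0:.3f} with {df} degrees of freedom")
```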

Two Sample Hypothesis Tests

Two Sample z Test for Proportions

Tests whether two population proportions are equal.

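Hypotheses:
- Two-tailed: \( H_0: p_1 = p_2 \), \( H_1: p_1 \ne p_2 \)
- Left-tailed: \( H_0: p_1 = p_2 \), \( H_1: p_1 < p_2 \)
- Right-tailed: \( H_0: p_1 = p_2 \), \( H_1: p_1 > p_2 \)
Test Statistic: \( z_0 = \dfrac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1-\hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} \), where \( \hat{p} = \dfrac{x_1 + x_2}{n_1 + n_2} \) is the pooled sample proportion.
Assumptions: Samples are independent and random, and \( n_i\hat{p}_i(1-\hat{p}_i) \ge 10 \) for each sample.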
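The pooled two-proportion test can be sketched as below. The function name and the counts (60 of 200 versus 45 of 200) are hypothetical, and only the two-tailed P-value is computed.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(x1, n1, x2, n2):
    """Return (z0, two-tailed p_value) for H0: p1 = p2."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    p_pool = (x1 + x2) / (n1 + n2)       # pooled estimate, valid under H0
    se = sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z0 = (p1_hat - p2_hat) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z0)))
    return z0, p_value


# Hypothetical data: 60/200 successes in group 1, 45/200 in group 2
z0, p_value = two_proportion_z_test(60, 200, 45, 200)
print(f"z0 = {z0:.3f}, P-value = {p_value:.4f}")
```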

Two Sample t Test for Dependent Means (Matched Pairs)

Tests whether the mean difference in paired data is zero.

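Hypotheses:
- Two-tailed: \( H_0: \mu_d = 0 \), \( H_1: \mu_d \ne 0 \)
- Left-tailed: \( H_0: \mu_d = 0 \), \( H_1: \mu_d < 0 \)
- Right-tailed: \( H_0: \mu_d = 0 \), \( H_1: \mu_d > 0 \)
Test Statistic: \( t_0 = \dfrac{\bar{d}}{s_d/\sqrt{n}} \), where \( \bar{d} \) and \( s_d \) are the mean and standard deviation of the \( n \) paired differences.
Degrees of Freedom: \( df = n - 1 \)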
Assumptions: Same as one sample mean, applied to differences.
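The sketch below shows the matched-pairs statistic computed from two paired lists. The before/after readings are invented, and the direction of the difference (before minus after) is a choice made for this example.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(first, second):
    """Return (t0, df) for a matched-pairs t test, using d = first - second."""
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    d_bar = mean(diffs)
    s_d = stdev(diffs)                   # standard deviation of the differences
    t0 = d_bar / (s_d / sqrt(n))
    return t0, n - 1


# Hypothetical paired readings for six subjects
before = [140, 152, 138, 147, 160, 155]
after = [135, 150, 136, 140, 158, 149]
t0, df = paired_t(before, after)
print(f"t0 = {t0:.3f} with {df} degrees of freedom")
```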

Two Sample t Test for Independent Means (Unequal Variances)

Tests whether two population means are equal, without assuming that the population variances are equal (Welch's t test).

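Hypotheses:
- Two-tailed: \( H_0: \mu_1 = \mu_2 \), \( H_1: \mu_1 \ne \mu_2 \)
- Left-tailed: \( H_0: \mu_1 = \mu_2 \), \( H_1: \mu_1 < \mu_2 \)
- Right-tailed: \( H_0: \mu_1 = \mu_2 \), \( H_1: \mu_1 > \mu_2 \)
Test Statistic: \( t_0 = \dfrac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} \)
Degrees of Freedom: \( df = \dfrac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{(s_1^2/n_1)^2}{n_1 - 1} + \frac{(s_2^2/n_2)^2}{n_2 - 1}} \) (used by software); by hand, a conservative choice is the smaller of \( n_1 - 1 \) and \( n_2 - 1 \).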
Assumptions: Samples are independent, random, populations are normal or sample sizes are large.
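As a rough sketch, the statistic and both versions of the degrees of freedom can be computed from summary statistics like this. The function name and the numbers are assumptions chosen so the arithmetic is easy to follow.

```python
from math import sqrt


def welch_t(x1_bar, s1, n1, x2_bar, s2, n2):
    """Return (t0, df_welch, df_conservative) for H0: mu1 = mu2."""
    v1, v2 = s1**2 / n1, s2**2 / n2      # squared standard errors of each mean
    t0 = (x1_bar - x2_bar) / sqrt(v1 + v2)
    df_welch = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    df_conservative = min(n1, n2) - 1
    return t0, df_welch, df_conservative


# Hypothetical summary statistics for two independent samples
t0, df_welch, df_cons = welch_t(52.0, 6.0, 36, 48.0, 8.0, 64)
print(f"t0 = {t0:.3f}, Welch df = {df_welch:.1f}, conservative df = {df_cons}")
# t0 = 2.828, Welch df = 90.0, conservative df = 35
```

The conservative degrees of freedom give a larger critical value, so they never make rejecting \( H_0 \) easier than the software formula does.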

Two Sample Confidence Intervals

Confidence Interval for Difference Between Two Proportions

Estimates the difference between two population proportions.

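Formula: \( (\hat{p}_1 - \hat{p}_2) \pm z_{\alpha/2} \sqrt{\dfrac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \dfrac{\hat{p}_2(1-\hat{p}_2)}{n_2}} \)
- Lower bound: \( (\hat{p}_1 - \hat{p}_2) - z_{\alpha/2} \sqrt{\dfrac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \dfrac{\hat{p}_2(1-\hat{p}_2)}{n_2}} \)
- Upper bound: \( (\hat{p}_1 - \hat{p}_2) + z_{\alpha/2} \sqrt{\dfrac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \dfrac{\hat{p}_2(1-\hat{p}_2)}{n_2}} \)
Assumptions: Samples are independent and random, and \( n_1\hat{p}_1(1-\hat{p}_1) \ge 10 \), \( n_2\hat{p}_2(1-\hat{p}_2) \ge 10 \).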
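A minimal sketch of this interval follows. Unlike the hypothesis test for two proportions, the standard error here uses each sample's own proportion rather than a pooled estimate. The function name and the counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_ci(x1, n1, x2, n2, confidence=0.95):
    """Return (lower, upper) bounds for p1 - p2 (unpooled standard error)."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    se = sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    diff = p1_hat - p2_hat
    return diff - z_crit * se, diff + z_crit * se


# Hypothetical data: 60/200 successes in group 1, 45/200 in group 2
lower, upper = two_proportion_ci(60, 200, 45, 200)
print(f"95% CI for p1 - p2: ({lower:.4f}, {upper:.4f})")
```

For these made-up counts the interval contains 0, so the data do not rule out equal proportions at the 95% level.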

Confidence Interval for Mean of Differences (Paired Data)

Estimates the mean difference in paired data.

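Formula: \( \bar{d} \pm t_{\alpha/2} \cdot \dfrac{s_d}{\sqrt{n}} \)
- Lower bound: \( \bar{d} - t_{\alpha/2} \cdot \dfrac{s_d}{\sqrt{n}} \)
- Upper bound: \( \bar{d} + t_{\alpha/2} \cdot \dfrac{s_d}{\sqrt{n}} \)
Degrees of Freedom: \( df = n - 1 \), where \( n \) is the number of pairs.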
Assumptions: Same as one sample mean, applied to differences.

Confidence Interval for Difference Between Two Independent Means (Unequal Variances)

Estimates the difference between two population means.

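Formula: \( (\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2} \sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}} \)
- Lower bound: \( (\bar{x}_1 - \bar{x}_2) - t_{\alpha/2} \sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}} \)
- Upper bound: \( (\bar{x}_1 - \bar{x}_2) + t_{\alpha/2} \sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}} \)
Degrees of Freedom: \( df = \dfrac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{(s_1^2/n_1)^2}{n_1 - 1} + \frac{(s_2^2/n_2)^2}{n_2 - 1}} \) (used by software); by hand, a conservative choice is the smaller of \( n_1 - 1 \) and \( n_2 - 1 \).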
Assumptions: Samples are independent, random, populations are normal or sample sizes are large.
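The interval can be sketched from summary statistics as below. The critical value is supplied by the user because the standard library has no t-distribution; 1.987 is the table value of \( t_{0.025} \) for 90 degrees of freedom. All numbers and names are illustrative assumptions.

```python
from math import sqrt


def welch_ci(x1_bar, s1, n1, x2_bar, s2, n2, t_crit):
    """Return (lower, upper) bounds for mu1 - mu2 without pooling variances.

    t_crit is t_{alpha/2} for the chosen degrees of freedom, taken from
    a t-table or statistical software.
    """
    se = sqrt(s1**2 / n1 + s2**2 / n2)
    diff = x1_bar - x2_bar
    return diff - t_crit * se, diff + t_crit * se


# Hypothetical summary statistics; Welch df works out to 90 for these values
lower, upper = welch_ci(52.0, 6.0, 36, 48.0, 8.0, 64, t_crit=1.987)
print(f"95% CI for mu1 - mu2: ({lower:.2f}, {upper:.2f})")  # about (1.19, 6.81)
```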

Sample Size Calculations

Sample Size Needed for Proportions

Calculates the minimum sample size required to estimate a proportion with a specified margin of error.

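When prior estimate \( \hat{p} \) is available: \( n = \hat{p}(1-\hat{p})\left(\dfrac{z_{\alpha/2}}{E}\right)^2 \)
When no prior estimate: \( n = 0.25\left(\dfrac{z_{\alpha/2}}{E}\right)^2 \)
Where \( E \) is the desired margin of error (as a decimal). Always round \( n \) up to the next whole number.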
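Both cases can be handled by one small function, sketched here with assumed names. Passing no prior estimate falls back to 0.25, the largest possible value of \( \hat{p}(1-\hat{p}) \).

```python
from math import ceil
from statistics import NormalDist


def sample_size_proportion(margin, confidence=0.95, p_prior=None):
    """Minimum n to estimate a proportion within the given margin of error."""
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Without a prior estimate, use the worst case p(1 - p) = 0.25
    pq = 0.25 if p_prior is None else p_prior * (1 - p_prior)
    return ceil(pq * (z_crit / margin) ** 2)   # always round up


# Hypothetical targets: margin of error 0.03 at 95% confidence
n_no_prior = sample_size_proportion(0.03)                  # 1068
n_with_prior = sample_size_proportion(0.03, p_prior=0.30)  # 897
print(n_no_prior, n_with_prior)
```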

Sample Size Needed for Means

Calculates the minimum sample size required to estimate a mean with a specified margin of error.

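Formula: \( n = \left(\dfrac{z_{\alpha/2} \cdot s}{E}\right)^2 \)
Where \( E \) is the desired margin of error and \( s \) is a prior estimate of the population standard deviation. Always round \( n \) up to the next whole number.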
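A brief sketch of this calculation is given below. The function name and the inputs (a prior standard deviation of 15 and a margin of error of 2) are made up for illustration.

```python
from math import ceil
from statistics import NormalDist


def sample_size_mean(s, margin, confidence=0.95):
    """Minimum n to estimate a mean within the given margin of error.

    s is a prior estimate of the standard deviation, in the same
    units as margin.
    """
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil((z_crit * s / margin) ** 2)    # always round up


# Hypothetical inputs: s = 15, margin of error 2, 95% confidence
n_needed = sample_size_mean(s=15, margin=2)
print(n_needed)  # 217
```

Halving the margin of error roughly quadruples the required sample size, because \( E \) is squared in the denominator.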

Summary Table: Hypothesis Tests and Confidence Intervals

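Test/Interval	Parameter	Assumptions
One Sample z for Proportion	p	Random sample, independence, \( n\hat{p}(1-\hat{p}) \ge 10 \) (use \( p_0 \) in place of \( \hat{p} \) for tests)
One Sample t for Mean	\( \mu \)	Random sample, normality or \( n \ge 30 \), independence
Two Sample z for Proportions	\( p_1 - p_2 \)	Random, independent samples, \( n_i\hat{p}_i(1-\hat{p}_i) \ge 10 \) for each sample
Two Sample t for Means (Independent)	\( \mu_1 - \mu_2 \)	Random, independent, normality or large samples
Paired t for Means	\( \mu_d \)	Random, paired, normality of differences or large number of pairs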

Additional info: Bootstrapping and randomization tests are modern alternatives for inference, especially when assumptions are questionable. StatCrunch and similar software can be used for calculations and simulations.
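To show what a bootstrap looks like outside of StatCrunch, here is a minimal percentile-method sketch in Python. The function name, the number of resamples, the fixed seed, and the data are all assumptions made for this example.

```python
import random
from statistics import mean


def bootstrap_percentile_ci(data, stat=mean, n_resamples=2000,
                            confidence=0.95, seed=0):
    """Percentile-method bootstrap interval for the statistic `stat`."""
    rng = random.Random(seed)            # fixed seed so results are repeatable
    n = len(data)
    # Resample with replacement, same size as the original sample
    boot_stats = sorted(
        stat(rng.choices(data, k=n)) for _ in range(n_resamples)
    )
    alpha = 1 - confidence
    lo_idx = round((alpha / 2) * n_resamples)
    hi_idx = round((1 - alpha / 2) * n_resamples) - 1
    return boot_stats[lo_idx], boot_stats[hi_idx]


# Hypothetical sample of ten measurements
sample = [12.1, 11.8, 12.4, 12.0, 12.6, 11.9, 12.3, 12.2, 13.0, 11.5]
lower, upper = bootstrap_percentile_ci(sample)
print(f"95% bootstrap CI for the mean: ({lower:.2f}, {upper:.2f})")
```

The interval bounds are simply the 2.5th and 97.5th percentiles of the resampled means, so no normality assumption or critical value is needed.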