Estimating Parameters and Determining Sample Sizes: Study Notes

Chapter 7: Estimating Parameters and Sample Sizes

Introduction to Estimation in Statistics

Estimation is a fundamental concept in inferential statistics, where sample data is used to estimate population parameters. This chapter focuses on point estimates, interval estimates (confidence intervals), and determining appropriate sample sizes for statistical inference.

Point Estimate: A single value estimate of a population parameter, such as the sample mean or sample proportion.
Interval Estimate: A range of values (confidence interval) likely to contain the population parameter.
Sample Size Determination: Calculating the number of observations required to achieve a desired level of precision.

Estimating Population Proportion (p)

The sample proportion (p̂) is the best point estimate of the population proportion (p). Under certain conditions, the sampling distribution of p̂ is approximately normal.

Formula for Sample Proportion: $\hat{p} = \frac{x}{n}$, where x is the number of successes and n is the sample size.
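Normal Approximation: The sampling distribution of p̂ is approximately normal if $np \geq 5$ and $nq \geq 5$, where $q = 1 - p$. Because p is unknown in practice, the check is usually made with $\hat{p}$ and $\hat{q}$.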
Example: If 157 out of 280 patients experience an adverse reaction, $\hat{p} = \frac{157}{280} = 0.561$.
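
The example above can be reproduced with a short Python sketch. The function names `sample_proportion` and `normal_approx_ok` are illustrative and not from the source material, and the check substitutes p̂ for the unknown p, which is an assumption.

```python
def sample_proportion(x, n):
    """Return the point estimate p_hat = x / n."""
    if n <= 0 or not 0 <= x <= n:
        raise ValueError("need n > 0 and 0 <= x <= n")
    return x / n


def normal_approx_ok(p, n, threshold=5):
    """Check the rule of thumb np >= 5 and nq >= 5, with q = 1 - p."""
    return n * p >= threshold and n * (1 - p) >= threshold


# Values from the adverse-reaction example: 157 successes in 280 trials.
p_hat = sample_proportion(157, 280)
print(round(p_hat, 3))               # 0.561
print(normal_approx_ok(p_hat, 280))  # True (p_hat used in place of p)
```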

Confidence Intervals for Population Proportion

A confidence interval provides a range of values within which the population proportion is likely to fall. The most common confidence level is 95%.

General Formula: $\hat{p} \pm E$, where E is the margin of error.
Margin of Error (E): $E = z_{\alpha/2} \sqrt{\frac{\hat{p}\hat{q}}{n}}$, where $\hat{q} = 1 - \hat{p}$ and $z_{\alpha/2}$ is the critical value from the standard normal distribution.
Example Calculation: For $\hat{p} = 0.44$, $n = 381$, $z_{0.025} = 1.96$:
- $E = 1.96 \sqrt{\frac{0.44 \times 0.56}{381}} \approx 0.05$
- 95% CI: $0.44 \pm 0.05 = (0.39, 0.49)$
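
As a rough check on the calculation above, the following sketch computes the margin of error and the interval using only the Python standard library. The function name `proportion_ci` is hypothetical; the critical value comes from `statistics.NormalDist`, so it is 1.95996... rather than the rounded 1.96.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(p_hat, n, confidence=0.95):
    """Return (lower, upper, margin) for a normal-approximation interval."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)       # z_{alpha/2}
    margin = z * sqrt(p_hat * (1 - p_hat) / n)    # E = z * sqrt(p_hat * q_hat / n)
    return p_hat - margin, p_hat + margin, margin


# Values from the example: p_hat = 0.44, n = 381, 95% confidence.
low, high, margin = proportion_ci(0.44, 381)
print(round(margin, 2), round(low, 2), round(high, 2))  # 0.05 0.39 0.49
```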

Determining Sample Size for Estimating Proportion

To achieve a desired margin of error and confidence level, the required sample size can be calculated using:

Sample Size Formula: $n = \frac{z_{\alpha/2}^2 \hat{p} \hat{q}}{E^2}$
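Conservative Estimate: If no prior estimate $\hat{p}$ is available, use $\hat{p} = 0.5$. This maximizes the product $\hat{p}\hat{q}$ (at 0.25) and therefore gives the largest required sample size.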
Example: To estimate a proportion with $E = 0.03$ and 95% confidence ($z_{0.025} = 1.96$):
- $n = \frac{(1.96)^2 \times 0.5 \times 0.5}{(0.03)^2} \approx 1067.1$, which is rounded up to $n = 1068$ so that the margin of error does not exceed 0.03.
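
The sample size formula can be sketched in Python as follows. The function name `sample_size_for_proportion` and its parameters are illustrative. The raw result for this example is about 1067.1, and the sketch rounds it up with `ceil` to 1068, since rounding down would leave the margin of error slightly above the target.

```python
from math import ceil
from statistics import NormalDist


def sample_size_for_proportion(margin, confidence=0.95, p_guess=0.5):
    """Smallest n giving a margin of error no larger than `margin`.

    p_guess = 0.5 is the conservative choice when no prior estimate exists.
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # z_{alpha/2}
    raw = z ** 2 * p_guess * (1 - p_guess) / margin ** 2
    return ceil(raw)                                     # always round up


print(sample_size_for_proportion(0.03))  # 1068
```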

Estimating Population Mean (μ)

The sample mean ($\bar{x}$) is the best point estimate of the population mean (μ). Confidence intervals for the mean depend on whether the population standard deviation (σ) is known.

Known σ: $\bar{x} \pm z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$
Unknown σ: $\bar{x} \pm t_{\alpha/2, df} \frac{s}{\sqrt{n}}$, where s is the sample standard deviation and df is degrees of freedom ($n-1$).
Example: For $\bar{x} = 150$, $s = 15$, $n = 36$, $t_{0.025, 35} \approx 2.03$:
- 95% CI: $150 \pm 2.03 \times \frac{15}{6} = 150 \pm 5.08 = (144.92, 155.08)$
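
A minimal sketch of the same interval in Python is shown below. The Python standard library has no t-distribution, so the critical value is passed in as an argument (here the 2.03 quoted in the example, as read from a t table). The function name `mean_ci` is hypothetical. The same function covers the known-σ case if σ and a z critical value are supplied instead.

```python
from math import sqrt


def mean_ci(x_bar, sd, n, crit):
    """Return (lower, upper) for x_bar +/- crit * sd / sqrt(n).

    crit is t_{alpha/2, n-1} when sd is the sample value s,
    or z_{alpha/2} when sd is the known population sigma.
    """
    margin = crit * sd / sqrt(n)
    return x_bar - margin, x_bar + margin


# Values from the example: x_bar = 150, s = 15, n = 36, t ≈ 2.03.
low, high = mean_ci(150, 15, 36, crit=2.03)
print(low, high)  # unrounded margin is 5.075; the notes round it to 5.08
```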

Degrees of Freedom

Degrees of freedom (df) are the number of sample values that remain free to vary once a constraint, such as the value of the sample mean, has been imposed. For a sample of size n, $df = n - 1$ when estimating the mean.

Importance: Used in t-distribution and chi-square distribution calculations.
Example: For $n = 30$, $df = 29$.

Estimating Population Variance and Standard Deviation

The sample variance (s²) and sample standard deviation (s) are used as point estimates for the population variance (σ²) and standard deviation (σ).

Sample Variance: $s^2 = \frac{\sum (x_i - \bar{x})^2}{n-1}$
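Confidence Interval for Variance: $\left( \frac{(n-1)s^2}{\chi^2_{\alpha/2}}, \frac{(n-1)s^2}{\chi^2_{1-\alpha/2}} \right)$, where $\chi^2_{\alpha/2}$ and $\chi^2_{1-\alpha/2}$ are chi-square critical values with $n - 1$ degrees of freedom and the subscript is the area to the right of the critical value. Taking the square root of both endpoints gives a confidence interval for σ.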
Example: For $n = 10$, $s^2 = 1.5$, $\chi^2_{0.025,9} = 19.02$, $\chi^2_{0.975,9} = 2.7$:
- 95% CI: $\left( \frac{9 \times 1.5}{19.02}, \frac{9 \times 1.5}{2.7} \right) = (0.71, 5.0)$
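
The variance interval can be sketched the same way. Chi-square critical values are not available in the Python standard library, so this sketch takes them as arguments (the table values 19.02 and 2.7 from the example). The names `variance_ci`, `chi2_upper`, and `chi2_lower` are illustrative.

```python
from math import sqrt


def variance_ci(s2, n, chi2_upper, chi2_lower):
    """Return (lower, upper) bounds for the population variance.

    chi2_upper: critical value with right-tail area alpha/2 (the larger one).
    chi2_lower: critical value with right-tail area 1 - alpha/2 (the smaller one).
    """
    df = n - 1
    return df * s2 / chi2_upper, df * s2 / chi2_lower


# Values from the example: n = 10, s^2 = 1.5, df = 9.
var_low, var_high = variance_ci(1.5, 10, chi2_upper=19.02, chi2_lower=2.7)
print(round(var_low, 2), round(var_high, 2))  # 0.71 5.0

# Square roots of the endpoints give an interval for the standard deviation.
sd_low, sd_high = sqrt(var_low), sqrt(var_high)
print(round(sd_low, 2), round(sd_high, 2))    # 0.84 2.24
```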

Simulation of Confidence Intervals

Simulations can illustrate the behavior of confidence intervals over repeated samples. For example, generating 1,000 confidence intervals for a mean or proportion shows that approximately 95% of intervals contain the true parameter when using a 95% confidence level.

Application: Used to visualize the reliability of interval estimates.
Interpretation: Any single interval either contains the true parameter or does not. Over many repeated samples, the proportion of intervals that contain it is approximately equal to the confidence level.
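
A small simulation along these lines can be written with the Python standard library. This is a sketch under stated assumptions: the population is normal with a known σ, so the z-interval is used, and the values μ = 50, σ = 10, n = 30, and the seed are arbitrary choices rather than figures from the source. The function name `coverage_rate` is hypothetical.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean


def coverage_rate(mu=50.0, sigma=10.0, n=30, trials=1000,
                  confidence=0.95, seed=0):
    """Fraction of simulated z-intervals for the mean that contain mu."""
    rng = random.Random(seed)                  # seeded for repeatability
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sigma / sqrt(n)               # same for every sample (sigma known)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        x_bar = fmean(sample)
        if x_bar - margin <= mu <= x_bar + margin:
            hits += 1
    return hits / trials


rate = coverage_rate()
print(rate)  # expected to be close to 0.95, not exactly 0.95
```

Changing `confidence` to 0.90 or 0.99 should move the observed coverage toward that level, at the cost of narrower or wider intervals respectively.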

Summary Table: Confidence Interval Formulas

Parameter	Point Estimate	Confidence Interval Formula	Distribution
Proportion (p)	$\hat{p}$	$\hat{p} \pm z_{\alpha/2} \sqrt{\frac{\hat{p}\hat{q}}{n}}$	Normal
Mean (μ), σ known	$\bar{x}$	$\bar{x} \pm z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$	Normal
Mean (μ), σ unknown	$\bar{x}$	$\bar{x} \pm t_{\alpha/2, df} \frac{s}{\sqrt{n}}$	t-distribution
Variance (σ²)	$s^2$	$\left( \frac{(n-1)s^2}{\chi^2_{\alpha/2}}, \frac{(n-1)s^2}{\chi^2_{1-\alpha/2}} \right)$	Chi-square

Key Terms and Concepts

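Critical Value: A value that marks off a tail area of a distribution. In confidence interval estimation, $z_{\alpha/2}$, $t_{\alpha/2}$, and the $\chi^2$ critical values are determined by the confidence level; in hypothesis testing, the critical value separates the rejection region from the non-rejection region.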
Margin of Error: The maximum likely difference between the sample estimate and the true population parameter.
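Confidence Level: The long-run proportion of intervals, constructed by the same method from repeated samples, that contain the population parameter (commonly 90%, 95%, or 99%). It is not the probability that one particular computed interval contains the parameter.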
Degrees of Freedom: Number of independent values in a calculation, typically $n-1$ for a sample mean.

Examples and Applications

Medical Studies: Estimating the proportion of patients with adverse reactions to a drug.
Social Science: Determining if events (e.g., holidays) affect mortality rates using confidence intervals.
Quality Control: Estimating mean and variance of product measurements.

Additional info:

Simulations using software (e.g., StatCrunch) help visualize confidence intervals and their coverage probability.
When the sample size is large, the normal approximation is more accurate for proportions and means.