Lecture 25: Confidence Intervals – Proportions, Means, and the t-Distribution

Confidence Intervals

Introduction to Confidence Intervals

Confidence intervals are a fundamental concept in inferential statistics, providing a range of plausible values for an unknown population parameter based on sample data. They quantify the uncertainty associated with sample estimates and are widely used for both proportions and means.

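Confidence Interval (CI): An interval estimate computed from sample data by a method that, over repeated sampling, captures the true value of a population parameter a specified proportion of the time (the confidence level). Any single computed interval either contains the parameter or does not.
Confidence Level: The long-run proportion (e.g., 95%) of intervals, constructed the same way from repeated samples, that contain the true parameter value. It is written as (1-α)100%.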
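The repeated-sampling meaning of "95% confident" can be checked by simulation. The sketch below assumes a normal population with known σ (the values μ = 50, σ = 10, n = 25 are hypothetical). It draws many samples, builds an interval from each with the known-σ formula for the mean, and reports the fraction of intervals that contain μ.

```python
import random
from math import sqrt
from statistics import NormalDist


def z_interval(sample, sigma, conf=0.95):
    """Interval xbar +/- z_{alpha/2} * sigma / sqrt(n) for a known sigma."""
    n = len(sample)
    xbar = sum(sample) / n
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # critical value z_{alpha/2}
    margin = z * sigma / sqrt(n)
    return xbar - margin, xbar + margin


def coverage(mu=50.0, sigma=10.0, n=25, conf=0.95, reps=2000, seed=1):
    """Fraction of simulated intervals that contain the true mean mu."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        lower, upper = z_interval(sample, sigma, conf)
        hits += lower <= mu <= upper  # True counts as 1
    return hits / reps


print(coverage())           # close to 0.95, varies with the seed
print(coverage(conf=0.99))  # close to 0.99
```

Each individual interval either captures μ or misses it; only the long-run fraction is near the confidence level.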

Confidence Intervals for Proportions

Constructing a Confidence Interval for a Population Proportion

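Suppose a simple random sample of size n is taken from a population, or the data are from a randomized experiment, and x of the n individuals have the characteristic of interest, so the sample proportion is $\hat{p} = x/n$. The (1-α)100% confidence interval for the population proportion p is given by:

Lower bound: $\hat{p} - z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$
Upper bound: $\hat{p} + z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$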

Conditions: To use this interval, ensure that $n\hat{p}(1-\hat{p}) \ge 10$ and $n \le 0.05N$, where N is the population size (the sample is no more than 5% of the population).
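A minimal sketch of the interval above, with both conditions checked before computing. The function name and the choice to raise `ValueError` when a condition fails are illustrative assumptions; `population_size` is optional because N is often unknown but clearly large.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(x, n, conf=0.95, population_size=None):
    """(1 - alpha)100% interval p_hat +/- z_{alpha/2} * sqrt(p_hat(1 - p_hat) / n)."""
    p_hat = x / n
    # Condition 1: n * p_hat * (1 - p_hat) >= 10
    if n * p_hat * (1 - p_hat) < 10:
        raise ValueError("n * p_hat * (1 - p_hat) < 10; normal approximation not justified")
    # Condition 2 (checked only if N is given): sample is at most 5% of the population
    if population_size is not None and n > 0.05 * population_size:
        raise ValueError("sample is more than 5% of the population")
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # z_{alpha/2}
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


print(proportion_ci(25, 100))   # left-handed example
print(proportion_ci(272, 800))  # teen texting example
```

The code uses the unrounded z (about 1.95996 for 95%), so results can differ from hand calculations with 1.96 in the fourth decimal place.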

Example: Proportion Confidence Interval

Suppose a random sample of 100 people shows that 25 are left-handed. Find a 95% confidence interval for the true proportion of left-handers.

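Point estimate: $\hat{p} = 25/100 = 0.25$
Margin of error: $E = 1.96\sqrt{\frac{0.25(0.75)}{100}} \approx 1.96(0.0433) \approx 0.085$
CI: $0.25 \pm 0.085$, or approximately (0.165, 0.335)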

Margin of Error for Proportions

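The margin of error E is the half-width of the confidence interval; at the stated confidence level, it bounds the distance between the sample estimate and the true population parameter:
$E = z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$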

Example: Survey on Teen Texting While Driving

In a survey of 800 teenagers, 272 reported texting while driving. Find a 95% confidence interval for the proportion.

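Check: $\hat{p} = 272/800 = 0.34$ and $n\hat{p}(1-\hat{p}) = 800(0.34)(0.66) = 179.52 \ge 10$; the sample is also far less than 5% of the population of teenagers.
Margin of error: $E = 1.96\sqrt{\frac{0.34(0.66)}{800}} \approx 1.96(0.0167) \approx 0.033$
CI: $0.34 \pm 0.033$, or approximately (0.307, 0.373)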

Interpretation: We are 95% confident that the true proportion of 16- to 17-year-olds who text while driving is between 0.307 and 0.373.

Effect of Confidence Level on Margin of Error

Increasing the confidence level (e.g., from 95% to 99%) increases the margin of error, resulting in a wider confidence interval.
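Example: For the same data, a 99% CI uses $z_{0.005} = 2.575$ and yields $E \approx 2.575(0.0167) \approx 0.043$, so the interval is approximately (0.297, 0.383).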

Factors Affecting Margin of Error

Level of confidence: Higher confidence increases margin of error.
Sample size: Larger sample size decreases margin of error.
Population standard deviation: Greater variability increases margin of error.
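The three factors can be seen numerically with the margin-of-error formula for proportions. In this sketch the teen texting values ($\hat{p} = 0.34$, n = 800) are reused; the other sample sizes and proportions are hypothetical inputs chosen for comparison. For a proportion, the variability term is $\hat{p}(1-\hat{p})$, which is largest at 0.5.

```python
from math import sqrt
from statistics import NormalDist


def margin_of_error(p_hat, n, conf=0.95):
    """E = z_{alpha/2} * sqrt(p_hat * (1 - p_hat) / n)."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    return z * sqrt(p_hat * (1 - p_hat) / n)


# Higher confidence -> larger E (teen texting data: p_hat = 0.34, n = 800).
for conf in (0.90, 0.95, 0.99):
    print(f"conf={conf:.0%}  E={margin_of_error(0.34, 800, conf):.4f}")

# Quadrupling n halves E, because E is proportional to 1 / sqrt(n).
for n in (200, 800, 3200):
    print(f"n={n:5d}  E={margin_of_error(0.34, n):.4f}")

# More variability (p_hat closer to 0.5) -> larger E.
for p_hat in (0.1, 0.3, 0.5):
    print(f"p_hat={p_hat}  E={margin_of_error(p_hat, 800):.4f}")
```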

Sample Size Determination for Proportions

Determining Required Sample Size

To achieve a desired margin of error E at a given confidence level, the minimum sample size n is:

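If a prior estimate $\hat{p}$ is known: $n = \hat{p}(1-\hat{p})\left(\frac{z_{\alpha/2}}{E}\right)^2$
If no prior estimate is available: $n = 0.25\left(\frac{z_{\alpha/2}}{E}\right)^2$ (using $\hat{p} = 0.5$, which gives the maximum variability $\hat{p}(1-\hat{p}) = 0.25$)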

Always round up to the next integer.
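A minimal sketch of the sample-size rule, including the round-up step. The function name is hypothetical, and it takes z directly so that the three-decimal table value used in hand calculations can be passed in.

```python
from math import ceil


def sample_size_for_proportion(E, z, p_hat=None):
    """Smallest n with z * sqrt(p(1 - p) / n) <= E.

    With no prior estimate, p = 0.5 is used because it maximizes p(1 - p).
    """
    p = 0.5 if p_hat is None else p_hat
    n = p * (1 - p) * (z / E) ** 2
    # Round to 9 decimals first so float noise (e.g. 2500.0000000001) does not add an extra unit.
    return ceil(round(n, 9))


# Carpooling survey, part (b): E = 0.02, 90% confidence, table value z_{0.05} = 1.645.
print(sample_size_for_proportion(0.02, 1.645))  # 1692
```

Passing the unrounded value `NormalDist().inv_cdf(0.95)` (about 1.6449) instead gives 1691, so the textbook answer depends on using the rounded table value.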

Example: Carpooling Survey

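Desired margin of error: 2% ($E = 0.02$), 90% confidence ($z_{0.05} = 1.645$)
(a) Using a prior estimate $\hat{p}$: $n = \hat{p}(1-\hat{p})\left(\frac{1.645}{0.02}\right)^2$, rounded up
(b) No prior estimate: $n = 0.25\left(\frac{1.645}{0.02}\right)^2 = 1691.27$, rounded up to 1692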

Conclusion: Not having a prior estimate of $p$ can more than double the required sample size.
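The size of this effect follows from dividing the two sample-size formulas, where $z_{\alpha/2}$ and E cancel (ignoring the round-up step):

```latex
\frac{n_{\text{no prior}}}{n_{\text{prior}}}
  = \frac{0.25\,(z_{\alpha/2}/E)^2}{\hat{p}(1-\hat{p})\,(z_{\alpha/2}/E)^2}
  = \frac{0.25}{\hat{p}(1-\hat{p})} > 2
  \iff \hat{p}(1-\hat{p}) < 0.125
  \iff \hat{p} < \tfrac{1-\sqrt{0.5}}{2} \approx 0.146
  \ \text{or}\ \hat{p} > \tfrac{1+\sqrt{0.5}}{2} \approx 0.854
```

So the required sample size more than doubles whenever the prior estimate is below about 0.146 or above about 0.854.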

Point and Interval Estimation for the Mean

Point Estimation for the Mean

Point estimate: The value of a statistic used to estimate a parameter. For the mean, the sample mean $\bar{x}$ is the point estimate for the population mean $\mu$.
Formula: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$

Interval Estimation for the Mean

The (1-α)100% confidence interval for the population mean $\mu$ depends on:

Sample size: Large ($n \ge 30$) or small ($n < 30$)
Population standard deviation $\sigma$: Known or unknown
Sampling distribution of the sample mean

Summary Table: Confidence Intervals for the Mean

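	Large Sample ($n \ge 30$)	Small Sample ($n < 30$)
$\sigma$ known	$\bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$	$\bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$ (population normal)
$\sigma$ unknown	$\bar{x} \pm z_{\alpha/2}\frac{s}{\sqrt{n}}$	$\bar{x} \pm t_{\alpha/2}\frac{s}{\sqrt{n}}$, df $= n-1$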

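Confidence Interval for $\mu$, $\sigma$ Known

Assumptions: Simple random sample, population standard deviation $\sigma$ known, and the population is normally distributed or $n \ge 30$.
Formula: $\bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$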

Example

Sample: $n$, $\bar{x}$, $\sigma$; 95% CI ($z_{0.025} = 1.96$)
CI: $\bar{x} \pm 1.96\frac{\sigma}{\sqrt{n}} = 72.5 \pm 2.35$

Interpretation: We are 95% confident that the interval (70.15, 74.85) contains the true population mean.
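A sketch that reproduces this interval from summary statistics. The sample values are not given in these notes, so the inputs are hypothetical: $\bar{x} = 72.5$ is the midpoint of (70.15, 74.85), and σ = 6 with n = 25 is just one choice that gives $\sigma/\sqrt{n} = 1.2$.

```python
from math import sqrt
from statistics import NormalDist


def z_interval_from_summary(xbar, sigma, n, conf=0.95):
    """xbar +/- z_{alpha/2} * sigma / sqrt(n); assumes sigma is known."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    margin = z * sigma / sqrt(n)
    return xbar - margin, xbar + margin


# Hypothetical inputs consistent with the interval quoted above.
lower, upper = z_interval_from_summary(72.5, 6, 25)
print(f"({lower:.2f}, {upper:.2f})")  # (70.15, 74.85)
```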

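Confidence Interval for $\mu$, $\sigma$ Unknown

For large $n$ ($n \ge 30$): Replace $\sigma$ with the sample standard deviation $s$.
Formula: $\bar{x} \pm z_{\alpha/2}\frac{s}{\sqrt{n}}$
For small $n$ ($n < 30$): Use Student's t-distribution.
Formula: $\bar{x} \pm t_{\alpha/2}\frac{s}{\sqrt{n}}$, with $n-1$ degrees of freedom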
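The three cases in the summary table can be combined into one helper. This is a sketch under stated assumptions: the function name is hypothetical, the n ≥ 30 cutoff follows these notes, and the t critical value is supplied by the caller (for example, from a t table) rather than computed.

```python
from math import sqrt
from statistics import NormalDist


def mean_ci(xbar, sd, n, sigma_known, conf=0.95, t_crit=None):
    """CI for mu following the summary table in these notes.

    sigma known            -> z_{alpha/2} with sd = sigma (small n also needs a normal population)
    sigma unknown, n >= 30 -> z_{alpha/2} with sd = s
    sigma unknown, n < 30  -> t_{alpha/2} with n - 1 df, passed in as t_crit
    """
    if sigma_known or n >= 30:
        crit = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    elif t_crit is None:
        raise ValueError("n < 30 with sigma unknown: pass t_crit = t_{alpha/2} for n - 1 df")
    else:
        crit = t_crit
    margin = crit * sd / sqrt(n)
    return xbar - margin, xbar + margin


# Hypothetical small sample: n = 10, xbar = 50, s = 4; t_{0.025} with 9 df is 2.262 (t table).
print(mean_ci(50, 4, 10, sigma_known=False, t_crit=2.262))
```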

Student's t-Distribution

Definition and Properties

When the population standard deviation $\sigma$ is unknown and the population is normally distributed (a requirement that matters most for small samples), the standardized statistic $t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$ follows Student's t-distribution with $n-1$ degrees of freedom.

The t-distribution is symmetric and centered at 0.
It has fatter tails than the standard normal distribution, reflecting greater uncertainty.
As sample size increases, the t-distribution approaches the standard normal distribution.
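The fatter tails can be seen by simulation: when s replaces σ, the statistic $t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$ lands beyond ±1.96 more often than the 5% expected for a standard normal, and the excess shrinks as n grows. A sketch, assuming a standard normal population; the sample sizes, repetition count, and seed are arbitrary choices.

```python
import random
from math import sqrt


def tail_fraction(n, reps=10000, cutoff=1.96, seed=2):
    """Share of samples where |t| = |xbar - mu| / (s / sqrt(n)) exceeds cutoff.

    Samples come from a standard normal population (mu = 0, sigma = 1), so a
    standard normal statistic would exceed 1.96 in about 5% of samples.
    """
    rng = random.Random(seed)
    count = 0
    for _ in range(reps):
        xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
        xbar = sum(xs) / n
        s = sqrt(sum((x - xbar) ** 2 for x in xs) / (n - 1))  # sample standard deviation
        count += abs(xbar / (s / sqrt(n))) > cutoff
    return count / reps


for n in (5, 10, 30):
    print(n, tail_fraction(n))
```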

Properties of the t-Distribution

A different distribution for each number of degrees of freedom (df = n-1).
Centered at 0 and symmetric.
Total area under the curve is 1; area to the right of 0 equals area to the left.
As df increases, the curve approaches the normal distribution.
More area in the tails than the normal distribution (reflects extra variability from estimating $\sigma$ with $s$).
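These properties can be checked by computing critical values directly from the t density. The sketch below integrates the density numerically with Simpson's rule and inverts the CDF by bisection; it is an illustration, not a replacement for a t table or a statistics library such as SciPy. It uses symmetry (area 0.5 to the left of 0) and is valid only for cutoffs at or above 0.

```python
from math import exp, lgamma, log, pi


def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_const = lgamma((df + 1) / 2) - lgamma(df / 2) - 0.5 * log(df * pi)
    return exp(log_const - (df + 1) / 2 * log(1 + x * x / df))


def t_cdf(x, df, steps=1000):
    """P(T <= x) for x >= 0: area 0.5 left of 0 (symmetry) plus Simpson's rule on [0, x]."""
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3


def t_critical(area_right, df):
    """t value with the given area to its right (e.g. 0.025 for a 95% CI), by bisection."""
    lo, hi = 0.0, 50.0
    for _ in range(40):
        mid = (lo + hi) / 2
        if 1 - t_cdf(mid, df) > area_right:
            lo = mid  # too much area to the right: move the cutoff outward
        else:
            hi = mid
    return (lo + hi) / 2


# t_{0.025} shrinks toward z_{0.025} = 1.96 as df grows.
for df in (4, 9, 29, 99, 999):
    print(df, round(t_critical(0.025, df), 3))
```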

Critical Values and Confidence Intervals Using t-Distribution

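The critical value $t_{\alpha/2}$, with $n-1$ degrees of freedom, is used instead of $z_{\alpha/2}$.
Formula: $\bar{x} \pm t_{\alpha/2}\frac{s}{\sqrt{n}}$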

Example: Confidence Interval with t-Distribution

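Sample: $n$, $\bar{x}$, $s$
95% CI ($t_{0.025}$ with $n-1$ df): $\bar{x} \pm t_{0.025}\frac{s}{\sqrt{n}}$
98% CI ($t_{0.01}$ with $n-1$ df): $\bar{x} \pm t_{0.01}\frac{s}{\sqrt{n}}$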
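The original sample values for this example are not given, so the sketch below applies the same two formulas to a hypothetical sample of n = 10. The critical values for df = 9 are taken from a standard t table ($t_{0.025} = 2.262$, $t_{0.01} = 2.821$).

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical sample of n = 10 measurements (not the data from the lecture).
data = [12.1, 11.8, 12.5, 12.0, 11.6, 12.3, 12.2, 11.9, 12.4, 12.2]
n, xbar, s = len(data), mean(data), stdev(data)  # stdev uses the n - 1 divisor

# Critical values for df = n - 1 = 9 from a standard t table.
T_CRIT = {0.95: 2.262, 0.98: 2.821}

for conf, t in T_CRIT.items():
    margin = t * s / sqrt(n)
    print(f"{conf:.0%} CI: ({xbar - margin:.2f}, {xbar + margin:.2f})")
```

As with z, the higher confidence level uses a larger critical value and gives a wider interval.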

Normality Condition and Robustness

When to Use the t-Distribution

For small samples ($n < 30$), check that the data are approximately normal and have no outliers (use a normal probability plot and a boxplot).
For $n < 15$, use the t-distribution only if the data are symmetric and have no outliers.
For $15 \le n < 40$, the t-distribution can be used if the data are not extremely skewed and have no outliers.
For $n \ge 40$, the t-distribution can be used even for skewed distributions (the Central Limit Theorem applies).
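The boxplot check for outliers can be sketched with the common 1.5 × IQR fence rule. This assumes Python 3.8+ for `statistics.quantiles`; its default "exclusive" quartile method may differ slightly from the quartile rule in a given textbook or calculator, so borderline points can be classified differently. The data are hypothetical.

```python
from statistics import quantiles


def boxplot_outliers(data):
    """Values outside the fences Q1 - 1.5 * IQR and Q3 + 1.5 * IQR."""
    q1, _, q3 = quantiles(data, n=4)  # default 'exclusive' quartile method
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [x for x in data if x < lower or x > upper]


sample = [12.1, 11.8, 12.5, 12.0, 11.6, 12.3, 12.2, 11.9, 12.4, 12.2]  # hypothetical
print(boxplot_outliers(sample))           # no outliers
print(boxplot_outliers(sample + [15.0]))  # the added 15.0 is flagged
```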

Additional Information

Student's t-distribution was developed by William Sealy Gosset, who published under the pseudonym "Student" while working at the Guinness brewery.
The t-distribution is robust to minor departures from normality, especially as sample size increases.