Lecture 25: Confidence Intervals – Proportions, Means, and the t-Distribution
Confidence Intervals
Introduction to Confidence Intervals
Confidence intervals are a fundamental concept in inferential statistics, providing a range of plausible values for an unknown population parameter based on sample data. They quantify the uncertainty associated with sample estimates and are widely used for both proportions and means.
Confidence Interval (CI): An interval estimate, computed from sample data, produced by a procedure that captures the true value of a population parameter with a specified long-run success rate (the confidence level).
Confidence Level: The proportion of intervals (e.g., 95%), constructed the same way from repeated random samples, that would contain the true parameter value.
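Because the confidence level describes the long-run behavior of the procedure rather than any single interval, a small simulation makes it concrete. The sketch below assumes a normal population with known σ; the values μ = 50, σ = 10, n = 25 and the helper name `simulate_coverage` are hypothetical. It draws many samples, builds a z-interval from each, and reports the fraction that capture μ.

```python
import random
from statistics import NormalDist, fmean

def simulate_coverage(mu=50.0, sigma=10.0, n=25, conf=0.95, trials=4000, seed=1):
    """Fraction of z-intervals (sigma known) that contain the true mean mu.

    mu, sigma, and n are hypothetical illustration values.
    """
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # critical value z_{alpha/2}
    half_width = z * sigma / n ** 0.5             # margin of error, same for every sample
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = fmean(sample)
        if xbar - half_width <= mu <= xbar + half_width:
            hits += 1
    return hits / trials

coverage = simulate_coverage()
print(f"Observed coverage at 95%: {coverage:.3f}")
print(f"Observed coverage at 99%: {simulate_coverage(conf=0.99):.3f}")
```

Any single interval either contains μ or it does not; the confidence level is the hit rate across many repetitions, which the simulation estimates.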
Confidence Intervals for Proportions
Constructing a Confidence Interval for a Population Proportion
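Suppose a simple random sample of size $n$ is taken from a population, or the data come from a randomized experiment, and $x$ of the $n$ individuals have the characteristic of interest, so the sample proportion is $\hat{p} = x/n$. The $(1-\alpha)\cdot 100\%$ confidence interval for the population proportion $p$ is given by:
Lower bound: $\hat{p} - z_{\alpha/2}\sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}}$
Upper bound: $\hat{p} + z_{\alpha/2}\sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}}$
Conditions: To use this interval, ensure that $n\hat{p}(1-\hat{p}) \ge 10$ and $n \le 0.05N$ (the sample size is no more than 5% of the population size $N$).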
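The interval and its two conditions translate directly into a short function. This is a minimal sketch, assuming Python's `statistics.NormalDist` for $z_{\alpha/2}$ (it returns 1.95996 rather than the rounded table value 1.96); `proportion_ci` and its `population_size` argument are hypothetical names introduced for illustration.

```python
import math
from statistics import NormalDist

def proportion_ci(x, n, conf=0.95, population_size=None):
    """One-proportion z-interval: p_hat ± z_{alpha/2} * sqrt(p_hat(1 - p_hat)/n)."""
    p_hat = x / n
    # Condition 1: n * p_hat * (1 - p_hat) >= 10 for the normal approximation
    if n * p_hat * (1 - p_hat) < 10:
        raise ValueError("n*p_hat*(1-p_hat) < 10: normal approximation not justified")
    # Condition 2: sample is at most 5% of the population (checked only if N is supplied)
    if population_size is not None and n > 0.05 * population_size:
        raise ValueError("sample exceeds 5% of the population")
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)   # z_{alpha/2}
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

lo, hi = proportion_ci(25, 100)            # left-handed example: 25 of 100
print(f"({lo:.3f}, {hi:.3f})")
```

Raising an error when a condition fails keeps the function from silently reporting an interval whose stated confidence level does not hold.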
Example: Proportion Confidence Interval
Suppose a random sample of 100 people shows that 25 are left-handed. Find a 95% confidence interval for the true proportion of left-handers.
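Point estimate: $\hat{p} = 25/100 = 0.25$; check: $n\hat{p}(1-\hat{p}) = 100(0.25)(0.75) = 18.75 \ge 10$
Margin of error: $E = 1.96\sqrt{\dfrac{0.25(0.75)}{100}} \approx 1.96(0.0433) \approx 0.085$
CI: $0.25 \pm 0.085 = (0.165, 0.335)$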
Margin of Error for Proportions
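The margin of error $E$ is the half-width of the confidence interval; at the stated level of confidence, it bounds the distance between the sample estimate and the true population parameter. For a proportion: $E = z_{\alpha/2}\sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}}$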
Example: Survey on Teen Texting While Driving
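In a survey of 800 teenagers aged 16 to 17, 272 reported texting while driving. Find a 95% confidence interval for the population proportion.
Check: $\hat{p} = 272/800 = 0.34$ and $n\hat{p}(1-\hat{p}) = 800(0.34)(0.66) = 179.52 \ge 10$; the sample is far less than 5% of all 16- to 17-year-olds.
Margin of error: $E = 1.96\sqrt{\dfrac{0.34(0.66)}{800}} \approx 1.96(0.01675) \approx 0.033$
CI: $0.34 \pm 0.033 = (0.307, 0.373)$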
Interpretation: We are 95% confident that the true proportion of 16- to 17-year-olds who text while driving is between 0.307 and 0.373.
Effect of Confidence Level on Margin of Error
Increasing the confidence level (e.g., from 95% to 99%) increases the margin of error, resulting in a wider confidence interval.
Example: For the same data, a 99% CI uses $z_{0.005} = 2.575$ and yields $E \approx 2.575(0.01675) \approx 0.043$, so the interval is $(0.297, 0.383)$.
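The effect of the confidence level can be checked numerically on the teen-texting data ($x = 272$, $n = 800$). A minimal sketch, assuming the standard library's `statistics.NormalDist` for exact critical values (so the 99% value is 2.5758 rather than the table value 2.575); `margin_of_error` is a hypothetical helper name.

```python
import math
from statistics import NormalDist

def margin_of_error(p_hat, n, conf):
    """E = z_{alpha/2} * sqrt(p_hat(1 - p_hat)/n) for a one-proportion interval."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    return z * math.sqrt(p_hat * (1 - p_hat) / n)

p_hat, n = 272 / 800, 800
for conf in (0.90, 0.95, 0.99):
    e = margin_of_error(p_hat, n, conf)
    print(f"{conf:.0%}: E = {e:.4f}, CI = ({p_hat - e:.3f}, {p_hat + e:.3f})")
```

Only the critical value changes between rows, so the widening of the interval comes entirely from the larger $z_{\alpha/2}$.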
Factors Affecting Margin of Error
Level of confidence: Higher confidence increases margin of error.
Sample size: Larger sample size decreases margin of error.
Population standard deviation: Greater variability increases margin of error.
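The sample-size and variability factors both appear in the same margin-of-error formula. The sketch below (hypothetical helper `margin`, 95% confidence assumed) shows that quadrupling $n$ halves $E$, because $E$ is proportional to $1/\sqrt{n}$. For a proportion, the population standard deviation is $\sqrt{p(1-p)}$, so the variability term $\hat{p}(1-\hat{p})$, and hence $E$, peaks at $\hat{p} = 0.5$.

```python
import math
from statistics import NormalDist

Z95 = NormalDist().inv_cdf(0.975)  # z_{0.025}, assumed 95% confidence

def margin(p_hat, n, z=Z95):
    """Margin of error for a proportion at the given critical value."""
    return z * math.sqrt(p_hat * (1 - p_hat) / n)

# Sample size: E scales like 1/sqrt(n), so 4x the sample gives half the margin
base = margin(0.34, 800)
quadrupled = margin(0.34, 3200)
print(f"n=800: {base:.4f}  n=3200: {quadrupled:.4f}  ratio: {base / quadrupled:.2f}")

# Variability: p_hat(1 - p_hat) is largest at p_hat = 0.5, so E is largest there
for p in (0.1, 0.3, 0.5, 0.7, 0.9):
    print(f"p_hat={p}: E={margin(p, 800):.4f}")
```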
Sample Size Determination for Proportions
Determining Required Sample Size
To achieve a desired margin of error E at a given confidence level, the minimum sample size n is:
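If a prior estimate $\hat{p}$ is available: $n = \hat{p}(1-\hat{p})\left(\dfrac{z_{\alpha/2}}{E}\right)^2$
If no prior estimate is available: $n = 0.25\left(\dfrac{z_{\alpha/2}}{E}\right)^2$ (using $\hat{p} = 0.5$, which maximizes $\hat{p}(1-\hat{p})$)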
Always round up to the next integer.
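Both sample-size formulas fit in one function, with rounding up done by `math.ceil`. A minimal sketch; `required_sample_size` is a hypothetical name, and the optional `z` argument (an assumption of this sketch) lets you pass a rounded table value instead of the exact quantile from `statistics.NormalDist`.

```python
import math
from statistics import NormalDist

def required_sample_size(E, conf, p_hat=None, z=None):
    """Smallest n with margin of error at most E when estimating a proportion."""
    if z is None:
        z = NormalDist().inv_cdf(1 - (1 - conf) / 2)   # exact z_{alpha/2}
    # No prior estimate: use the maximum of p_hat(1 - p_hat), which is 0.25 at p_hat = 0.5
    variability = 0.25 if p_hat is None else p_hat * (1 - p_hat)
    return math.ceil(variability * (z / E) ** 2)      # always round up

print(required_sample_size(0.02, 0.90, z=1.645))  # table value z_{0.05} = 1.645
print(required_sample_size(0.02, 0.90))           # exact z_{0.05} = 1.64485...
```

With $E = 0.02$ at 90% confidence and no prior estimate, the table value 1.645 gives 1692, while the exact quantile gives 1691; the one-unit difference comes only from rounding $z$, so follow whichever convention your course uses.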
Example: Carpooling Survey
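Desired margin of error: 2% ($E = 0.02$), 90% confidence ($z_{0.05} = 1.645$)
(a) Using a prior estimate $\hat{p}$: $n = \hat{p}(1-\hat{p})\left(\dfrac{1.645}{0.02}\right)^2$, rounded up
(b) No prior estimate: $n = 0.25\left(\dfrac{1.645}{0.02}\right)^2 = 0.25(6765.06) \approx 1691.3$, so $n = 1692$
Conclusion: Not having a prior estimate of $p$ can more than double the required sample size.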
Point and Interval Estimation for the Mean
Point Estimation for the Mean
Point estimate: The value of a statistic used to estimate a parameter. For the mean, the sample mean $\bar{x}$ is the point estimate for the population mean $\mu$.
Formula: $\bar{x} = \dfrac{1}{n}\sum_{i=1}^{n} x_i$
Interval Estimation for the Mean
The $(1-\alpha)\cdot 100\%$ confidence interval for the population mean $\mu$ depends on:
Sample size: Large ($n \ge 30$) or small ($n < 30$)
Population standard deviation $\sigma$: Known or unknown
Sampling distribution of the sample mean $\bar{x}$
Summary Table: Confidence Intervals for the Mean
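| | Large Sample ($n \ge 30$) | Small Sample ($n < 30$) |
|---|---|---|
| $\sigma$ known | $\bar{x} \pm z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}}$ | $\bar{x} \pm z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}}$ (population normal) |
| $\sigma$ unknown | $\bar{x} \pm z_{\alpha/2}\dfrac{s}{\sqrt{n}}$ | $\bar{x} \pm t_{\alpha/2}\dfrac{s}{\sqrt{n}}$, df $= n-1$ |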
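The table's decision logic can be written as a small lookup. This is a sketch: `mean_ci_method` is a hypothetical helper, and the large-sample cutoff `large_n=30` is the conventional threshold assumed here.

```python
def mean_ci_method(n, sigma_known, large_n=30):
    """Return which interval the summary table prescribes for a population mean."""
    if sigma_known:
        # Small samples with sigma known still need an (approximately) normal population
        note = "" if n >= large_n else " (population must be approximately normal)"
        return "z-interval with sigma: xbar ± z_{alpha/2} * sigma / sqrt(n)" + note
    if n >= large_n:
        # Large sample, sigma unknown: s is substituted for sigma in the z-interval
        return "z-interval with s: xbar ± z_{alpha/2} * s / sqrt(n)"
    # Small sample, sigma unknown: Student's t with n - 1 degrees of freedom
    return f"t-interval: xbar ± t_{{alpha/2}} * s / sqrt(n), df = {n - 1}"

for n, known in [(50, True), (12, True), (50, False), (12, False)]:
    print(f"n={n:>2}, sigma known={known}: {mean_ci_method(n, known)}")
```

Many current textbooks and software use the t-interval whenever σ is unknown, regardless of $n$; because the t-distribution approaches the normal as df grows, the two choices give nearly identical intervals for large samples.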
Confidence Interval for $\mu$, $\sigma$ Known
Assumptions: Simple random sample, population standard deviation $\sigma$ known, population normal or $n \ge 30$
Formula: $\bar{x} \pm z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}}$
Example
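Sample: $\bar{x} = 72.5$, $\sigma$ known, sample size $n$; 95% CI ($z_{0.025} = 1.96$)
CI: $72.5 \pm 1.96\,\dfrac{\sigma}{\sqrt{n}} = 72.5 \pm 2.35 = (70.15, 74.85)$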
Interpretation: We are 95% confident that the interval (70.15, 74.85) contains the true population mean.
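A sketch of the σ-known interval as a function. `z_interval_mean` is a hypothetical name, and the inputs σ = 6 and n = 25 are hypothetical values picked only so that σ/√n = 1.2, which reproduces the margin of about 2.35 around $\bar{x} = 72.5$.

```python
import math
from statistics import NormalDist

def z_interval_mean(xbar, sigma, n, conf=0.95):
    """CI for mu with sigma known: xbar ± z_{alpha/2} * sigma / sqrt(n)."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)   # about 1.96 for 95%
    margin = z * sigma / math.sqrt(n)
    return xbar - margin, xbar + margin

# Hypothetical inputs: sigma = 6 and n = 25 are illustrative, chosen so sigma/sqrt(n) = 1.2
lo, hi = z_interval_mean(72.5, 6, 25)
print(f"({lo:.2f}, {hi:.2f})")
```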
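Confidence Interval for $\mu$, $\sigma$ Unknown
For large $n$ ($n \ge 30$): Replace $\sigma$ with the sample standard deviation $s$.
Formula: $\bar{x} \pm z_{\alpha/2}\dfrac{s}{\sqrt{n}}$
For small $n$ ($n < 30$): Use Student's t-distribution.
Formula: $\bar{x} \pm t_{\alpha/2}\dfrac{s}{\sqrt{n}}$, with $n-1$ degrees of freedom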
Student's t-Distribution
Definition and Properties
When the population standard deviation $\sigma$ is unknown and the sample size is small, the standardized sample mean $t = \dfrac{\bar{x} - \mu}{s/\sqrt{n}}$ follows Student's t-distribution with $n-1$ degrees of freedom, provided the population is approximately normal.
The t-distribution is symmetric and centered at 0.
It has fatter tails than the standard normal distribution, reflecting greater uncertainty.
As sample size increases, the t-distribution approaches the standard normal distribution.
Properties of the t-Distribution
A different curve for each number of degrees of freedom (df $= n-1$).
Centered at 0 and symmetric.
Total area under the curve is 1; area to the right of 0 equals area to the left.
As df increases, the curve approaches the normal distribution.
More area in the tails than the normal distribution (reflects extra variability from estimating $\sigma$ with $s$).
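The fatter tails and the convergence to the normal show up directly in the critical values. Python's standard library has no t-distribution, so this sketch computes $t_{\alpha/2}$ by integrating the t density numerically with Simpson's rule and inverting the CDF by bisection. The helper names `t_cdf` and `t_critical` are hypothetical; in practice a statistics library or a t-table would be used instead.

```python
import math
from statistics import NormalDist

def t_cdf(x, df, steps=2000):
    """P(T <= x) for Student's t, via Simpson's rule on [0, |x|] and symmetry about 0."""
    # Log of the density's normalizing constant (lgamma avoids overflow for large df)
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)

    def pdf(u):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(u * u / df))

    a = abs(x)
    h = a / steps                     # steps is even, as Simpson's rule requires
    total = pdf(0.0) + pdf(a)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    area = total * h / 3              # area under the density between 0 and |x|
    return 0.5 + area if x >= 0 else 0.5 - area

def t_critical(conf, df):
    """t_{alpha/2}: the point with upper-tail area (1 - conf)/2, found by bisection."""
    target = 1 - (1 - conf) / 2
    lo, hi = 0.0, 1000.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

z = NormalDist().inv_cdf(0.975)
for df in (1, 5, 9, 30, 100):
    print(f"df = {df:>3}: t_0.025 = {t_critical(0.95, df):.3f}")
print(f"normal:    z_0.025 = {z:.3f}")
```

Each printed $t_{0.025}$ exceeds $z_{0.025}$ and shrinks toward it as df grows, which is the "more area in the tails" property expressed as a wider interval for small samples.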
Critical Values and Confidence Intervals Using t-Distribution
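The critical value $t_{\alpha/2}$ (with $n-1$ degrees of freedom) is used instead of $z_{\alpha/2}$.
Formula: $\bar{x} \pm t_{\alpha/2}\dfrac{s}{\sqrt{n}}$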
Example: Confidence Interval with t-Distribution
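Sample: sample size $n$, sample mean $\bar{x}$, sample standard deviation $s$
95% CI ($t_{0.025}$, df $= n-1$): $\bar{x} \pm t_{0.025}\dfrac{s}{\sqrt{n}}$
98% CI ($t_{0.01}$, df $= n-1$): $\bar{x} \pm t_{0.01}\dfrac{s}{\sqrt{n}}$; the 98% interval is wider because $t_{0.01} > t_{0.025}$.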
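The 95% versus 98% comparison can be reproduced on a hypothetical data set of $n = 10$ values (illustrative numbers, not data from the notes). This sketch computes $\bar{x}$ and $s$ with the standard `statistics` module and takes $t_{0.025} = 2.262$ and $t_{0.01} = 2.821$ from a standard t-table for 9 degrees of freedom; `t_interval` and `T_TABLE_DF9` are hypothetical names.

```python
import math
from statistics import fmean, stdev

# Hypothetical sample (illustrative values only, not data from the notes)
data = [12.1, 11.8, 12.6, 12.3, 11.9, 12.4, 12.0, 12.7, 12.2, 11.5]

# t-table critical values t_{alpha/2} for df = 9, keyed by confidence level
T_TABLE_DF9 = {0.95: 2.262, 0.98: 2.821}

def t_interval(sample, conf):
    """t-interval for the mean: xbar ± t_{alpha/2} * s / sqrt(n)."""
    n = len(sample)
    if n - 1 != 9:
        raise ValueError("the table values above are only valid for df = 9")
    xbar = fmean(sample)
    s = stdev(sample)                 # sample standard deviation (n - 1 in the denominator)
    margin = T_TABLE_DF9[conf] * s / math.sqrt(n)
    return xbar - margin, xbar + margin

for conf in (0.95, 0.98):
    lo, hi = t_interval(data, conf)
    print(f"{conf:.0%} CI: ({lo:.3f}, {hi:.3f})")
```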
Normality Condition and Robustness
When to Use the t-Distribution
For $n < 30$, check that the data are approximately normal and have no outliers (use a normal probability plot and a boxplot).
For $n < 15$, use the t-distribution only if the data are roughly symmetric and have no outliers.
For $15 \le n < 40$, the t-distribution can be used if the data are not strongly skewed and have no outliers.
For $n \ge 40$, the t-distribution can be used even for skewed distributions (the Central Limit Theorem applies).
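The boxplot check for outliers can be automated with the usual 1.5 × IQR fence rule. This sketch uses `statistics.quantiles`; its default "exclusive" quartile method may give slightly different quartiles than a particular textbook or calculator, and the data are hypothetical. A normal probability plot is still needed to judge overall normality; `iqr_outliers` is a hypothetical name.

```python
from statistics import quantiles

def iqr_outliers(data):
    """Flag points outside the boxplot fences Q1 - 1.5*IQR and Q3 + 1.5*IQR."""
    q1, _, q3 = quantiles(data, n=4)   # default 'exclusive' method; textbooks may differ
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [x for x in data if x < lower or x > upper]

# Hypothetical sample (illustrative values only)
sample = [12.1, 11.8, 12.6, 12.3, 11.9, 12.4, 12.0, 12.7, 12.2, 11.5]
print(iqr_outliers(sample))            # no points beyond the fences
print(iqr_outliers(sample + [15.0]))   # the added 15.0 lies above the upper fence
```

For a small sample, a single flagged point like this is a reason not to use the t-interval without further investigation.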
Additional Information
Student's t-distribution was discovered by William Sealy Gosset, who published under the pseudonym "Student" while working at the Guinness brewery.
The t-distribution is robust to minor departures from normality, especially as sample size increases.