Parametric and Nonparametric Hypothesis Testing, Central Limit Theorem, and Confidence Intervals
Study Guide - Smart Notes
Tailored notes based on your materials, expanded with key definitions, examples, and context.
Parametric and Nonparametric Tests
Introduction to Parametric and Nonparametric Tests
Statistical tests are broadly classified into parametric and nonparametric tests. The choice between these depends on the nature of the data and the assumptions that can be made about its distribution.
Parametric Tests: Assume underlying statistical distributions (often normal distribution) and require certain conditions (e.g., interval data, homogeneity of variance).
Nonparametric Tests: Do not assume a specific distribution and are used when parametric assumptions are not met (e.g., ordinal data, non-normal distribution).
Examples: t-test and ANOVA are parametric; Mann-Whitney U and Wilcoxon signed-rank are nonparametric.
Normal Distribution and Parametric Assumptions
Understanding Normal Distribution
The normal distribution is a symmetric, bell-shaped curve that describes how data values are distributed around the mean. Many parametric tests rely on the assumption that data are normally distributed.
Key Properties: Mean, median, and mode are equal; 68.3% of data falls within ±1 standard deviation, 95.4% within ±2, and 99.7% within ±3.
Why Assume Normality? Many natural phenomena follow a normal distribution, and the Central Limit Theorem supports this assumption for large samples.
Example: Heights, blood pressure, and test scores often approximate normality.
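The 68.3 / 95.4 / 99.7 percentages above can be verified directly with Python's standard library; a quick sketch using `statistics.NormalDist`:

```python
from statistics import NormalDist

# Standard normal distribution (mean 0, sd 1)
nd = NormalDist(mu=0, sigma=1)

# Probability of falling within ±k standard deviations of the mean
within = {k: nd.cdf(k) - nd.cdf(-k) for k in (1, 2, 3)}
for k, p in within.items():
    print(f"within ±{k} SD: {p:.1%}")
```

Running this prints approximately 68.3%, 95.4%, and 99.7%, matching the empirical rule.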
Central Limit Theorem (CLT)
Definition and Importance
The Central Limit Theorem states that the distribution of sample means approaches a normal distribution as the sample size increases, regardless of the population's distribution.
Key Points:
If many samples of size n are drawn, the mean of the sample means (x̄) approaches the population mean (μ).
The standard deviation of the sample means (the standard error, SE) approaches σ/√n, where σ is the population standard deviation.
If the population is normal, the sampling distribution is normal for all sample sizes.
For large n, the sampling distribution is approximately normal even if the population is not.
Formula for Standard Error: SE = σ/√n, where σ is the population standard deviation and n is the sample size.
Example: Rolling multiple dice and summing their outcomes demonstrates how the distribution of sums becomes normal as the number of dice increases.
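The dice example can be simulated in a few lines of standard-library Python (the sample size of 30 dice and the 2000 repetitions are arbitrary choices for illustration):

```python
import random
import statistics

random.seed(0)  # fixed seed for reproducibility

def mean_of_dice(n_dice):
    """One 'sample mean': the average of n_dice fair six-sided dice."""
    return statistics.mean(random.randint(1, 6) for _ in range(n_dice))

# Many sample means cluster around the population mean (3.5), with
# spread close to the theoretical SE = sigma/sqrt(n) = 1.708/sqrt(30) ~ 0.31.
means = [mean_of_dice(30) for _ in range(2000)]
print(round(statistics.mean(means), 2))
print(round(statistics.stdev(means), 2))
```

A histogram of `means` would show the bell shape the CLT predicts, even though a single die is uniformly distributed.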
Hypothesis Testing
Concepts and Terminology
Hypothesis testing is a statistical method used to make inferences about populations based on sample data. It allows us to test claims about population parameters.
Null Hypothesis (H₀): Proposes no difference or relationship between variables.
Alternative Hypothesis (H₁): Contradicts the null hypothesis, suggesting a difference or relationship exists.
Types of Claims: Comparing means or proportions.
Example: Testing whether the average fasting blood glucose is 85 mg/dL (H₀: μ = 85) or not (H₁: μ ≠ 85).
Types of Hypothesis Tests: 1-Tailed vs 2-Tailed
Hypothesis tests can be one-tailed or two-tailed, depending on the research question.
| | 2-Tailed Test | Right-Tailed | Left-Tailed |
|---|---|---|---|
| Null Hypothesis | H₀: μ = μ₀ | H₀: μ ≤ μ₀ | H₀: μ ≥ μ₀ |
| Alternative Hypothesis | H₁: μ ≠ μ₀ | H₁: μ > μ₀ | H₁: μ < μ₀ |
Errors in Hypothesis Testing
Decisions in hypothesis testing can result in errors:
Type I Error (α): Rejecting H₀ when it is true (false positive).
Type II Error (β): Failing to reject H₀ when it is false (false negative).
Power: Probability of correctly rejecting H₀ when it is false (power = 1 − β).
| | H₀ True | H₀ False |
|---|---|---|
| Reject H₀ | Type I Error (α) | Power (1 − β) |
| Fail to Reject H₀ | Correct Decision (1 − α) | Type II Error (β) |
Alpha and p-Value
Alpha (α): The threshold probability for rejecting H₀; common values are 0.01, 0.05, and 0.10.
p-Value: The probability of obtaining a test statistic at least as extreme as the one observed, assuming H₀ is true.
Decision Rule:
If p-value ≤ α, reject H₀.
If p-value > α, do not reject H₀.
Interpretation: The smaller the p-value, the stronger the evidence against H₀.
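Putting α and the p-value together, a two-tailed one-sample z test can be sketched with the standard library. The numbers below are hypothetical, echoing the glucose example (the sample mean of 88.5 is made up):

```python
from statistics import NormalDist

# Hypothetical numbers: test H0: mu = 85 against H1: mu != 85,
# with the population SD assumed known.
mu0, sigma = 85, 18.2
xbar, n = 88.5, 100    # sample mean and size (made up for illustration)
alpha = 0.05

se = sigma / n ** 0.5                          # standard error
z = (xbar - mu0) / se                          # test statistic
p_value = 2 * (1 - NormalDist().cdf(abs(z)))   # two-tailed p-value

print(f"z = {z:.3f}, p = {p_value:.4f}")
print("reject H0" if p_value <= alpha else "fail to reject H0")
```

With these made-up numbers the p-value lands just above 0.05, so H₀ is not rejected; the point is the mechanics of the decision rule, not any real result.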
Comparing Means: Hypothesis Testing Methods
One Sample Mean
Used to test whether the mean of a single sample differs from a known or hypothesized population mean.
Null Hypothesis: H₀: μ = μ₀ (the population mean equals the hypothesized value).
Alternative Hypothesis: H₁: μ ≠ μ₀ (two-tailed), or μ > μ₀ / μ < μ₀ (one-tailed).
Two Independent Samples
Used to compare the means of two independent groups.
Example: Comparing mean blood pressure between males and females.
Null Hypothesis: H₀: μ₁ = μ₂.
Alternative Hypothesis: H₁: μ₁ ≠ μ₂.
Paired/Dependent Samples
Used when observations are paired, such as before-and-after measurements on the same subjects.
Example: Measuring height in the morning and at night for the same individuals.
Null Hypothesis: The mean difference between pairs is zero.
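A sketch of the paired setup with made-up before/after numbers (standard library only; the critical value for df = n − 1 would come from a t table):

```python
import statistics

# Hypothetical before/after measurements on the same 8 subjects
before = [120, 115, 130, 125, 118, 122, 128, 121]
after  = [116, 113, 125, 124, 117, 118, 123, 120]

# The paired test works on the per-subject differences
diffs = [b - a for b, a in zip(before, after)]
d_bar = statistics.mean(diffs)    # mean difference
s_d = statistics.stdev(diffs)     # sd of the differences
n = len(diffs)

# Paired t statistic: t = d_bar / (s_d / sqrt(n)), with df = n - 1
t = d_bar / (s_d / n ** 0.5)
print(f"mean difference = {d_bar:.3f}, t = {t:.2f} (df = {n - 1})")
```

The t statistic would then be compared against the t critical value for df = 7 at the chosen α.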
Parametric Tests: Z Test and T Test
Selection Criteria for Z Test and T Test
| Test | Selection Criteria |
|---|---|
| Z test | Population standard deviation (σ) known, or large sample (n ≥ 30) |
| T-test | Population standard deviation unknown (sample SD s used); typically n < 30 |
Confidence Intervals
Definition and Interpretation
A confidence interval is a range of values, derived from sample statistics, that is likely to contain the population parameter with a specified probability (commonly 95%).
Interpretation: If we repeated the experiment many times, 95% of the calculated intervals would contain the true mean.
Confidence Interval for One Sample (n ≥ 30)
Formula: x̄ ± z·(σ/√n), where x̄ is the sample mean, z is the critical value for the chosen confidence level, σ is the standard deviation, and n is the sample size.
Example Calculation: Given mean x̄ = 85, σ = 18.2, n = 100, z = 2.576 (for a 99% CI): Lower Limit = 80.31, Upper Limit = 89.69
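The example calculation can be reproduced in a few lines:

```python
from math import sqrt

# Values from the worked example: mean 85, sigma 18.2, n = 100
xbar, sigma, n = 85, 18.2, 100
z = 2.576  # z critical value for 99% confidence

margin = z * sigma / sqrt(n)
lower, upper = xbar - margin, xbar + margin
print(f"99% CI: ({lower:.2f}, {upper:.2f})")  # (80.31, 89.69)
```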
Confidence Interval for One Sample (n < 30)
Formula: x̄ ± t·(s/√n), where t is taken from the t distribution with n − 1 degrees of freedom and s is the sample standard deviation.
Confidence Interval for Two Samples
Formula for Difference of Means: (x̄₁ − x̄₂) ± z·√(s₁²/n₁ + s₂²/n₂)
Example: Men: n = 1623, mean = 128.2, s = 17.5; Women: n = 1911, mean = 126.5, s = 20.1. With z = 1.96 (95% CI), the calculation yields LL = 0.46, UL = 2.94
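The same two-sample calculation in code (z = 1.96 corresponds to the 95% level used in the example):

```python
from math import sqrt

# Blood-pressure example values
n1, x1, s1 = 1623, 128.2, 17.5   # men
n2, x2, s2 = 1911, 126.5, 20.1   # women

se = sqrt(s1**2 / n1 + s2**2 / n2)   # SE of the difference of means
margin = 1.96 * se                   # 95% confidence
diff = x1 - x2
lower, upper = diff - margin, diff + margin
print(f"95% CI: ({lower:.2f}, {upper:.2f})")  # (0.46, 2.94)
```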
Confidence Interval for Two Samples (Small n, Equal Variance)
Formula: (x̄₁ − x̄₂) ± t·sₚ·√(1/n₁ + 1/n₂), where sₚ = √[((n₁ − 1)s₁² + (n₂ − 1)s₂²) / (n₁ + n₂ − 2)] is the pooled standard deviation and t has n₁ + n₂ − 2 degrees of freedom.
Example: Men: n = 6, mean = 117.5, s = 9.7; Women: n = 4, mean = 126.8, s = 12.0. Calculation yields LL = -25.07, UL = 6.47
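The pooled-variance calculation in code. The t critical value for 95% confidence with df = 6 + 4 − 2 = 8 is taken from a t table (t ≈ 2.306), since the standard library has no t distribution; the result matches the example's limits up to rounding of that critical value:

```python
from math import sqrt

# Small-sample example values
n1, x1, s1 = 6, 117.5, 9.7    # men
n2, x2, s2 = 4, 126.8, 12.0   # women
t_crit = 2.306                # t table value for 95% CI, df = 8

# Pooled standard deviation
sp = sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
margin = t_crit * sp * sqrt(1 / n1 + 1 / n2)
diff = x1 - x2
lower, upper = diff - margin, diff + margin
print(f"95% CI: ({lower:.2f}, {upper:.2f})")  # ≈ (-25.11, 6.51)
```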
Relationship Between Hypothesis Testing and Confidence Intervals
If the confidence intervals of two groups do not overlap, the group means are most likely significantly different from each other. (The converse is not guaranteed: intervals that overlap slightly can still correspond to a statistically significant difference.)
Summary Table: Key Formulas
| Test | Formula | When to Use |
|---|---|---|
| One Sample Z | z = (x̄ − μ₀) / (σ/√n) | n ≥ 30, population SD known |
| One Sample t | t = (x̄ − μ₀) / (s/√n) | n < 30, sample SD |
| Two Sample Z | z = (x̄₁ − x̄₂) / √(σ₁²/n₁ + σ₂²/n₂) | Large n, population SD known |
| Two Sample t (Equal Variance) | t = (x̄₁ − x̄₂) / (sₚ√(1/n₁ + 1/n₂)) | Small n, equal variance |
Additional info:
Nonparametric tests are not covered in detail in these notes but are important when parametric assumptions are violated.
Visualizations (e.g., histograms, boxplots) are useful for assessing normality and distribution shape before choosing a test.