Parametric and Nonparametric Hypothesis Testing, Central Limit Theorem, and Confidence Intervals
Study Guide - Smart Notes
Tailored notes based on your materials, expanded with key definitions, examples, and context.
Parametric and Nonparametric Tests
Introduction to Parametric and Nonparametric Tests
Statistical tests are broadly classified into parametric and nonparametric tests. The choice between these depends on the nature of the data and the assumptions that can be made about its distribution.
Parametric Tests: Assume underlying statistical distributions (often normal distribution) and require certain conditions (e.g., interval data, homogeneity of variance).
Nonparametric Tests: Do not assume a specific distribution and are used when parametric assumptions are not met (e.g., ordinal data, non-normal distribution).
Examples: t-test and ANOVA are parametric; Mann-Whitney U and Wilcoxon signed-rank are nonparametric.
Normal Distribution and Parametric Assumptions
Understanding Normal Distribution
The normal distribution is a symmetric, bell-shaped curve that describes how data values are distributed around the mean. Many parametric tests rely on the assumption that data are normally distributed.
Key Properties: Mean, median, and mode are equal; 68.3% of data falls within ±1 standard deviation, 95.4% within ±2, and 99.7% within ±3.
Why Assume Normality? Many natural phenomena follow a normal distribution, and the Central Limit Theorem supports this assumption for large samples.
Example: Heights, blood pressure, and test scores often approximate normality.
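The 68.3 / 95.4 / 99.7 percentages above can be verified directly with Python's standard library; a quick sketch using `statistics.NormalDist`:

```python
from statistics import NormalDist

# Standard normal distribution (mean 0, sd 1)
nd = NormalDist(mu=0, sigma=1)

# Probability of falling within ±k standard deviations of the mean
within = {k: nd.cdf(k) - nd.cdf(-k) for k in (1, 2, 3)}
for k, p in within.items():
    print(f"within ±{k} SD: {p:.1%}")
```

Running this prints approximately 68.3%, 95.4%, and 99.7%, matching the empirical rule.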
Central Limit Theorem (CLT)
Definition and Importance
The Central Limit Theorem states that the distribution of sample means approaches a normal distribution as the sample size increases, regardless of the population's distribution.
Key Points:
If many samples of size n are drawn, the mean of the sample means (x̄) approaches the population mean (μ).
The standard deviation of the sample means (the standard error, SE) approaches σ/√n, where σ is the population standard deviation.
If the population is normal, the sampling distribution is normal for all sample sizes.
For large n, the sampling distribution is approximately normal even if the population is not.
Formula for Standard Error: SE = σ/√n, where σ is the population standard deviation and n is the sample size.
Example: Rolling multiple dice and summing their outcomes demonstrates how the distribution of sums becomes normal as the number of dice increases.
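The dice example can be simulated in a few lines of standard-library Python (the sample size of 30 dice and the 2000 repetitions are arbitrary choices for illustration):

```python
import random
import statistics

random.seed(0)  # fixed seed for reproducibility

def mean_of_dice(n_dice):
    """One 'sample mean': the average of n_dice fair six-sided dice."""
    return statistics.mean(random.randint(1, 6) for _ in range(n_dice))

# Many sample means cluster around the population mean (3.5), with
# spread close to the theoretical SE = sigma/sqrt(n) = 1.708/sqrt(30) ~ 0.31.
means = [mean_of_dice(30) for _ in range(2000)]
print(round(statistics.mean(means), 2))
print(round(statistics.stdev(means), 2))
```

A histogram of `means` would show the bell shape the CLT predicts, even though a single die is uniformly distributed.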
Hypothesis Testing
Concepts and Terminology
Hypothesis testing is a statistical method used to make inferences about populations based on sample data. It allows us to test claims about population parameters.
Null Hypothesis (H₀): Proposes no difference or relationship between variables.
Alternative Hypothesis (H₁): Contradicts the null hypothesis, suggesting a difference or relationship exists.
Types of Claims: Comparing means or proportions.
Example: Testing whether the average fasting blood glucose is 85 mg/dL (H₀: μ = 85) or not (H₁: μ ≠ 85).
Types of Hypothesis Tests: 1-Tailed vs 2-Tailed
Hypothesis tests can be one-tailed or two-tailed, depending on the research question.
| | 2-Tailed Test | Right-Tailed | Left-Tailed |
|---|---|---|---|
| Null Hypothesis | H₀: μ = μ₀ | H₀: μ ≤ μ₀ | H₀: μ ≥ μ₀ |
| Alternative Hypothesis | H₁: μ ≠ μ₀ | H₁: μ > μ₀ | H₁: μ < μ₀ |
Errors in Hypothesis Testing
Decisions in hypothesis testing can result in errors:
Type I Error (α): Rejecting H₀ when it is true (false positive).
Type II Error (β): Failing to reject H₀ when it is false (false negative).
Power: Probability of correctly rejecting H₀ when it is false (power = 1 − β).
| | H₀ True | H₀ False |
|---|---|---|
| Reject H₀ | Type I Error (α) | Power (1 − β) |
| Fail to Reject H₀ | Correct Decision (1 − α) | Type II Error (β) |
Alpha and p-Value
Alpha (α): The threshold probability for rejecting H₀; common values are 0.01, 0.05, and 0.10.
p-Value: The probability of obtaining a test statistic at least as extreme as the one observed, assuming H₀ is true.
Decision Rule:
If p-value ≤ α, reject H₀.
If p-value > α, do not reject H₀.
Interpretation: The smaller the p-value, the stronger the evidence against H₀.
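Putting α and the p-value together, a two-tailed one-sample z test can be sketched with the standard library. The numbers below are hypothetical, echoing the glucose example (the sample mean of 88.5 is made up):

```python
from statistics import NormalDist

# Hypothetical numbers: test H0: mu = 85 against H1: mu != 85,
# with the population SD assumed known.
mu0, sigma = 85, 18.2
xbar, n = 88.5, 100    # sample mean and size (made up for illustration)
alpha = 0.05

se = sigma / n ** 0.5                          # standard error
z = (xbar - mu0) / se                          # test statistic
p_value = 2 * (1 - NormalDist().cdf(abs(z)))   # two-tailed p-value

print(f"z = {z:.3f}, p = {p_value:.4f}")
print("reject H0" if p_value <= alpha else "fail to reject H0")
```

With these made-up numbers the p-value lands just above 0.05, so H₀ is not rejected; the point is the mechanics of the decision rule, not any real result.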
Comparing Means: Hypothesis Testing Methods
One Sample Mean
Used to test whether the mean of a single sample differs from a known or hypothesized population mean.
Null Hypothesis: H₀: μ = μ₀ (the population mean equals the hypothesized value).
Alternative Hypothesis: H₁: μ ≠ μ₀ (two-tailed), or μ > μ₀ / μ < μ₀ (one-tailed).
Two Independent Samples
Used to compare the means of two independent groups.
Example: Comparing mean blood pressure between males and females.
Null Hypothesis: H₀: μ₁ = μ₂.
Alternative Hypothesis: H₁: μ₁ ≠ μ₂.
Paired/Dependent Samples
Used when observations are paired, such as before-and-after measurements on the same subjects.
Example: Measuring height in the morning and at night for the same individuals.
Null Hypothesis: The mean difference between pairs is zero.
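A sketch of the paired setup with made-up before/after numbers (standard library only; the critical value for df = n − 1 would come from a t table):

```python
import statistics

# Hypothetical before/after measurements on the same 8 subjects
before = [120, 115, 130, 125, 118, 122, 128, 121]
after  = [116, 113, 125, 124, 117, 118, 123, 120]

# The paired test works on the per-subject differences
diffs = [b - a for b, a in zip(before, after)]
d_bar = statistics.mean(diffs)    # mean difference
s_d = statistics.stdev(diffs)     # sd of the differences
n = len(diffs)

# Paired t statistic: t = d_bar / (s_d / sqrt(n)), with df = n - 1
t = d_bar / (s_d / n ** 0.5)
print(f"mean difference = {d_bar:.3f}, t = {t:.2f} (df = {n - 1})")
```

The t statistic would then be compared against the t critical value for df = 7 at the chosen α.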
Parametric Tests: Z Test and T Test
Selection Criteria for Z Test and T Test
| Test | Selection Criteria |
|---|---|
| Z test | Population standard deviation (σ) known, or large sample (n ≥ 30) |
| T-test | Population standard deviation unknown (sample SD s used); typically n < 30 |
Confidence Intervals
Definition and Interpretation
A confidence interval is a range of values, derived from sample statistics, that is likely to contain the population parameter with a specified probability (commonly 95%).
Interpretation: If we repeated the experiment many times, 95% of the calculated intervals would contain the true mean.
Confidence Interval for One Sample (n ≥ 30)
Formula: x̄ ± z·(σ/√n), where x̄ is the sample mean, z is the critical value for the chosen confidence level, σ is the standard deviation, and n is the sample size.
Example Calculation: Given mean x̄ = 85, σ = 18.2, n = 100, z = 2.576 (for a 99% CI): Lower Limit = 80.31, Upper Limit = 89.69
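The example calculation can be reproduced in a few lines:

```python
from math import sqrt

# Values from the worked example: mean 85, sigma 18.2, n = 100
xbar, sigma, n = 85, 18.2, 100
z = 2.576  # z critical value for 99% confidence

margin = z * sigma / sqrt(n)
lower, upper = xbar - margin, xbar + margin
print(f"99% CI: ({lower:.2f}, {upper:.2f})")  # (80.31, 89.69)
```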
Confidence Interval for One Sample (n < 30)
Formula: x̄ ± t·(s/√n), where t is taken from the t distribution with n − 1 degrees of freedom and s is the sample standard deviation.
Confidence Interval for Two Samples
Formula for Difference of Means: (x̄₁ − x̄₂) ± z·√(s₁²/n₁ + s₂²/n₂)
Example: Men: n = 1623, mean = 128.2, s = 17.5; Women: n = 1911, mean = 126.5, s = 20.1. With z = 1.96 (95% CI), the calculation yields LL = 0.46, UL = 2.94
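The same two-sample calculation in code (z = 1.96 corresponds to the 95% level used in the example):

```python
from math import sqrt

# Blood-pressure example values
n1, x1, s1 = 1623, 128.2, 17.5   # men
n2, x2, s2 = 1911, 126.5, 20.1   # women

se = sqrt(s1**2 / n1 + s2**2 / n2)   # SE of the difference of means
margin = 1.96 * se                   # 95% confidence
diff = x1 - x2
lower, upper = diff - margin, diff + margin
print(f"95% CI: ({lower:.2f}, {upper:.2f})")  # (0.46, 2.94)
```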
Confidence Interval for Two Samples (Small n, Equal Variance)
Formula: (x̄₁ − x̄₂) ± t·sₚ·√(1/n₁ + 1/n₂), where sₚ = √[((n₁ − 1)s₁² + (n₂ − 1)s₂²) / (n₁ + n₂ − 2)] is the pooled standard deviation and t has n₁ + n₂ − 2 degrees of freedom.
Example: Men: n = 6, mean = 117.5, s = 9.7; Women: n = 4, mean = 126.8, s = 12.0. Calculation yields LL = -25.07, UL = 6.47
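The pooled-variance calculation in code. The t critical value for 95% confidence with df = 6 + 4 − 2 = 8 is taken from a t table (t ≈ 2.306), since the standard library has no t distribution; the result matches the example's limits up to rounding of that critical value:

```python
from math import sqrt

# Small-sample example values
n1, x1, s1 = 6, 117.5, 9.7    # men
n2, x2, s2 = 4, 126.8, 12.0   # women
t_crit = 2.306                # t table value for 95% CI, df = 8

# Pooled standard deviation
sp = sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
margin = t_crit * sp * sqrt(1 / n1 + 1 / n2)
diff = x1 - x2
lower, upper = diff - margin, diff + margin
print(f"95% CI: ({lower:.2f}, {upper:.2f})")  # ≈ (-25.11, 6.51)
```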
Relationship Between Hypothesis Testing and Confidence Intervals
If the confidence intervals of two groups do not overlap, the group means are most likely significantly different from each other. (The converse is not guaranteed: intervals that overlap slightly can still correspond to a statistically significant difference.)
Summary Table: Key Formulas
| Test | Formula | When to Use |
|---|---|---|
| One Sample Z | z = (x̄ − μ₀) / (σ/√n) | n ≥ 30, population SD known |
| One Sample t | t = (x̄ − μ₀) / (s/√n) | n < 30, sample SD |
| Two Sample Z | z = (x̄₁ − x̄₂) / √(σ₁²/n₁ + σ₂²/n₂) | Large n, population SD known |
| Two Sample t (Equal Variance) | t = (x̄₁ − x̄₂) / (sₚ√(1/n₁ + 1/n₂)) | Small n, equal variance |
Additional info:
Nonparametric tests are not covered in detail in these notes but are important when parametric assumptions are violated.
Visualizations (e.g., histograms, boxplots) are useful for assessing normality and distribution shape before choosing a test.