Chi-square Tests: Applications and Interpretation in Epidemiology


Chi-square Tests

Introduction to Chi-square Tests

Chi-square tests are a family of statistical tests used to determine whether observed frequencies differ significantly from expected frequencies under a specific hypothesis. They are widely used in categorical data analysis, especially in biological and epidemiological studies.

Key Purpose: To assess whether differences between observed and expected counts are due to chance or indicate a meaningful association.
Common Applications: Goodness-of-fit, test of independence, and test of homogeneity.

West Nile Virus Case Study

Background and Research Questions

West Nile Virus (WNV) is a mosquito-borne virus. Most infected individuals are asymptomatic or have mild symptoms, but some develop severe neurological disease. Epidemiological questions include:

Are all age groups equally likely to be diagnosed with WNV?
Are all age groups equally likely to die from WNV?

Chi-square Goodness-of-Fit Test

Comparing Observed and Expected Distributions

The goodness-of-fit test compares the observed distribution of a categorical variable to an expected distribution based on a known population or theoretical model.

Null Hypothesis (H0): The observed frequencies match the expected frequencies (e.g., age distribution among WNV cases matches the general population).
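Expected Frequency: For each category, calculated as $E_i = n \times p_i$, where $n$ is the total number of valid cases and $p_i$ is the proportion of the reference population in category $i$.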

Example Table: Age Distribution of WNV Cases vs. Census Data

Age Group	Cases	Relative Frequency	Census Data 2010*	Expected Frequency
0 - 44	240	0.31	0.727	0.727 × 773 = 562.0
45 - 64	335	0.43	0.187	0.187 × 773 = 144.5
65 - 100	198	0.26	0.086	0.086 × 773 = 66.5
Unknown	6	-	-	-
Total (valid)	773	1	1	773
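
The expected counts in the table can be reproduced in a few lines of Python. This is a minimal sketch: the variable names are illustrative, and the proportions are the rounded census values quoted in the table, so the last digit may differ slightly from the printed figures.

```python
# Observed WNV cases by age group (valid cases only; the 6 cases of
# unknown age are excluded, as in the table).
observed = {"0-44": 240, "45-64": 335, "65-100": 198}

# Population proportions taken from the census column of the table.
census_share = {"0-44": 0.727, "45-64": 0.187, "65-100": 0.086}

n = sum(observed.values())  # total number of valid cases

# Expected count under H0: E_i = n * p_i
expected = {group: n * p for group, p in census_share.items()}

for group, e in expected.items():
    print(f"{group}: expected = {e:.1f}")
```

Because the census proportions sum to 1, the expected counts sum to the same total as the observed counts.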

Calculating the Chi-square Statistic

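For each category, calculate the squared difference between the observed and expected counts, divide by the expected count, and sum over all categories:

$$\chi^2 = \sum_i \frac{(O_i - E_i)^2}{E_i}$$

Oi: Observed frequency in category i
Ei: Expected frequency in category i

The standardized residual for category i is $z_i = (O_i - E_i) / \sqrt{E_i}$, so each category's component of $\chi^2$ is $z_i^2$.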

Example Calculation Table

Age Group	Observed	Expected	Obs - Exp	Std. Residuals (z-scores)	Component to χ²
0 - 44	240	562.0	-322	-13.6	185
45 - 64	335	144.5	190.5	15.8	249.6
65 - 100	198	66.5	131.5	16.1	259.2
Total	773	773	0	-	693.8
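
The table's columns can be recomputed as follows. This is a sketch that takes the observed counts and the rounded expected counts from the table; the list names are illustrative.

```python
import math

observed = [240, 335, 198]        # age groups: 0-44, 45-64, 65-100
expected = [562.0, 144.5, 66.5]   # expected counts as rounded in the table

# Standardized residual for each category: z_i = (O_i - E_i) / sqrt(E_i)
residuals = [(o - e) / math.sqrt(e) for o, e in zip(observed, expected)]

# Each category's component of the statistic is its squared residual.
components = [z ** 2 for z in residuals]

# The chi-square statistic is the sum of the components.
chi_square = sum(components)

for z, c in zip(residuals, components):
    print(f"z = {z:6.1f}, component = {c:6.1f}")
print(f"chi-square = {chi_square:.1f}")
```

Because this keeps full precision in the residuals, the total comes out slightly higher than the 693.8 shown in the table, which appears to square the already rounded z-scores. The conclusion is unaffected.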

Degrees of Freedom: $df = k - 1$, where $k$ is the number of categories.
In this example: $df = 3 - 1 = 2$

Interpreting the Results

If the calculated $\chi^2$ statistic is much larger than expected under the null hypothesis (giving a small p-value), we reject the null hypothesis.
Example: $\chi^2 = 693.8$, $df = 2$, p-value < 0.001 indicates a significant difference between observed and expected age distributions among WNV cases.
Standardized Residuals: Values above 2 or below -2 indicate categories that are over- or under-represented, respectively.
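
For 2 degrees of freedom the upper-tail probability of the chi-square distribution has the closed form $P(X > x) = e^{-x/2}$, so the p-value for this example can be checked without a statistics library. The sketch below uses the statistic and residuals from the calculation table; the labels and the cutoff of 2 follow the rule of thumb above, and the closed form does not apply to other degrees of freedom.

```python
import math

chi_square = 693.8   # statistic from the calculation table
df = 2               # three age groups minus one

# Closed form valid ONLY for df = 2: P(X > x) = exp(-x / 2)
p_value = math.exp(-chi_square / 2)
print(f"p-value = {p_value:.3g}")

# Standardized residuals from the calculation table.
residuals = {"0-44": -13.6, "45-64": 15.8, "65-100": 16.1}

# Flag categories using the |z| > 2 rule of thumb.
flags = {}
for group, z in residuals.items():
    if z > 2:
        flags[group] = "over-represented"
    elif z < -2:
        flags[group] = "under-represented"
    else:
        flags[group] = "consistent with H0"
    print(f"{group}: z = {z:+.1f} -> {flags[group]}")
```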

Summary of Findings

Seniors (65+) and middle-aged (45-64) are significantly over-represented among WNV cases.
Younger individuals (0-44) are significantly under-represented.

Types of Chi-square Tests

Overview and Comparison

Goodness-of-Fit Test: Compares observed frequencies to expected frequencies from a known distribution (one categorical variable).
Test of Independence: Assesses whether two categorical variables are independent in a contingency table.
Test of Homogeneity: Compares distributions of a categorical variable across different populations.

Comparison Table

Test Type	Purpose	Data Structure	Degrees of Freedom
Goodness-of-Fit	Compare observed to expected frequencies (one variable)	One categorical variable	# categories - 1
Test of Independence	Test association between two variables	Contingency table (two variables)	(# rows - 1) × (# columns - 1)
Test of Homogeneity	Compare distributions across groups	Contingency table (two variables)	(# rows - 1) × (# columns - 1)
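
The degrees-of-freedom column can be written as two small helper functions. The function names are hypothetical and chosen only for this illustration.

```python
def df_goodness_of_fit(n_categories: int) -> int:
    """Degrees of freedom for a goodness-of-fit test: categories - 1."""
    return n_categories - 1


def df_contingency(n_rows: int, n_cols: int) -> int:
    """Degrees of freedom for independence or homogeneity tests:
    (rows - 1) * (columns - 1)."""
    return (n_rows - 1) * (n_cols - 1)


# Three age groups in a goodness-of-fit test.
print(df_goodness_of_fit(3))

# A table with 3 rows and 2 columns.
print(df_contingency(3, 2))
```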

Chi-square Test Calculations

Step-by-Step Procedure

State the null and alternative hypotheses.
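Calculate expected frequencies for each cell: $E_i = n \times p_i$ for a goodness-of-fit test, or $E_{ij} = (\text{row total}_i \times \text{column total}_j) / n$ for a contingency table, where $n$ is the total count.

Compute the chi-square statistic: $\chi^2 = \sum (O - E)^2 / E$, summed over all cells.

Determine degrees of freedom: $df = k - 1$ for a goodness-of-fit test with $k$ categories, or $df = (r - 1)(c - 1)$ for a contingency table with $r$ rows and $c$ columns.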

Compare the calculated statistic to the critical value or use the p-value to draw a conclusion.
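
The procedure above can be sketched for a contingency table, for example to ask whether death from WNV depends on age group. The counts below are HYPOTHETICAL, invented for illustration, and are not taken from the WNV data. The table has 3 rows and 2 columns, so df = 2 and the closed-form p-value $e^{-x/2}$ applies; other table sizes would need a statistics library.

```python
import math

# HYPOTHETICAL counts (illustration only, not real WNV data).
# Rows: age groups 0-44, 45-64, 65-100. Columns: (died, survived).
table = [
    [5, 235],
    [15, 320],
    [30, 168],
]

# H0: outcome is independent of age group.
row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]
grand_total = sum(row_totals)

# Expected count per cell: row total * column total / grand total.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Chi-square statistic: sum of (O - E)^2 / E over all cells.
chi_square = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(table, expected)
    for o, e in zip(obs_row, exp_row)
)

# Degrees of freedom: (rows - 1) * (columns - 1).
df = (len(table) - 1) * (len(table[0]) - 1)

# Upper-tail p-value; this closed form holds only because df == 2.
p_value = math.exp(-chi_square / 2)

print(f"chi-square = {chi_square:.2f}, df = {df}, p-value = {p_value:.3g}")
```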

Properties of the Chi-square Distribution

Right-skewed distribution; shape depends on degrees of freedom.
The test is always upper-tailed: the chi-square statistic cannot be negative, and only large values count as evidence against the null hypothesis.
Larger statistics indicate greater deviation from the null hypothesis.
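
These properties can be seen in a small simulation. The sketch below relies on the fact that a chi-square variable with df degrees of freedom is the sum of df squared standard normal variables; the function name, seed, and number of draws are arbitrary choices for illustration.

```python
import random
import statistics

random.seed(0)  # fixed seed so repeated runs give the same draws


def simulate_chi_square(df, n_draws=20000):
    """Draw chi-square variates as sums of df squared standard normals."""
    return [
        sum(random.gauss(0, 1) ** 2 for _ in range(df))
        for _ in range(n_draws)
    ]


for df in (2, 5, 10):
    draws = simulate_chi_square(df)
    print(
        f"df={df}: mean={statistics.mean(draws):.2f}, "
        f"median={statistics.median(draws):.2f}, min={min(draws):.3f}"
    )
```

In each case the minimum is non-negative and the mean exceeds the median, which is the signature of a right-skewed distribution. The mean sits close to the degrees of freedom, so the whole distribution shifts right as df grows.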

Interpretation and Application

Key Points for Interpretation

Standardized residuals (z-scores) help identify which categories contribute most to the chi-square statistic.
Significant results indicate that observed frequencies differ from expected frequencies more than would be expected by chance.
Always check assumptions: expected frequencies should generally be at least 5 in each cell.
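
The expected-count check is easy to automate. This is a minimal sketch with a hypothetical helper name; the default threshold of 5 is the rule of thumb stated above.

```python
def cells_below_threshold(expected, threshold=5.0):
    """Return (index, expected count) for every cell below the threshold."""
    return [(i, e) for i, e in enumerate(expected) if e < threshold]


# Expected counts from the WNV goodness-of-fit example.
wnv_expected = [562.0, 144.5, 66.5]
small_cells = cells_below_threshold(wnv_expected)

if small_cells:
    print("Cells with low expected counts:", small_cells)
else:
    print("All expected counts are at least 5.")
```

An empty result means the rule of thumb is satisfied, as it is for the WNV example.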

Example Application: WNV and Age

Observed: Seniors and middle-aged groups are over-represented among WNV cases; young people are under-represented.
Interpretation: The age distribution of WNV cases differs significantly from that of the general population, so age is associated with WNV diagnosis.

Additional info:

Chi-square tests are non-parametric and do not require normality.
They are sensitive to sample size; large samples can detect small differences as significant.
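
The effect of sample size can be illustrated with a goodness-of-fit test on made-up numbers. In this sketch the proportions and counts are HYPOTHETICAL; the observed proportions are identical in both samples and only the sample size changes. With three categories df = 2, so the p-value is $e^{-x/2}$.

```python
import math


def chi_square_gof(observed, proportions):
    """Goodness-of-fit statistic: sum of (O - E)^2 / E with E = n * p."""
    n = sum(observed)
    return sum((o - n * p) ** 2 / (n * p) for o, p in zip(observed, proportions))


# HYPOTHETICAL null proportions and counts (illustration only).
proportions = [0.5, 0.3, 0.2]
small_sample = [52, 29, 19]                      # n = 100
large_sample = [o * 100 for o in small_sample]   # n = 10,000, same proportions

chi_small = chi_square_gof(small_sample, proportions)
chi_large = chi_square_gof(large_sample, proportions)

# Upper-tail p-values; the closed form is valid because df = 2.
p_small = math.exp(-chi_small / 2)
p_large = math.exp(-chi_large / 2)

print(f"n = 100:    chi-square = {chi_small:.2f}, p-value = {p_small:.3g}")
print(f"n = 10000:  chi-square = {chi_large:.2f}, p-value = {p_large:.3g}")
```

With the proportions held fixed, the statistic grows in direct proportion to the sample size, so the same small deviation is non-significant at n = 100 and highly significant at n = 10,000.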