Chi-square Tests: Applications and Interpretation in Epidemiology
Study Guide - Smart Notes
Chi-square Tests
Introduction to Chi-square Tests
Chi-square tests are a family of statistical tests used to determine whether observed frequencies differ significantly from expected frequencies under a specific hypothesis. They are widely used in categorical data analysis, especially in biological and epidemiological studies.
Key Purpose: To assess whether differences between observed and expected counts are due to chance or indicate a meaningful association.
Common Applications: Goodness-of-fit, test of independence, and test of homogeneity.
West Nile Virus Case Study
Background and Research Questions
West Nile Virus (WNV) is a mosquito-borne virus. Most infected individuals are asymptomatic or have mild symptoms, but some develop severe neurological disease. Epidemiological questions include:
Are all age groups equally likely to be diagnosed with WNV?
Are all age groups equally likely to die from WNV?
Chi-square Goodness-of-Fit Test
Comparing Observed and Expected Distributions
The goodness-of-fit test compares the observed distribution of a categorical variable to an expected distribution based on a known population or theoretical model.
Null Hypothesis (H0): The observed frequencies match the expected frequencies (e.g., age distribution among WNV cases matches the general population).
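Expected Frequency: For each category, calculated as $E_i = n \times p_i$, where $n$ is the total number of valid cases and $p_i$ is the proportion of the reference population in category $i$.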
Example Table: Age Distribution of WNV Cases vs. Census Data
Age Group | Cases | Relative Frequency | Census Data 2010* | Expected Frequency |
|---|---|---|---|---|
0 - 44 | 240 | 0.31 | 0.727 | 0.727 × 773 = 562.0 |
45 - 64 | 335 | 0.43 | 0.187 | 0.187 × 773 = 144.5 |
65 - 100 | 198 | 0.26 | 0.086 | 0.086 × 773 = 66.5 |
Unknown | 6 | - | - | - |
Total (valid) | 773 | 1 | 1 | 773 |
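The expected counts in the table can be reproduced with a few lines of Python. This is a minimal sketch using only the counts and census proportions shown above; the variable names (`observed`, `census_prop`, `expected`) are illustrative choices, not part of the original notes.

```python
# Observed WNV case counts by age group (the 6 "Unknown" cases are excluded).
observed = {"0-44": 240, "45-64": 335, "65-100": 198}

# Census 2010 population proportions, taken from the table.
census_prop = {"0-44": 0.727, "45-64": 0.187, "65-100": 0.086}

n = sum(observed.values())  # total number of valid cases

# Expected count under H0: E_i = n * p_i
expected = {group: n * p for group, p in census_prop.items()}

for group, count in observed.items():
    rel_freq = count / n
    print(f"{group:>7}: observed={count:4d}  "
          f"rel. freq={rel_freq:.2f}  expected={expected[group]:.1f}")
```

Note that the unrounded expected count for the 45 - 64 group is 144.551, which prints as 144.6 here, while the table shows 144.5.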
Calculating the Chi-square Statistic
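For each category, calculate $(O_i - E_i)^2 / E_i$, then sum over all categories: $\chi^2 = \sum_i \frac{(O_i - E_i)^2}{E_i}$
$O_i$: Observed frequency in category $i$
$E_i$: Expected frequency in category $i$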
Example Calculation Table
Age Group | Observed | Expected | Obs - Exp | Std. Residuals (z-scores) | Component to $\chi^2$ |
|---|---|---|---|---|---|
0 - 44 | 240 | 562.0 | -322 | -13.6 | 185 |
45 - 64 | 335 | 144.5 | 190.5 | 15.8 | 249.6 |
65 - 100 | 198 | 66.5 | 131.5 | 16.1 | 259.2 |
Total | 773 | 773 | 0 | - | 693.8 |
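The calculation table can be checked with the short sketch below, which computes each standardized residual as (observed - expected) / sqrt(expected) and each component as the square of that residual. The components in the table appear to be squares of the rounded residuals (for example, 13.6 squared is about 185), so the unrounded total computed here comes out near 695.6 instead of 693.8. The conclusion is unaffected.

```python
import math

groups = ["0-44", "45-64", "65-100"]
observed = [240, 335, 198]
proportions = [0.727, 0.187, 0.086]  # census proportions from the notes

n = sum(observed)
expected = [n * p for p in proportions]

# Standardized residual: z_i = (O_i - E_i) / sqrt(E_i)
residuals = [(o - e) / math.sqrt(e) for o, e in zip(observed, expected)]

# Each category's contribution to the statistic is z_i squared.
components = [z ** 2 for z in residuals]

chi_square = sum(components)
df = len(observed) - 1  # number of categories minus one

for g, z, c in zip(groups, residuals, components):
    print(f"{g:>7}: z = {z:6.1f}   component = {c:6.1f}")
print(f"chi-square = {chi_square:.1f}, df = {df}")
```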
Degrees of Freedom: $df = k - 1$, where $k$ is the number of categories.
In this example: $df = 3 - 1 = 2$
Interpreting the Results
If the calculated statistic is much larger than expected under the null hypothesis (with a small p-value), we reject the null hypothesis.
Example: $\chi^2 = 693.8$, $df = 2$, $p$-value < 0.001 indicates a significant difference between observed and expected age distributions among WNV cases.
Standardized Residuals: Calculated as $z_i = (O_i - E_i) / \sqrt{E_i}$. Values above 2 or below -2 indicate categories that are over- or under-represented, respectively.
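To see how small the p-value is, the sketch below relies on a convenient special case: with 2 degrees of freedom, the upper-tail probability of the chi-square distribution has the closed form exp(-x / 2), so no statistics library is needed. The function name `chi2_upper_tail_df2` is a made-up helper for illustration. For other degrees of freedom a library routine (for example `scipy.stats.chi2.sf`, not used here) would be the usual choice.

```python
import math

def chi2_upper_tail_df2(x: float) -> float:
    """Upper-tail probability P(X >= x) for a chi-square variable with 2 df.

    Valid ONLY for df = 2, where the distribution is exponential with mean 2.
    """
    if x < 0:
        raise ValueError("a chi-square statistic cannot be negative")
    return math.exp(-x / 2.0)

chi_square = 693.8  # total from the calculation table
p_value = chi2_upper_tail_df2(chi_square)

# 5% critical value for df = 2: solve exp(-x / 2) = 0.05 for x.
critical_value = -2.0 * math.log(0.05)

print(f"p-value = {p_value:.3g}")
print(f"5% critical value (df = 2) = {critical_value:.2f}")
print("reject H0" if chi_square > critical_value else "fail to reject H0")
```

The statistic is far beyond the 5% critical value of about 5.99, which is why the p-value is reported simply as less than 0.001.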
Summary of Findings
Seniors (65+) and middle-aged (45-64) are significantly over-represented among WNV cases.
Younger individuals (0-44) are significantly under-represented.
Types of Chi-square Tests
Overview and Comparison
Goodness-of-Fit Test: Compares observed frequencies to expected frequencies from a known distribution (one categorical variable).
Test of Independence: Assesses whether two categorical variables are independent in a contingency table.
Test of Homogeneity: Compares distributions of a categorical variable across different populations.
Comparison Table
Test Type | Purpose | Data Structure | Degrees of Freedom |
|---|---|---|---|
Goodness-of-Fit | Compare observed to expected frequencies (one variable) | One categorical variable | # categories - 1 |
Test of Independence | Test association between two variables | Contingency table (two variables) | (# rows - 1) × (# columns - 1) |
Test of Homogeneity | Compare distributions across groups | Contingency table (two variables) | (# rows - 1) × (# columns - 1) |
Chi-square Test Calculations
Step-by-Step Procedure
State the null and alternative hypotheses.
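Calculate expected frequencies for each cell: $E_i = n \times p_i$ for a goodness-of-fit test, or $E_{ij} = \frac{(\text{row } i \text{ total}) \times (\text{column } j \text{ total})}{\text{grand total}}$ for a contingency table.
Compute the chi-square statistic: $\chi^2 = \sum \frac{(O - E)^2}{E}$, summed over all categories or cells.
Determine degrees of freedom: $df = k - 1$ for a goodness-of-fit test with $k$ categories, or $df = (r - 1)(c - 1)$ for a contingency table with $r$ rows and $c$ columns.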
Compare the calculated statistic to the critical value or use the p-value to draw a conclusion.
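The same procedure applied to a contingency table (test of independence or homogeneity) might look like the sketch below. The function name `chi_square_independence` and the 2 × 3 table of counts are hypothetical and chosen only to keep the arithmetic easy to follow; they are not WNV data. Expected cell counts use the standard rule (row total × column total) / grand total.

```python
def chi_square_independence(table):
    """Return (chi_square, df, expected) for a table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    # Expected count under independence: row total * column total / grand total
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

    # Sum (O - E)^2 / E over every cell of the table.
    chi_square = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )

    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi_square, df, expected

# Hypothetical counts: 2 rows (groups) by 3 columns (categories).
table = [
    [10, 20, 30],
    [20, 20, 20],
]

chi_square, df, expected = chi_square_independence(table)
print(f"expected = {expected}")
print(f"chi-square = {chi_square:.2f}, df = {df}")
```

With df = 2, the result would then be compared against the chi-square critical value for 2 degrees of freedom (about 5.99 at the 5% level).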
Properties of the Chi-square Distribution
Right-skewed distribution; shape depends on degrees of freedom.
Always an upper-tail test (chi-square statistic cannot be negative).
Larger statistics indicate greater deviation from the null hypothesis.
Interpretation and Application
Key Points for Interpretation
Standardized residuals (z-scores) help identify which categories contribute most to the chi-square statistic.
Significant results indicate that observed frequencies differ from expected frequencies more than would be expected by chance.
Always check assumptions: expected frequencies should generally be at least 5 in each cell.
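The expected-count rule of thumb is easy to check before running a test. The helper below is a small sketch; the name `cells_below_minimum`, the default threshold argument, and the `sparse_expected` values are illustrative assumptions, while the WNV expected counts come from the notes.

```python
def cells_below_minimum(expected, minimum=5.0):
    """Return (label, expected count) pairs whose expected count is below `minimum`."""
    return [(label, e) for label, e in expected.items() if e < minimum]

# Expected counts for the WNV example: n * census proportion.
wnv_expected = {"0-44": 0.727 * 773, "45-64": 0.187 * 773, "65-100": 0.086 * 773}
print(cells_below_minimum(wnv_expected))  # prints [] because every cell is well above 5

# Hypothetical sparse example where two cells fall short of the guideline.
sparse_expected = {"A": 12.0, "B": 3.5, "C": 1.5}
print(cells_below_minimum(sparse_expected))
```

When cells are flagged, common remedies include merging adjacent categories or using an exact test.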
Example Application: WNV and Age
Observed: Seniors and middle-aged groups are over-represented among WNV cases; young people are under-represented.
Interpretation: The age distribution of WNV cases differs significantly from that of the general population, so WNV diagnosis is associated with age.
Additional info:
Chi-square tests are non-parametric and do not require normality.
They are sensitive to sample size; large samples can detect small differences as significant.
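The sample-size effect can be illustrated with a small sketch. The counts are hypothetical: both samples show the same 52% / 48% split against a 50% / 50% null hypothesis, and only the sample size differs. The function name `chi_square_gof` is an illustrative helper. The value 3.841 is the standard 5% critical value for 1 degree of freedom.

```python
def chi_square_gof(observed, proportions):
    """Goodness-of-fit statistic: sum of (O - E)^2 / E with E = n * p."""
    n = sum(observed)
    return sum((o - n * p) ** 2 / (n * p) for o, p in zip(observed, proportions))

null_proportions = [0.5, 0.5]
CRITICAL_5PCT_DF1 = 3.841  # 5% critical value for 1 degree of freedom

small = chi_square_gof([52, 48], null_proportions)      # n = 100
large = chi_square_gof([5200, 4800], null_proportions)  # n = 10,000

for label, stat in [("n = 100", small), ("n = 10,000", large)]:
    verdict = "significant" if stat > CRITICAL_5PCT_DF1 else "not significant"
    print(f"{label}: chi-square = {stat:.2f} ({verdict} at the 5% level)")
```

For fixed proportions the statistic grows in direct proportion to the sample size, so a statistically significant result should be read alongside the size of the difference itself.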