Chi-square Tests: Applications and Interpretation in Epidemiology


Chi-square Tests

Introduction to Chi-square Tests

Chi-square tests are a family of statistical tests used to determine whether observed frequencies differ significantly from expected frequencies under a specific hypothesis. They are widely used in categorical data analysis, especially in biological and epidemiological studies.

  • Key Purpose: To assess whether differences between observed and expected counts are due to chance or indicate a meaningful association.

  • Common Applications: Goodness-of-fit, test of independence, and test of homogeneity.

West Nile Virus Case Study

Background and Research Questions

West Nile Virus (WNV) is a mosquito-borne virus. Most infected individuals are asymptomatic or have mild symptoms, but some develop severe neurological disease. Epidemiological questions include:

  • Are all age groups equally likely to be diagnosed with WNV?

  • Are all age groups equally likely to die from WNV?

Chi-square Goodness-of-Fit Test

Comparing Observed and Expected Distributions

The goodness-of-fit test compares the observed distribution of a categorical variable to an expected distribution based on a known population or theoretical model.

  • Null Hypothesis (H0): The observed frequencies match the expected frequencies (e.g., age distribution among WNV cases matches the general population).

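  • Expected Frequency: For each category, calculated as $E_i = n \times p_i$, where $n$ is the total number of valid observations and $p_i$ is the proportion expected in category $i$ under the null hypothesis (here, the census proportion).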

Example Table: Age Distribution of WNV Cases vs. Census Data

| Age Group | Cases | Relative Frequency | Census Data 2010* | Expected Frequency |
|---|---|---|---|---|
| 0 - 44 | 240 | 0.31 | 0.727 | 0.727 × 773 = 562.0 |
| 45 - 64 | 335 | 0.43 | 0.187 | 0.187 × 773 = 144.5 |
| 65 - 100 | 198 | 0.26 | 0.086 | 0.086 × 773 = 66.5 |
| Unknown | 6 | - | - | - |
| Total (valid) | 773 | 1 | 1 | 773 |
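
The expected counts in the table can be reproduced with a few lines of Python. This is a minimal sketch using only the counts and census proportions shown above; the variable names (`observed`, `census_prop`, `expected`) are illustrative choices, not part of the original material.

```python
# Observed WNV case counts by age group (cases with unknown age excluded).
observed = {"0-44": 240, "45-64": 335, "65-100": 198}

# Census proportions, used as the expected distribution under H0.
census_prop = {"0-44": 0.727, "45-64": 0.187, "65-100": 0.086}

n = sum(observed.values())  # total number of valid cases

# Expected count under H0 for each category: E_i = n * p_i
expected = {group: n * p for group, p in census_prop.items()}

for group in observed:
    rel_freq = observed[group] / n
    print(f"{group:>7}: observed={observed[group]:3d}  "
          f"rel. freq={rel_freq:.2f}  expected={expected[group]:.1f}")
```

The unrounded product for the 45 - 64 group is 144.551, so it may print as 144.6 rather than the 144.5 shown in the table; the difference is a rounding detail and does not affect the conclusion.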

Calculating the Chi-square Statistic

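  • For each category, calculate the component $(O_i - E_i)^2 / E_i$, then sum over all categories: $\chi^2 = \sum_i \frac{(O_i - E_i)^2}{E_i}$

  • $O_i$: Observed frequency in category $i$

  • $E_i$: Expected frequency in category $i$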

Example Calculation Table

| Age Group | Observed | Expected | Obs - Exp | Std. Residuals (z-scores) | Component to χ² |
|---|---|---|---|---|---|
| 0 - 44 | 240 | 562.0 | -322 | -13.6 | 185 |
| 45 - 64 | 335 | 144.5 | 190.5 | 15.8 | 249.6 |
| 65 - 100 | 198 | 66.5 | 131.5 | 16.1 | 259.2 |
| Total | 773 | 773 | 0 | - | 693.8 |

  • Degrees of Freedom: $df = k - 1$, where $k$ is the number of categories.

  • In this example: $df = 3 - 1 = 2$
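
The calculation can be sketched in plain Python as follows. The sketch assumes the observed counts and census proportions from the WNV example; the closed-form p-value used here, $e^{-\chi^2/2}$, holds only for 2 degrees of freedom and is chosen to avoid any dependency beyond the standard library.

```python
import math

observed = [240, 335, 198]            # age groups 0-44, 45-64, 65-100
census_prop = [0.727, 0.187, 0.086]   # expected proportions under H0

n = sum(observed)
expected = [n * p for p in census_prop]

# Contribution of each category: (O_i - E_i)^2 / E_i
components = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
chi_square = sum(components)

df = len(observed) - 1  # k - 1

# Upper-tail probability; exp(-x / 2) is exact for df = 2 only.
p_value = math.exp(-chi_square / 2)

print(f"chi-square = {chi_square:.1f}, df = {df}, p-value = {p_value:.3g}")
```

Computed from unrounded expected counts, the statistic comes out near 695.6, slightly above the 693.8 in the table. The table's components appear to be squares of residuals already rounded to one decimal, which would explain the small gap. The conclusion is the same either way.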

Interpreting the Results

  • If the calculated statistic is much larger than expected under the null hypothesis (with a small p-value), we reject the null hypothesis.

  • Example: $\chi^2 = 693.8$, $df = 2$, p-value < 0.001 indicates a significant difference between observed and expected age distributions among WNV cases.

  • Standardized Residuals: Values above 2 or below -2 indicate categories that are over- or under-represented, respectively.
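
As a rough illustration of the residual rule, the sketch below computes $z_i = (O_i - E_i) / \sqrt{E_i}$ from the rounded expected counts in the calculation table and labels each age group. The helper names `standardized_residual` and `flag`, and the default cutoff of 2, are illustrative assumptions.

```python
import math

groups = ["0-44", "45-64", "65-100"]
observed = [240, 335, 198]
expected = [562.0, 144.5, 66.5]  # rounded expected counts from the table


def standardized_residual(o, e):
    """Standardized (Pearson) residual: (O - E) / sqrt(E)."""
    return (o - e) / math.sqrt(e)


def flag(z, cutoff=2.0):
    """Label a category using the |z| > 2 rule of thumb."""
    if z > cutoff:
        return "over-represented"
    if z < -cutoff:
        return "under-represented"
    return "consistent with H0"


residuals = {g: standardized_residual(o, e)
             for g, o, e in zip(groups, observed, expected)}

for g, z in residuals.items():
    print(f"{g:>7}: z = {z:6.1f}  ({flag(z)})")
```

Squaring each residual gives that category's component of the chi-square statistic, which is why the largest residuals in absolute value mark the categories that drive the result.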

Summary of Findings

  • Seniors (65+) and middle-aged (45-64) are significantly over-represented among WNV cases.

  • Younger individuals (0-44) are significantly under-represented.

Types of Chi-square Tests

Overview and Comparison

  • Goodness-of-Fit Test: Compares observed frequencies to expected frequencies from a known distribution (one categorical variable).

  • Test of Independence: Assesses whether two categorical variables are independent in a contingency table.

  • Test of Homogeneity: Compares distributions of a categorical variable across different populations.

Comparison Table

| Test Type | Purpose | Data Structure | Degrees of Freedom |
|---|---|---|---|
| Goodness-of-Fit | Compare observed to expected frequencies (one variable) | One categorical variable | # categories - 1 |
| Test of Independence | Test association between two variables | Contingency table (two variables) | (# rows - 1) × (# columns - 1) |
| Test of Homogeneity | Compare distributions across groups | Contingency table (two variables) | (# rows - 1) × (# columns - 1) |

Chi-square Test Calculations

Step-by-Step Procedure

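  1. State the null and alternative hypotheses.

  2. Calculate expected frequencies for each cell: $E_i = n \times p_i$ for a goodness-of-fit test, or $E_{ij} = \frac{(\text{row } i \text{ total}) \times (\text{column } j \text{ total})}{\text{grand total}}$ for a contingency table.

  3. Compute the chi-square statistic: $\chi^2 = \sum \frac{(O - E)^2}{E}$, summed over all categories or cells.

  4. Determine degrees of freedom: $df = k - 1$ for goodness-of-fit with $k$ categories, or $df = (r - 1)(c - 1)$ for a contingency table with $r$ rows and $c$ columns.

  5. Compare the calculated statistic to the critical value or use the p-value to draw a conclusion.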
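
For a contingency table, the procedure can be sketched as below. The counts are hypothetical, invented purely for illustration (they are NOT real WNV mortality data), and the function name `chi_square_independence` is an assumption of this sketch. The critical value uses the fact that, for 2 degrees of freedom, the upper-tail probability is $e^{-x/2}$, so the 5% cutoff is $-2\ln(0.05)$.

```python
import math


def chi_square_independence(table):
    """Chi-square statistic, df, and expected counts for a table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    # Expected count per cell: row total * column total / grand total
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

    statistic = sum((o - e) ** 2 / e
                    for obs_row, exp_row in zip(table, expected)
                    for o, e in zip(obs_row, exp_row))
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, df, expected


# HYPOTHETICAL counts: rows = three age groups, columns = (died, survived).
hypothetical = [[10, 90],
                [20, 80],
                [30, 70]]

statistic, df, expected = chi_square_independence(hypothetical)

# 5% critical value, valid for df = 2 only: solve exp(-x / 2) = 0.05.
critical_value = -2 * math.log(0.05)

print(f"chi-square = {statistic:.2f}, df = {df}, "
      f"critical value = {critical_value:.3f}")
print("reject H0" if statistic > critical_value else "fail to reject H0")
```

The same function covers both the test of independence and the test of homogeneity, since the two share the same arithmetic and differ only in how the data were sampled and how the hypotheses are worded.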

Properties of the Chi-square Distribution

  • Right-skewed distribution; shape depends on degrees of freedom.

  • Always an upper-tail test (chi-square statistic cannot be negative).

  • Larger statistics indicate greater deviation from the null hypothesis.

Interpretation and Application

Key Points for Interpretation

  • Standardized residuals (z-scores) help identify which categories contribute most to the chi-square statistic.

  • Significant results indicate that observed frequencies differ from expected frequencies more than would be expected by chance.

  • Always check assumptions: expected frequencies should generally be at least 5 in each cell.
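
The expected-count check is easy to automate. The sketch below applies it to the WNV expected counts and to a hypothetical study of only 40 cases with the same census proportions; the function name `cells_below_minimum` and the 40-case scenario are illustrative assumptions.

```python
def cells_below_minimum(expected, minimum=5.0):
    """Return (index, expected count) for every cell below the minimum."""
    return [(i, e) for i, e in enumerate(expected) if e < minimum]


census_prop = [0.727, 0.187, 0.086]

# Expected counts from the WNV example (n = 773).
wnv_expected = [562.0, 144.5, 66.5]

# HYPOTHETICAL small study: only 40 cases, same census proportions.
small_expected = [40 * p for p in census_prop]

print("WNV example:", cells_below_minimum(wnv_expected))
print("Small study:", cells_below_minimum(small_expected))
```

In the small hypothetical study the oldest age group has an expected count of about 3.4, so the chi-square approximation would be questionable there. Combining categories or using an exact test are common remedies.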

Example Application: WNV and Age

  • Observed: Seniors and middle-aged groups are over-represented among WNV cases; young people are under-represented.

  • Interpretation: The age distribution of WNV cases differs significantly from that of the general population, so age and WNV diagnosis are associated.

Additional info:

  • Chi-square tests are non-parametric and do not require normality.

  • They are sensitive to sample size; large samples can detect small differences as significant.
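
The effect of sample size can be seen with a small hypothetical example: the same 52% / 48% split tested against a 50/50 null hypothesis, once with 100 observations and once with 10,000. The counts are invented for illustration, and `chi_square_gof` is a helper name assumed for this sketch. The value 3.841 is the standard 5% critical value for 1 degree of freedom.

```python
def chi_square_gof(observed, proportions):
    """Goodness-of-fit statistic: sum of (O - E)^2 / E with E = n * p."""
    n = sum(observed)
    return sum((o - n * p) ** 2 / (n * p)
               for o, p in zip(observed, proportions))


null_prop = [0.5, 0.5]
CRITICAL_DF1_05 = 3.841  # 5% critical value for df = 1

# HYPOTHETICAL counts with the same 52/48 split at two sample sizes.
small_sample = [52, 48]
large_sample = [5200, 4800]

chi_small = chi_square_gof(small_sample, null_prop)
chi_large = chi_square_gof(large_sample, null_prop)

for label, stat in [("n = 100", chi_small), ("n = 10000", chi_large)]:
    verdict = "significant" if stat > CRITICAL_DF1_05 else "not significant"
    print(f"{label}: chi-square = {stat:.2f} ({verdict})")
```

For fixed proportions the statistic grows in direct proportion to the sample size, so a deviation that is negligible in practice can still be statistically significant in a large study. Reporting the size of the difference alongside the p-value helps keep the two apart.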
