Chi-Square Tests: Goodness-of-Fit, Independence, and Homogeneity
Study Guide - Smart Notes
Chi-Square Tests for Categorical Data
Introduction to Chi-Square Tests
Chi-square tests are a family of statistical tests used to analyze categorical data. They are particularly useful for testing hypotheses about the distribution of categorical variables and the relationships between them. The three main types of chi-square tests are the goodness-of-fit test, the test for independence, and the test for homogeneity of proportions.
Goodness-of-Fit Test
Purpose and Overview
The goodness-of-fit test is used to determine whether the observed frequency distribution of a categorical variable matches an expected distribution. This test is commonly applied when comparing sample data to a known or hypothesized population distribution.
Null Hypothesis (H0): The observed distribution matches the expected distribution.
Alternative Hypothesis (H1): The observed distribution does not match the expected distribution.
Characteristics of the Chi-Square Distribution
Not symmetric; skewed right, especially for small degrees of freedom (d.f.).
The shape depends on the degrees of freedom, becoming more symmetric as d.f. increases.
All values are nonnegative (χ² ≥ 0).

Calculating Expected Counts
Suppose there are $n$ independent trials and $k$ mutually exclusive categories. Let $p_i$ be the probability of the $i$th outcome. The expected count for each category is:
$$E_i = n p_i, \quad i = 1, 2, \ldots, k$$
The test requires that all expected counts be at least 1 and that no more than 20% of them be less than 5.
Example: Expected Counts Calculation
A sociologist wants to know if the distribution of years grandparents care for grandchildren has changed since 2000. The 2000 distribution is:
| Number of Years | Probability |
|---|---|
| Less than 1 year | 0.228 |
| 1 or 2 years | 0.239 |
| 3 or 4 years | 0.176 |
| 5 or more years | 0.357 |
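For a sample of 1,000 grandparents, the expected counts are:
Less than 1 year: $1000 \times 0.228 = 228$
1 or 2 years: $1000 \times 0.239 = 239$
3 or 4 years: $1000 \times 0.176 = 176$
5 or more years: $1000 \times 0.357 = 357$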
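The expected counts and the two requirements on them can be checked with a few lines of Python. This is a minimal sketch: the function names `expected_counts` and `requirements_met` are illustrative and do not come from the original notes.

```python
# Hypothesized 2000 distribution from the table above.
probabilities = {
    "Less than 1 year": 0.228,
    "1 or 2 years": 0.239,
    "3 or 4 years": 0.176,
    "5 or more years": 0.357,
}
n = 1000  # sample size


def expected_counts(n, probabilities):
    """Return E_i = n * p_i for each category."""
    return {category: n * p for category, p in probabilities.items()}


def requirements_met(expected):
    """Check that all E_i >= 1 and that no more than 20% of the E_i are below 5."""
    values = list(expected.values())
    share_below_5 = sum(e < 5 for e in values) / len(values)
    return min(values) >= 1 and share_below_5 <= 0.20


expected = expected_counts(n, probabilities)
for category, e in expected.items():
    print(f"{category}: {e:.0f}")
print("Requirements met:", requirements_met(expected))
```

Here every expected count is far above 5, so both requirements hold.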
Test Statistic for Goodness-of-Fit
The test statistic is calculated as:
$$\chi^2_0 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}$$
where $O_i$ is the observed count and $E_i$ is the expected count for category $i$.
The test statistic approximately follows a chi-square distribution with $k - 1$ degrees of freedom.
Steps for Conducting a Goodness-of-Fit Test
State the hypotheses: $H_0$: The variable follows the specified distribution; $H_1$: It does not.
Choose a significance level ($\alpha$), e.g., 0.05.
Calculate expected counts and verify requirements.
Compute the test statistic using the formula above.
Determine the critical value from the chi-square table for $k - 1$ degrees of freedom.
Compare the test statistic to the critical value (classical approach) or compute the P-value (P-value approach).
State the conclusion based on the comparison.
Critical Value and P-Value Approaches
Critical Value Approach: Reject $H_0$ if $\chi^2_0 > \chi^2_\alpha$ (critical value).
P-Value Approach: Reject $H_0$ if P-value $< \alpha$.
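Both decision rules can be sketched in Python using only the standard library. The helper names below (`chi_square_sf`, `reject_by_critical_value`, `reject_by_p_value`) are illustrative assumptions, not part of the original notes. The upper-tail probability uses the closed-form finite sums that exist for integer degrees of freedom. In practice a statistics library or a chi-square table would serve the same purpose.

```python
import math


def chi_square_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable with integer df >= 1."""
    if df < 1 or int(df) != df:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df = 2m: exp(-x/2) * sum_{j=0}^{m-1} (x/2)^j / j!
        m = df // 2
        return math.exp(-half) * sum(half**j / math.factorial(j) for j in range(m))
    # Odd df = 2m + 1:
    # erfc(sqrt(x/2)) + exp(-x/2) * sum_{j=1}^{m} (x/2)^(j - 1/2) / Gamma(j + 1/2)
    m = (df - 1) // 2
    tail = sum(half ** (j - 0.5) / math.gamma(j + 0.5) for j in range(1, m + 1))
    return math.erfc(math.sqrt(half)) + math.exp(-half) * tail


def reject_by_critical_value(statistic, critical_value):
    """Classical approach: reject H0 when the statistic exceeds the critical value."""
    return statistic > critical_value


def reject_by_p_value(statistic, df, alpha):
    """P-value approach: reject H0 when the upper-tail area is below alpha."""
    return chi_square_sf(statistic, df) < alpha


# Sanity check: the table value 7.815 (alpha = 0.05, 3 d.f.) should leave
# an upper-tail area of about 0.05.
print(round(chi_square_sf(7.815, 3), 3))
```

The two approaches always agree, because the critical value is the point whose upper-tail area equals $\alpha$.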


Worked Example: Goodness-of-Fit Test
Observed counts for the number of years grandparents care for grandchildren:
| Number of Years | Frequency |
|---|---|
| Less than 1 year | 252 |
| 1 or 2 years | 255 |
| 3 or 4 years | 162 |
| 5 or more years | 331 |

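Test statistic: $\chi^2_0 \approx 6.605$ (from the observed counts above and the expected counts 228, 239, 176, and 357); critical value at $\alpha = 0.05$ and 3 d.f.: $\chi^2_{0.05} = 7.815$.

Since $6.605 < 7.815$, we fail to reject $H_0$. The P-value (approximately 0.086) is also greater than $\alpha$.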

Conclusion: There is insufficient evidence to conclude that the distribution has changed.
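As a check on the arithmetic, the statistic can be worked out term by term. This derivation assumes the expected counts $E_i = 1000 \, p_i$ from the 2000 distribution (228, 239, 176, 357); individual terms are rounded to three decimals.

```latex
\begin{aligned}
\chi^2_0 &= \frac{(252-228)^2}{228} + \frac{(255-239)^2}{239}
          + \frac{(162-176)^2}{176} + \frac{(331-357)^2}{357} \\
         &= \frac{576}{228} + \frac{256}{239} + \frac{196}{176} + \frac{676}{357} \\
         &\approx 2.526 + 1.071 + 1.114 + 1.894 \\
         &\approx 6.605
\end{aligned}
```

With $k = 4$ categories, the degrees of freedom are $k - 1 = 3$.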
Chi-Square Test for Independence
Purpose and Overview
The chi-square test for independence is used to determine whether two categorical variables are associated (dependent) or not (independent). Data are organized in a contingency table.
Null Hypothesis (H0): The variables are independent.
Alternative Hypothesis (H1): The variables are dependent.
Calculating Expected Counts in Contingency Tables
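For a cell in row $i$ and column $j$:
$$E_{ij} = \frac{(\text{row } i \text{ total})(\text{column } j \text{ total})}{\text{table total}}$$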
Example: Expected Counts Calculation
| | Money | Health | Love | Row Totals |
|---|---|---|---|---|
| Men | 82 | 446 | 355 | 883 |
| Women | 46 | 574 | 273 | 893 |
| Column Totals | 128 | 1020 | 628 | 1776 |
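The expected count for each cell of this table can be computed directly from the row and column totals. A minimal Python sketch follows; the function name `expected_table` is illustrative, not from the original notes.

```python
# Observed counts from the poll table above
# (rows: Men, Women; columns: Money, Health, Love).
observed = [
    [82, 446, 355],
    [46, 574, 273],
]


def expected_table(observed):
    """Return E_ij = (row i total) * (column j total) / (table total)."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


expected = expected_table(observed)
for label, row in zip(["Men", "Women"], expected):
    print(label, [round(e, 2) for e in row])
```

For example, the Men/Money cell is $883 \times 128 / 1776 \approx 63.64$. The expected counts in each row and column add up to the same totals as the observed counts, which is a quick way to catch arithmetic slips.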

Test Statistic for Independence
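The test statistic is:
$$\chi^2_0 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$$
where $O_{ij}$ and $E_{ij}$ are the observed and expected counts in row $i$ and column $j$, $r$ is the number of rows, and $c$ is the number of columns. The degrees of freedom are $(r - 1)(c - 1)$.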
Steps for Conducting a Test for Independence
State the hypotheses: $H_0$: Variables are independent; $H_1$: Variables are dependent.
Choose a significance level ($\alpha$).
Calculate expected counts and verify requirements.
Compute the test statistic.
Determine the critical value or P-value.
Draw a conclusion.


Worked Example: Test for Independence
Observed counts for the poll, with expected counts in parentheses:
| | Money | Health | Love |
|---|---|---|---|
| Men | 82 (63.64) | 446 (507.13) | 355 (312.23) |
| Women | 46 (64.36) | 574 (512.87) | 273 (315.77) |
Test statistic: $\chi^2_0 \approx 36.84$; critical value at $\alpha = 0.05$ and 2 d.f.: $\chi^2_{0.05} = 5.991$.
Since $36.84 > 5.991$, we reject $H_0$. The P-value is approximately 0 (less than $\alpha$).
Conclusion: There is sufficient evidence to conclude that gender and response are dependent.
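The whole test can be reproduced from the observed counts alone. The sketch below is one possible implementation, not the method of any particular library. The function name `chi_square_table_test` is illustrative, $\alpha = 0.05$ is assumed, and the P-value shortcut `exp(-x/2)` is valid only because this table has 2 degrees of freedom.

```python
import math

observed = [
    [82, 446, 355],   # Men: Money, Health, Love
    [46, 574, 273],   # Women: Money, Health, Love
]


def chi_square_table_test(observed):
    """Return (statistic, df, expected) for an r x c table of observed counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, df, expected


statistic, df, expected = chi_square_table_test(observed)

alpha = 0.05            # assumed significance level
critical_value = 5.991  # chi-square table value for alpha = 0.05, 2 d.f.
p_value = math.exp(-statistic / 2)  # closed form, valid only when df == 2

print(f"statistic = {statistic:.2f}, df = {df}")
print("reject H0 (critical value):", statistic > critical_value)
print("reject H0 (P-value):", p_value < alpha)
```

Because the test for homogeneity of proportions uses the identical procedure, the same function can be applied to a table whose columns are separate populations.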
Conditional Distributions and Bar Graphs
Conditional distributions show the relative frequency of each response by gender:
| | Money | Health | Love |
|---|---|---|---|
| Men | 0.0929 | 0.5051 | 0.4020 |
| Women | 0.0515 | 0.6428 | 0.3057 |
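Each conditional distribution is a row of observed counts divided by its row total. A short Python sketch, with the illustrative helper name `conditional_distribution`:

```python
# Observed poll counts by gender, taken from the contingency table above.
observed = {
    "Men": {"Money": 82, "Health": 446, "Love": 355},
    "Women": {"Money": 46, "Health": 574, "Love": 273},
}


def conditional_distribution(counts):
    """Return the relative frequency of each response within one group."""
    total = sum(counts.values())
    return {response: count / total for response, count in counts.items()}


for gender, counts in observed.items():
    distribution = conditional_distribution(counts)
    print(gender, {r: round(p, 4) for r, p in distribution.items()})
```

Comparing the rows shows where the association comes from: a larger share of women chose Health, while larger shares of men chose Money and Love.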

Chi-Square Test for Homogeneity of Proportions
Purpose and Overview
The chi-square test for homogeneity of proportions is used to test whether different populations have the same proportion of individuals with a certain characteristic. The procedure is identical to the test for independence, but the context involves comparing populations rather than variables within a single population.
Null Hypothesis (H0): All population proportions are equal.
Alternative Hypothesis (H1): At least one proportion is different.
Worked Example: Test for Homogeneity
| | 1992 | 2002 | 2008 |
|---|---|---|---|
| Yes | 418 (475.554) | 479 (475.554) | 525 (470.892) |
| No | 602 (544.446) | 541 (544.446) | 485 (539.108) |
Test statistic: $\chi^2_0 \approx 24.74$; critical value at $\alpha = 0.05$ and 2 d.f.: $\chi^2_{0.05} = 5.991$.
Since $24.74 > 5.991$, we reject $H_0$. The P-value is approximately 0 (less than $\alpha$).
Conclusion: There is sufficient evidence to conclude that the proportion of individuals who believe teaching is a prestigious career differs among the years.
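The statistic for this table can be verified term by term from the observed counts and the expected counts in parentheses. Individual terms are rounded to three decimals, and $\alpha = 0.05$ is the assumed significance level.

```latex
\begin{aligned}
\chi^2_0 &= \frac{(418-475.554)^2}{475.554} + \frac{(479-475.554)^2}{475.554}
          + \frac{(525-470.892)^2}{470.892} \\
         &\quad + \frac{(602-544.446)^2}{544.446} + \frac{(541-544.446)^2}{544.446}
          + \frac{(485-539.108)^2}{539.108} \\
         &\approx 6.965 + 0.025 + 6.217 + 6.084 + 0.022 + 5.431 \\
         &\approx 24.74
\end{aligned}
```

The table has $r = 2$ rows and $c = 3$ columns, so the degrees of freedom are $(2 - 1)(3 - 1) = 2$. Nearly all of the statistic comes from the 1992 and 2008 columns; the 2002 counts sit very close to their expected values.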
Summary Table: Chi-Square Test Types
| Test | Purpose | Data Structure | Degrees of Freedom |
|---|---|---|---|
| Goodness-of-Fit | Compare observed to expected distribution | One categorical variable, k categories | k - 1 |
| Independence | Test association between two variables | Contingency table (r × c) | (r - 1)(c - 1) |
| Homogeneity | Compare proportions across populations | Contingency table (r × c) | (r - 1)(c - 1) |