13. Chi-Square Tests & Goodness of Fit
Contingency Tables
Problem 10.2.34
Textbook Question
Contingency Tables and Relative Frequencies: In Exercises 33–36, use the information below.
The frequencies in a contingency table can be written as relative frequencies by dividing each frequency by the sample size. The contingency table below shows the number of U.S. adults (in millions) ages 25 and over by employment status and educational attainment. (Adapted from U.S. Census Bureau)
Explain why you cannot perform the chi-square independence test on these data.
Verified step by step guidance

Step 1: Understand the chi-square independence test. This test is used to determine whether there is a significant association between two categorical variables. It requires raw frequency counts, not relative frequencies, as input data.
Step 2: Analyze the contingency table provided. The table shows the number of U.S. adults, in millions, categorized by employment status and educational attainment. These values are frequencies expressed in units of millions, not relative frequencies, so an entry such as 0.8 stands for about 800,000 adults.
Step 3: Consider the requirement for the chi-square test. One key condition is that the expected frequency in each cell of the table must be at least 5. The expected frequency of a cell is (row total × column total) / sample size, so the condition applies to these computed values, not to the observed entries themselves. If any cell has an expected frequency less than 5, the chi-square distribution is a poor approximation and the test cannot be performed reliably.
Step 4: Examine the data in the table. Some entries, such as 'Unemployed' for 'Not a high school graduate' (0.8 million) and 'Unemployed' for 'Some college, no degree' (1.1 million), are far below 5 when taken as tabulated. Entries this small keep the 'Unemployed' row total small, so the expected frequencies computed from the tabulated values for that row can be expected to fall below 5 as well.
Step 5: Conclude that the chi-square independence test cannot be performed on these data as tabulated because several expected frequencies are less than 5, which violates a condition of the test.
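The check described in Steps 3 to 5 can be sketched in Python. This is only an illustration: the table values below are hypothetical placeholders (they are not the textbook's table), and the function names `expected_frequencies` and `meets_expected_count_rule` are made up for this example.

```python
def expected_frequencies(table):
    """Return expected counts E[r][c] = row_total * col_total / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def meets_expected_count_rule(table, minimum=5):
    """True if every expected frequency is at least `minimum`."""
    return all(e >= minimum for row in expected_frequencies(table) for e in row)


# Hypothetical values in millions (NOT the textbook's table):
# rows = employment status, columns = educational attainment.
in_millions = [
    [10.0, 30.0, 20.0, 60.0],
    [0.8, 2.0, 1.1, 2.1],
]

# The second row's total is about 6, so each of its expected values is below 5.
print(meets_expected_count_rule(in_millions))
```

With these placeholder values the check fails, because every expected frequency in the second row is below 5. A table of ordinary counts with larger row and column totals would pass the same check.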
Key Concepts
Here are the essential concepts you must grasp in order to answer the question correctly.
Contingency Tables
A contingency table is a type of data representation that displays the frequency distribution of variables. It allows for the examination of the relationship between two categorical variables by showing how the frequencies of one variable are distributed across the categories of another. In this case, the table illustrates the employment status of U.S. adults based on their educational attainment.
Relative Frequencies
Relative frequencies are calculated by dividing the frequency of a specific category by the total number of observations, providing a proportion that reflects the size of that category relative to the whole. This transformation is useful for comparing categories on a common scale, especially when sample sizes differ. In the context of the contingency table, relative frequencies would help in understanding the distribution of employment status across educational levels.
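As a small sketch of this conversion, the Python snippet below divides each cell by the grand total. The counts are hypothetical and the function name `relative_frequencies` is chosen only for this illustration.

```python
def relative_frequencies(table):
    """Divide every cell by the grand total so the entries sum to 1."""
    n = sum(sum(row) for row in table)
    return [[cell / n for cell in row] for row in table]


# Hypothetical 2x2 table of counts (not the textbook's data).
counts = [
    [30, 10],
    [20, 40],
]

rel = relative_frequencies(counts)
for row in rel:
    print([round(p, 2) for p in row])
```

Because the result contains proportions, not counts, it is useful for description but is not valid input for the chi-square independence test.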
Chi-Square Independence Test
The chi-square independence test is a statistical method used to determine if there is a significant association between two categorical variables. However, this test requires that the expected frequency in each cell of the contingency table, computed from the row and column totals, be sufficiently large (typically at least 5). If any expected frequencies are too low, the test may not be valid. That appears to be the case with the provided data when the tabulated values (in millions) are used as the frequencies, since the smallest entries are close to 1.
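For reference, the standard formulas behind the test are given below in conventional notation. The symbols $O_{r,c}$ (observed frequency), $E_{r,c}$ (expected frequency), and $n$ (sample size) are notation introduced here, not taken from the exercise.

```latex
E_{r,c} = \frac{(\text{row } r \text{ total})\,(\text{column } c \text{ total})}{n},
\qquad
\chi^2 = \sum_{r,c} \frac{(O_{r,c} - E_{r,c})^2}{E_{r,c}},
\qquad
\text{d.f.} = (\text{rows} - 1)(\text{columns} - 1)
```

Each $E_{r,c}$ appears in a denominator, which is one way to see why very small expected frequencies make the statistic unstable.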
