A goodness of fit test is a statistical method for determining whether the observed frequencies in a dataset agree with the frequencies expected under a specific distribution. For example, it can assess whether a die is fair by comparing the actual outcomes of a series of rolls with the outcomes expected under a uniform distribution.
The first step in a goodness of fit test is to state the hypotheses. The null hypothesis (H0) states that the data follow the specified distribution, so any differences between observed and expected frequencies are due to chance alone. The alternative hypothesis (Ha) states that at least one category's true proportion differs from the proportion the distribution specifies, implying a poor fit.
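Written in terms of category probabilities, with k categories, p_i the true probability of category i, and p_{i,0} the probability the hypothesized distribution assigns to it (notation introduced here for illustration), the hypotheses read as follows. For a fair six-sided die, every p_{i,0} equals 1/6.
\[H_0: p_i = p_{i,0} \text{ for all } i = 1, \dots, k \qquad H_a: p_i \neq p_{i,0} \text{ for at least one } i\]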
To illustrate, consider rolling a six-sided die 60 times. The expected frequency for each face of the die, assuming fairness, would be 10 (calculated as the total number of rolls divided by the number of categories: E = n/k, where n is the total rolls and k is the number of categories). The observed frequencies are the actual counts recorded from the rolls.
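As a minimal sketch of that arithmetic in Python (the function name `expected_frequencies` is hypothetical, chosen for illustration), the expected count under a uniform null is n/k, repeated once for each category:

```python
def expected_frequencies(n: int, k: int) -> list[float]:
    """Expected count per category under a uniform null: E = n / k."""
    if n <= 0 or k <= 0:
        raise ValueError("n and k must be positive")
    return [n / k] * k


# A fair six-sided die rolled 60 times: six categories, 60 observations
print(expected_frequencies(60, 6))  # [10.0, 10.0, 10.0, 10.0, 10.0, 10.0]
```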
The test statistic is the chi-squared statistic:
\[\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}\]
where \(O_i\) is the observed frequency and \(E_i\) is the expected frequency for category i. Each term measures how far one category's observed count departs from its expected count, scaled by the expected count, so the sum quantifies the total discrepancy across all categories; larger values indicate a worse fit.
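A minimal Python sketch of the formula follows. The function name is hypothetical, and the observed counts are made up for illustration; they are not data from the text.

```python
def chi_squared_statistic(observed, expected):
    """Sum of (O_i - E_i)^2 / E_i over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    if any(e <= 0 for e in expected):
        raise ValueError("expected frequencies must be positive")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical counts for 60 rolls of a die (illustrative only)
observed = [16, 4, 14, 6, 12, 8]
expected = [10.0] * 6  # E = 60 / 6 for every face
stat = chi_squared_statistic(observed, expected)
print(round(stat, 4))  # 11.2
```

Here the faces showing 16 and 4 each contribute 3.6 to the total, while the faces showing 12 and 8 contribute only 0.4 each, so the largest deviations dominate the statistic.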
After calculating the chi-squared statistic, the next step is to find the p-value: the probability, assuming the null hypothesis is true, of obtaining a chi-squared statistic at least as large as the one observed. The p-value comes from a chi-squared distribution with k - 1 degrees of freedom, where k is the number of categories. For our die example, with six categories, there are 5 degrees of freedom.
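Statistical software normally handles this lookup (for example, SciPy provides `scipy.stats.chi2.sf` for the upper tail and `scipy.stats.chisquare` for the whole test). To avoid a third-party dependency, the sketch below uses only Python's standard library and builds the chi-squared upper-tail probability for integer degrees of freedom from a standard recurrence. The function name `chi2_sf` and the counts are hypothetical; the counts are made up so that the statistic is 11.2, whose p-value with 5 degrees of freedom comes out at about 0.0476.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X >= x) for a chi-squared variable with
    integer df, using Q(x; df + 2) = Q(x; df) + (x/2)^(df/2) e^(-x/2) / Gamma(df/2 + 1)."""
    if df < 1 or x < 0:
        raise ValueError("need df >= 1 and x >= 0")
    if df % 2 == 1:
        q, k = math.erfc(math.sqrt(x / 2)), 1  # base case df = 1
    else:
        q, k = math.exp(-x / 2), 2  # base case df = 2
    while k < df:  # step up two degrees of freedom at a time
        q += (x / 2) ** (k / 2) * math.exp(-x / 2) / math.gamma(k / 2 + 1)
        k += 2
    return q


observed = [16, 4, 14, 6, 12, 8]  # hypothetical counts, not from the text
expected = [10.0] * 6
stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1  # k - 1 = 5
p_value = chi2_sf(stat, df)
print(df, round(stat, 2), round(p_value, 4))
```

As a sanity check, plugging in the tabulated 5% critical value for 5 degrees of freedom (about 11.07) returns a probability of about 0.05.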
Using statistical tables or software, one can find the p-value corresponding to the calculated chi-squared statistic. If the p-value is less than the predetermined significance level (commonly α = 0.05), the null hypothesis is rejected, indicating that the observed frequencies do not fit the expected distribution well. In our example, if the p-value were 0.0476, which is less than 0.05, we would reject the null hypothesis and conclude that the die is likely not fair.
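The decision rule itself is a single comparison. Here is a minimal sketch, with `decide` a hypothetical helper name, applied to the p-value of 0.0476 from the example:

```python
def decide(p_value: float, alpha: float = 0.05) -> str:
    """Reject H0 when the p-value falls below the significance level alpha."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must lie in [0, 1]")
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie in (0, 1)")
    return "reject H0" if p_value < alpha else "fail to reject H0"


print(decide(0.0476))              # reject H0, since 0.0476 < 0.05
print(decide(0.0476, alpha=0.01))  # fail to reject H0 at a stricter level
```

Note the wording "fail to reject": a p-value above α means the data are consistent with a fair die at that level, not that fairness has been proven. The second call shows that the same p-value can lead to a different conclusion under a stricter α, which is why α is fixed before the data are examined.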
When performing a goodness of fit test, check three conditions. The sample must be random, with independent observations. The data must be frequency counts, with each observation falling into exactly one category. Every expected frequency must be large enough, typically at least 5, for the chi-squared distribution to approximate the test statistic's sampling distribution well. When these conditions hold, the p-value obtained from the chi-squared distribution can be relied on.
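Only some of these conditions can be checked from the counts themselves; whether the sample is random depends on how the data were collected. Below is a minimal sketch of the checkable part, assuming the hypothetical helper name `check_assumptions` and the common rule-of-thumb threshold of 5. The counts are made up for illustration.

```python
def check_assumptions(observed, expected, min_expected=5.0):
    """Return a list of problems that make the chi-squared approximation
    questionable; an empty list means none of these checks failed."""
    problems = []
    if len(observed) != len(expected):
        problems.append("observed and expected have different lengths")
    if any(o < 0 or o != int(o) for o in observed):
        problems.append("observed values must be non-negative whole counts")
    low = [i for i, e in enumerate(expected) if e < min_expected]
    if low:
        problems.append(f"expected count below {min_expected} in categories {low}")
    if abs(sum(observed) - sum(expected)) > 1e-9:
        problems.append("observed and expected totals differ")
    return problems


# 60 rolls: every expected count is 10, so no problems are reported
print(check_assumptions([16, 4, 14, 6, 12, 8], [10.0] * 6))  # []
# Only 12 rolls: every expected count is 2, below the rule-of-thumb 5
print(check_assumptions([3, 1, 2, 2, 4, 0], [2.0] * 6))
```

With only 12 rolls, the small expected counts are flagged; collecting more rolls or merging categories are the usual remedies.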