Applied Statistics for the Health Sciences: Associations Between Categorical Variables and the Chi-Square Test

Applied Statistics for the Health Sciences

Course Introduction

This course provides an overview of statistical methods applied in health sciences, focusing on the analysis of categorical data and the use of the Chi-Square test for independence. Students will learn how to identify associations between categorical variables, construct and interpret contingency tables, and apply statistical tests using software such as SPSS.

Overview of the Course

Main Topics

  • Descriptive statistics

  • Probability

  • Statistical inference

  • Associations between categorical variables

  • Chi-Square test of independence ($\chi^2$)

  • Contingency tables

  • Assumptions

  • SPSS demonstrations

Associations Between Categorical Variables

Definition and Examples

Categorical (qualitative) variables are variables that represent categories or groups, such as gender (male/female), blood group (A/B/AB/O), or survival status (survived/not survived). Associations between categorical variables are studied to determine if the distribution of one variable differs across the categories of another variable.

  • Nominal variables: Categories without intrinsic order (e.g., gender, blood group).

  • Ordinal variables: Categories with a logical order (e.g., severity: mild, moderate, severe).

  • Example: Investigating whether survival rates differ by gender in a health study.

Contingency Tables

Purpose and Structure

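A contingency table is a matrix that displays the joint frequency distribution of two categorical variables: each cell holds the number of observations that fall into one combination of categories. It is used to examine the relationship between the two variables.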

Gender  | Survived           | Not Survived       | Total
Male    | Observed frequency | Observed frequency | Row total
Female  | Observed frequency | Observed frequency | Row total
Total   | Column total       | Column total       | Grand total

Additional info: The table above is a general format; actual frequencies are filled in with study data.
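To make the table layout concrete, here is a minimal Python sketch that tallies raw records into cell counts, row totals, column totals, and a grand total. The records and the helper name `crosstab` are invented for illustration and are not taken from the course data.

```python
from collections import Counter

# Hypothetical (gender, outcome) records, made up purely for illustration.
records = (
    [("Male", "Survived")] * 3
    + [("Male", "Not Survived")] * 2
    + [("Female", "Survived")] * 4
    + [("Female", "Not Survived")] * 1
)

def crosstab(pairs):
    """Return cell counts, row totals, column totals, and the grand total."""
    cells = Counter(pairs)                       # observed frequency per cell
    row_totals = Counter(r for r, _ in pairs)    # one total per gender
    col_totals = Counter(c for _, c in pairs)    # one total per outcome
    return cells, row_totals, col_totals, len(pairs)

cells, row_totals, col_totals, grand_total = crosstab(records)
print(cells[("Male", "Survived")], row_totals["Male"],
      col_totals["Survived"], grand_total)  # 3 5 7 10
```

In SPSS the same tabulation is produced by the Crosstabs procedure; the sketch only shows what each part of the table contains.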

Chi-Square Test of Independence ($\chi^2$)

Definition and Application

The Chi-Square test of independence is used to determine whether there is a significant association between two categorical variables. It compares the observed frequencies in each category to the frequencies expected if there were no association.

  • Null hypothesis ($H_0$): The variables are independent (no association).

  • Alternative hypothesis ($H_1$): The variables are associated.

Formula

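The test statistic is calculated as:

$\chi^2 = \sum \frac{(O - E)^2}{E}$

Where the sum runs over all cells of the contingency table and:

  • $O$ = Observed frequency

  • $E$ = Expected frequency (if $H_0$ is true)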
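As a worked illustration, the statistic sums (observed minus expected) squared, divided by expected, over every cell. The sketch below assumes the expected frequencies are already known. The counts are hypothetical (200 men and 200 women with 30% and 20% crash involvement) and the function name `chi_square_statistic` is an illustrative choice.

```python
def chi_square_statistic(observed, expected):
    """Sum (O - E)^2 / E over all cells of the table."""
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )

# Hypothetical 2x2 table: rows = men, women; columns = crash, no crash.
observed = [[60, 140], [40, 160]]
expected = [[50.0, 150.0], [50.0, 150.0]]  # counts expected under independence

statistic = chi_square_statistic(observed, expected)
print(round(statistic, 3))  # 5.333
```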

Calculating Expected Frequencies

Expected frequency for each cell is calculated as:

$E = \frac{\text{row total} \times \text{column total}}{\text{grand total}}$
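The expected count for a cell is its row total multiplied by its column total, divided by the grand total. A minimal Python sketch of that rule follows; the observed counts are hypothetical and `expected_frequencies` is an illustrative name.

```python
def expected_frequencies(observed):
    """Expected cell counts under independence for a table given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

# Hypothetical 2x2 table: rows = men, women; columns = crash, no crash.
observed = [[60, 140], [40, 160]]
expected = expected_frequencies(observed)
print(expected)  # [[50.0, 150.0], [50.0, 150.0]]
```

Note that the expected counts keep the same row and column totals as the observed table; only the distribution within the table changes.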

Chi-Square Distribution

The calculated $\chi^2$ statistic is compared to a critical value from the Chi-Square distribution table, based on the degrees of freedom:

$df = (r - 1)(c - 1)$

Where $r$ is the number of rows and $c$ is the number of columns in the contingency table.
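The comparison can be sketched in Python as follows. Degrees of freedom are (rows minus 1) times (columns minus 1), and the critical values are the standard table entries for a significance level of 0.05. The statistic of 5.333 is a hypothetical value for a 2×2 table, and the names `CRITICAL_05` and `degrees_of_freedom` are illustrative.

```python
# Standard chi-square critical values at a significance level of 0.05.
CRITICAL_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488}

def degrees_of_freedom(n_rows, n_cols):
    """Degrees of freedom for an r x c contingency table."""
    return (n_rows - 1) * (n_cols - 1)

df = degrees_of_freedom(2, 2)
statistic = 5.333  # hypothetical test statistic from a 2x2 table

exceeds_critical = statistic > CRITICAL_05[df]
print(df, exceeds_critical)  # 1 True
```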

Assumptions

  • Observations are independent.

  • Categories are mutually exclusive.

  • Expected frequencies in each cell should be at least 5. If more than 20% of cells have expected frequencies less than 5, the test may not be valid.
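The expected-frequency rule can be checked mechanically once the expected counts are available. The sketch below assumes the thresholds stated above (a minimum of 5 and no more than 20% of cells below it); the table values and the name `check_expected_counts` are hypothetical.

```python
def check_expected_counts(expected, minimum=5, max_share=0.20):
    """Return the share of cells below `minimum` and whether the rule is met."""
    cells = [e for row in expected for e in row]
    share_small = sum(e < minimum for e in cells) / len(cells)
    return share_small, share_small <= max_share

# Hypothetical expected counts for a small 2x2 study.
share, ok = check_expected_counts([[2.5, 7.5], [7.5, 22.5]])
print(share, ok)  # 0.25 False
```

In a 2×2 table a single small cell already makes up 25% of the cells, so the rule fails whenever any expected count is below 5.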

Effect Size

Effect size for the Chi-Square test can be measured using Cramér's V:

$V = \sqrt{\frac{\chi^2}{n(k - 1)}}$

Where $n$ is the total sample size and $k$ is the smaller of the number of rows and the number of columns.

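  • Small effect: $V \approx 0.1$

  • Moderate effect: $V \approx 0.3$

  • Large effect: $V \approx 0.5$

These are Cohen's commonly cited benchmarks for tables where $k = 2$.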
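Cramér's V is the square root of the Chi-Square statistic divided by n(k − 1), where n is the total sample size and k is the smaller of the number of rows and columns. A short Python sketch follows; the statistic and the sample size of 400 are hypothetical, and `cramers_v` is an illustrative name.

```python
import math

def cramers_v(chi_square, n, n_rows, n_cols):
    """Cramér's V for an r x c table with total sample size n."""
    k = min(n_rows, n_cols)
    return math.sqrt(chi_square / (n * (k - 1)))

# Hypothetical 2x2 result: chi-square of 16/3 from 400 observations.
v = cramers_v(16 / 3, 400, 2, 2)
print(round(v, 3))  # 0.115
```

A statistically significant result can still correspond to a weak association, which is why the effect size is reported alongside the test.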

Interpretation

If the calculated $\chi^2$ statistic exceeds the critical value (or if the p-value is less than the significance level, typically 0.05), we reject the null hypothesis and conclude that there is a significant association between the variables.

  • Example: There is a significant association between gender and crash involvement (, ). Men are more likely to be involved in a crash than women (30% vs. 20%).
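The decision rule can be sketched in Python for the 2×2 case. With 1 degree of freedom the upper-tail p-value has a closed form via the complementary error function, so no statistics library is needed. The statistic below is hypothetical, and the names `p_value_df1` and `decide` are illustrative.

```python
import math

def p_value_df1(chi_square):
    """Upper-tail p-value for a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi_square / 2))

def decide(p_value, alpha=0.05):
    """Apply the rule: reject H0 when the p-value is below alpha."""
    return "reject H0" if p_value < alpha else "fail to reject H0"

p = p_value_df1(16 / 3)  # hypothetical statistic from a 2x2 table
print(round(p, 3), decide(p))  # 0.021 reject H0
```

For tables with more than 1 degree of freedom this shortcut does not apply; the p-value then comes from statistical software or a Chi-Square table.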

SPSS Demonstrations

Application in Statistical Software

SPSS is commonly used to perform the Chi-Square test and generate contingency tables. The software provides output tables with observed and expected frequencies, test statistics, p-values, and effect sizes.

  • Input data into SPSS as categorical variables.

  • Use the Crosstabs function to generate contingency tables and run the Chi-Square test.

  • Interpret the output based on the guidelines above.

Assessment Overview

Midterm Exam

  • Consists of multiple-choice and numeric answer questions.

  • Open book and notes allowed.

  • Duration: 2 hours, with a 3-hour window to start.

  • Students must submit answers before the window closes.

Group Project

  • Groups of 4 students (or 2 groups of 3 students).

  • Send group member names to the instructor.

  • Presentation of the group project in class (date specified by instructor).

Additional info: The notes include references to Covid-19 data and crash involvement as examples of categorical variable analysis.
