Applied Statistics for the Health Sciences: Associations Between Categorical Variables and the Chi-Square Test
Applied Statistics for the Health Sciences
Course Introduction
This course provides an overview of statistical methods used in the health sciences, focusing on the analysis of categorical data and the use of the Chi-Square test of independence. Students will learn how to identify associations between categorical variables, construct and interpret contingency tables, and apply statistical tests using software such as SPSS.
Overview of the Course
Main Topics
Descriptive statistics
Probability
Statistical inference
Associations between categorical variables
Chi-Square test of independence (χ²)
Contingency tables
Assumptions
SPSS demonstrations
Associations Between Categorical Variables
Definition and Examples
Categorical (qualitative) variables are variables that represent categories or groups, such as gender (male/female), blood group (A/B/AB/O), or survival status (survived/not survived). Associations between categorical variables are studied to determine if the distribution of one variable differs across the categories of another variable.
Nominal variables: Categories without intrinsic order (e.g., gender, blood group).
Ordinal variables: Categories with a logical order (e.g., severity: mild, moderate, severe).
Example: Investigating whether survival rates differ by gender in a health study.
Contingency Tables
Purpose and Structure
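A contingency table (cross-tabulation) is a matrix that displays the joint frequency distribution of two categorical variables: each cell counts the observations that fall into one combination of categories. It is used to examine the relationship between the two variables.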
| Gender | Survived | Not Survived | Total |
|---|---|---|---|
| Male | Observed frequency | Observed frequency | Row total |
| Female | Observed frequency | Observed frequency | Row total |
| Total | Column total | Column total | Grand total |
Additional info: The table above is a general format; actual frequencies are filled in with study data.
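As a minimal sketch of how the table is filled in, the Python snippet below tallies hypothetical (gender, survival) records into the observed-frequency cells and then computes the row, column, and grand totals. The records and the helper name `contingency_table` are illustrative assumptions, not course data.

```python
from collections import Counter

# Hypothetical raw data: one (gender, outcome) pair per participant.
records = [
    ("Male", "Survived"), ("Male", "Not Survived"), ("Male", "Survived"),
    ("Female", "Survived"), ("Female", "Survived"),
    ("Female", "Not Survived"), ("Female", "Survived"),
]


def contingency_table(pairs, row_levels, col_levels):
    """Count the observations in every (row category, column category) cell."""
    counts = Counter(pairs)  # missing combinations count as 0
    return [[counts[(r, c)] for c in col_levels] for r in row_levels]


rows = ["Male", "Female"]
cols = ["Survived", "Not Survived"]
table = contingency_table(records, rows, cols)

# Margins of the table
row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]
grand_total = sum(row_totals)

for name, row, total in zip(rows, table, row_totals):
    print(f"{name:<8}", *row, total)
print(f"{'Total':<8}", *col_totals, grand_total)
```

Any spreadsheet or statistics package performs the same tally; the point is that each cell is simply a count of matching observations.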
Chi-Square Test of Independence (χ²)
Definition and Application
The Chi-Square test of independence is used to determine whether there is a significant association between two categorical variables. It compares the observed frequencies in each category to the frequencies expected if there were no association.
Null hypothesis (H₀): The variables are independent (no association).
Alternative hypothesis (H₁): The variables are associated.
Formula
The test statistic is calculated as:
$$\chi^2 = \sum_{\text{all cells}} \frac{(O - E)^2}{E}$$
Where:
O = Observed frequency
E = Expected frequency (if H₀ is true)
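The formula can be checked by hand with a short Python sketch. The function name `chi_square_statistic` and the 2×2 counts are hypothetical; the expected counts listed are the ones obtained from this table's row totals (40 and 40), column totals (30 and 50), and grand total (80).

```python
def chi_square_statistic(observed, expected):
    """Return the sum of (O - E)^2 / E over all cells.

    Both arguments are flat lists of cell counts in the same cell order.
    """
    if len(observed) != len(expected):
        raise ValueError("observed and expected must list the same cells")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical 2x2 table flattened row by row: [[12, 28], [18, 22]]
observed = [12, 28, 18, 22]
# Expected counts under H0 for the same cells
expected = [15, 25, 15, 25]

stat = chi_square_statistic(observed, expected)
print(round(stat, 2))  # 1.92
```

Each cell contributes either 9/15 = 0.60 or 9/25 = 0.36, so the total is 1.92. Cells where O and E differ most, relative to E, contribute most to the statistic.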
Calculating Expected Frequencies
Expected frequency for each cell is calculated as:
$$E = \frac{\text{row total} \times \text{column total}}{\text{grand total}}$$
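A minimal sketch of the expected-frequency rule, applied to every cell of a table stored as a list of rows. The helper name `expected_frequencies` and the counts are hypothetical.

```python
def expected_frequencies(observed):
    """E = (row total * column total) / grand total, computed for every cell."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


# Hypothetical 2x2 table of observed counts
observed = [[12, 28],
            [18, 22]]
print(expected_frequencies(observed))  # [[15.0, 25.0], [15.0, 25.0]]
```

The expected table has the same row and column totals as the observed table; it only redistributes the counts as if the two variables were independent.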
Chi-Square Distribution
The calculated χ² statistic is compared to a critical value from the Chi-Square distribution table, based on the degrees of freedom:
$$df = (r - 1)(c - 1)$$
Where r is the number of rows and c is the number of columns in the contingency table.
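A small sketch of the degrees-of-freedom rule combined with a table lookup at α = 0.05. The dictionary holds the standard upper-5% critical values of the chi-square distribution for df = 1 to 6; the function names are hypothetical, and larger tables would need a fuller table or a statistics library.

```python
# Upper 5% critical values of the chi-square distribution
# (standard table values, rounded to three decimals).
CRITICAL_VALUES_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070, 6: 12.592}


def degrees_of_freedom(n_rows, n_cols):
    """df = (r - 1)(c - 1) for an r x c contingency table."""
    return (n_rows - 1) * (n_cols - 1)


def exceeds_critical_value(chi2_stat, n_rows, n_cols):
    """True if the statistic exceeds the 5% critical value (reject H0 at alpha = 0.05)."""
    df = degrees_of_freedom(n_rows, n_cols)
    if df not in CRITICAL_VALUES_05:
        raise ValueError(f"no stored critical value for df = {df}")
    return chi2_stat > CRITICAL_VALUES_05[df]


print(degrees_of_freedom(2, 2))            # 1
print(degrees_of_freedom(3, 4))            # 6
print(exceeds_critical_value(1.92, 2, 2))  # False, since 1.92 < 3.841
```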
Assumptions
Observations are independent.
Categories are mutually exclusive.
Expected frequencies in each cell should be at least 5. If more than 20% of cells have expected frequencies less than 5, the test may not be valid.
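The expected-frequency rule of thumb can be checked with a few lines of Python. This is a sketch under stated assumptions: the helper name `share_of_small_expected` is hypothetical, and the expected counts come from a hypothetical table [[2, 8], [9, 21]].

```python
def share_of_small_expected(expected, threshold=5.0):
    """Fraction of cells whose expected frequency is below the threshold."""
    cells = [e for row in expected for e in row]
    return sum(1 for e in cells if e < threshold) / len(cells)


# Expected counts for a hypothetical table [[2, 8], [9, 21]]
# (row totals 10 and 30, column totals 11 and 29, n = 40).
expected = [[2.75, 7.25],
            [8.25, 21.75]]

share = share_of_small_expected(expected)
print(f"{share:.0%} of cells have an expected count below 5")
if share > 0.20:
    print("More than 20% of cells are below 5: the chi-square result may not be valid.")
```

Here one of four cells (25%) falls below 5, so the rule of thumb flags the test. Note that the check uses expected counts, not observed counts.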
Effect Size
Effect size for the Chi-Square test can be measured using Cramér's V:
$$V = \sqrt{\frac{\chi^2}{n\,(k - 1)}}$$
Where n is the total sample size and k is the smaller of the number of rows and the number of columns.
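Small effect: V ≈ 0.10
Moderate effect: V ≈ 0.30
Large effect: V ≈ 0.50
These are Cohen's conventional benchmarks, which apply directly when the smaller table dimension is 2 (e.g., a 2×2 table).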
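A minimal sketch of the Cramér's V calculation, assuming the χ² statistic has already been computed. The function name `cramers_v` and the input values are hypothetical.

```python
import math


def cramers_v(chi2_stat, n, n_rows, n_cols):
    """V = sqrt(chi2 / (n * (k - 1))), where k = min(rows, columns)."""
    k = min(n_rows, n_cols)
    if k < 2:
        raise ValueError("the table needs at least two rows and two columns")
    return math.sqrt(chi2_stat / (n * (k - 1)))


# Hypothetical: chi2 = 1.92 from a 2x2 table with n = 80
v = cramers_v(1.92, 80, 2, 2)
print(round(v, 3))  # 0.155
```

Because V divides by the sample size, it stays comparable across studies of different sizes, whereas χ² itself grows with n for the same underlying association.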
Interpretation
If the calculated χ² statistic exceeds the critical value (or if the p-value is less than the significance level, typically 0.05), we reject the null hypothesis and conclude that there is a significant association between the variables.
Example: There is a significant association between gender and crash involvement (χ² = …, p = …). Men are more likely to be involved in a crash than women (30% vs. 20%).
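The p-value version of the decision rule can be sketched in Python for the 2×2 case (df = 1) using only the standard library. With one degree of freedom the statistic is a squared standard normal, so the upper-tail p-value equals erfc(√(x/2)). The function names are hypothetical and the inputs are illustrative, not the crash-study values.

```python
import math


def p_value_df1(chi2_stat):
    """Upper-tail p-value for a chi-square statistic with 1 degree of freedom.

    With df = 1 the statistic is a squared standard normal Z, so
    P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(chi2_stat / 2))


def significant(chi2_stat, alpha=0.05):
    """Decision rule: reject H0 when the p-value is below alpha."""
    return p_value_df1(chi2_stat) < alpha


print(round(p_value_df1(1.92), 3))   # 0.166
print(significant(1.92))             # False
print(round(p_value_df1(3.841), 3))  # 0.05, matching the df = 1 critical value
```

Both forms of the rule agree: a statistic above 3.841 with df = 1 is exactly one whose p-value is below 0.05. For df > 1, a chi-square table or, if SciPy is available, `scipy.stats.chi2.sf(stat, df)` gives the upper-tail probability.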
SPSS Demonstrations
Application in Statistical Software
SPSS is commonly used to perform the Chi-Square test and generate contingency tables. The software provides output tables with observed and expected frequencies, test statistics, p-values, and effect sizes.
Input data into SPSS as categorical variables.
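Use Analyze > Descriptive Statistics > Crosstabs to generate the contingency table; in the Statistics dialog select Chi-square (and Phi and Cramer's V for the effect size), and in the Cells dialog select Expected to display expected counts.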
Interpret the output based on the guidelines above.
Assessment Overview
Midterm Exam
Consists of multiple-choice and numeric answer questions.
Open book and notes allowed.
Duration: 2 hours, with a 3-hour window to start.
Students must submit answers before the window closes.
Group Project
Groups of 4 students (or 2 groups of 3 students).
Send group member names to the instructor.
Presentation of the group project in class (date specified by instructor).
Additional info: The notes include references to COVID-19 data and crash involvement as examples of categorical variable analysis.