Association Between Two Categorical Variables: Contingency Tables and Proportions
Study Guide - Smart Notes
Chapter 3: Association – Contingency, Correlation, and Regression
Section 3.1: Exploring the Association Between Two Categorical Variables
This section introduces foundational concepts for analyzing the association between two categorical variables in statistics. It covers variable types, the definition of association, contingency tables, and the calculation of proportions and conditional proportions.
Learning Objectives
Identify variable type: Response or Explanatory
Define Association
Understand and construct Contingency Tables
Calculate Proportions and Conditional Proportions
Response and Explanatory Variables
In statistical analysis, it is crucial to distinguish between the response and explanatory variables when comparing groups or outcomes.
Response variable (Dependent Variable): The outcome variable on which comparisons are made.
Explanatory variable (Independent Variable): The variable that defines the groups to be compared with respect to values on the response variable.
Examples:
Blood alcohol level (response) / Number of beers consumed (explanatory)
Grade on test (response) / Amount of study time (explanatory)
Yield of corn in bushels per acre (response) / Amount of rainfall (explanatory)
Association
The main purpose of data analysis with two variables is to investigate whether there is an association and to describe that association.
Association: An association exists between two variables if a particular value for one variable is more likely to occur with certain values of the other variable.
Detecting association helps in understanding relationships and dependencies between variables.
Contingency Tables
A contingency table is a tabular display that shows how many observations fall into each combination of categories of two categorical variables. It is a fundamental tool for summarizing the relationship between those variables.
Displays two categorical variables.
The rows list the categories of one variable.
The columns list the categories of the other variable.
Entries in the table are frequencies (counts).
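The structure described above can be sketched in code. This is a minimal illustration, assuming each observation is recorded as a (row category, column category) pair. The `observations` list is made-up toy data and `cross_tabulate` is a hypothetical helper name; neither comes from the course materials.

```python
from collections import Counter

# Hypothetical toy records, invented for illustration only.
# Each record is (food type, pesticide status).
observations = [
    ("Organic", "Not Present"),
    ("Organic", "Present"),
    ("Conventional", "Present"),
    ("Conventional", "Present"),
    ("Conventional", "Not Present"),
    ("Organic", "Not Present"),
]

def cross_tabulate(pairs):
    """Count how often each (row category, column category) pair occurs."""
    cell_counts = Counter(pairs)
    table = {}
    for (row, col), n in cell_counts.items():
        table.setdefault(row, {})[col] = n
    return table

# Rows are food types, columns are pesticide statuses, entries are counts.
table = cross_tabulate(observations)
print(table)
```

Each entry of the resulting nested dictionary is a frequency: the number of records that share one row category and one column category.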
Example Table: Frequencies for Food Type and Pesticide Status
| Food Type | Pesticide Present | Pesticide Not Present | Total |
|---|---|---|---|
| Organic | 29 | 98 | 127 |
| Conventional | 19,485 | 7,086 | 26,571 |
| Total | 19,514 | 7,184 | 26,698 |
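As a check on the Total row and column, the margins can be recomputed from the four cell counts. The sketch below copies the counts from the example table; the function names (`row_totals`, `column_totals`, `grand_total`) are illustrative choices, not part of the notes.

```python
# Cell counts copied from the example table.
# Rows: food type (explanatory). Columns: pesticide status (response).
counts = {
    "Organic": {"Present": 29, "Not Present": 98},
    "Conventional": {"Present": 19485, "Not Present": 7086},
}

def row_totals(table):
    """Total count for each row category."""
    return {row: sum(cells.values()) for row, cells in table.items()}

def column_totals(table):
    """Total count for each column category."""
    totals = {}
    for cells in table.values():
        for col, n in cells.items():
            totals[col] = totals.get(col, 0) + n
    return totals

def grand_total(table):
    """Total number of observations in the table."""
    return sum(row_totals(table).values())

print(row_totals(counts))     # {'Organic': 127, 'Conventional': 26571}
print(column_totals(counts))  # {'Present': 19514, 'Not Present': 7184}
print(grand_total(counts))    # 26698
```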
Key Questions:
What is the response variable? Pesticide Status
What is the explanatory variable? Food Type
Calculating Proportions and Conditional Proportions
Proportions and conditional proportions are used to summarize the data in contingency tables and to compare groups.
Proportion: The fraction of items in a category out of the total number of items.
Conditional Proportion: The proportion of items in a category, given a specific value of another variable.
Example Calculations:
Proportion of organic foods containing pesticides: 29 / 127 ≈ 0.228
Proportion of conventional foods containing pesticides: 19,485 / 26,571 ≈ 0.733
Proportion of all sampled items containing pesticide residues: 19,514 / 26,698 ≈ 0.731
Conditional proportions allow for comparison between groups, such as organic vs. conventional foods, with respect to pesticide presence.
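The example calculations can be reproduced from the table counts. The following is a sketch under the assumption that the conditioning variable is food type, so each cell count is divided by its row total; `conditional_proportions` and `overall_proportion` are hypothetical helper names.

```python
# Cell counts copied from the example table.
counts = {
    "Organic": {"Present": 29, "Not Present": 98},
    "Conventional": {"Present": 19485, "Not Present": 7086},
}

def conditional_proportions(table):
    """For each row, divide every cell count by that row's total."""
    result = {}
    for row, cells in table.items():
        total = sum(cells.values())
        result[row] = {col: n / total for col, n in cells.items()}
    return result

def overall_proportion(table, col):
    """Share of all observations that fall in the given column category."""
    in_column = sum(cells[col] for cells in table.values())
    everything = sum(sum(cells.values()) for cells in table.values())
    return in_column / everything

props = conditional_proportions(counts)
print(round(props["Organic"]["Present"], 3))            # 0.228
print(round(props["Conventional"]["Present"], 3))       # 0.733
print(round(overall_proportion(counts, "Present"), 3))  # 0.731
```

Within each food type the conditional proportions sum to 1, which is a quick way to confirm that the division used the row total rather than the grand total.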
Visualizing Conditional Proportions
Side-by-side bar charts are commonly used to display conditional proportions, so that the categories of the explanatory variable can be compared easily with respect to the response variable.
If there is no association between the variables, the conditional proportions for the response variable categories will be the same for each category of the explanatory variable (here, each food type).
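A rough text-only stand-in for a side-by-side bar chart is sketched below. It assumes the counts from the example table and uses hypothetical helpers (`bar`, `render_chart`); a plotting library would normally draw the real chart. Bars of clearly different length for the same response category across food types suggest an association.

```python
# Cell counts copied from the example table.
counts = {
    "Organic": {"Present": 29, "Not Present": 98},
    "Conventional": {"Present": 19485, "Not Present": 7086},
}

def conditional_proportions(table):
    """For each row, divide every cell count by that row's total."""
    result = {}
    for row, cells in table.items():
        total = sum(cells.values())
        result[row] = {col: n / total for col, n in cells.items()}
    return result

def bar(proportion, width=40):
    """Draw a proportion as a run of '#' characters, scaled to `width`."""
    return "#" * round(proportion * width)

def render_chart(table, width=40):
    """One group of bars per explanatory category, one bar per response category."""
    lines = []
    for row, props in conditional_proportions(table).items():
        lines.append(row)
        for col, p in props.items():
            lines.append(f"  {col:<12} {bar(p, width)} {p:.3f}")
    return "\n".join(lines)

print(render_chart(counts))

# Difference in the proportion with pesticide present between the two groups.
props = conditional_proportions(counts)
gap = props["Conventional"]["Present"] - props["Organic"]["Present"]
print(f"Difference in 'Present' proportion: {gap:.3f}")
```

With no association the gap would be close to zero; here it is about 0.5, which is consistent with pesticide status depending on food type in this sample.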
Summary
The response variable is the outcome of interest; the explanatory variable defines the groups being compared, and the response may depend on it.
For two categorical variables, association is summarized using a contingency table and proportions.
Comparing conditional proportions helps to identify and describe associations between categorical variables.