Decision Trees for Choosing Statistical Tests: Study Guide for Statistics Students
Decision Trees in Statistics
Introduction to Decision Trees for Statistical Test Selection
Decision trees are systematic tools used to select appropriate statistical tests based on the research question, variable types, and study design. They help clarify the logic behind test selection and ensure that the chosen method matches the data and hypothesis.
Purpose: To guide researchers in choosing the correct statistical test for their data and hypothesis.
Key Questions: Who are the subjects? What are the variables? Why is the test being performed?
Statistical Emergency: The WWW Approach
Who, What, Why
Before selecting a test, clarify the following:
Who: Identify the sampling units and population. Consider how samples were taken and whether the design is paired or unpaired.
What: Define the response variable (y) and predictor(s) (x). Classify variables as quantitative (continuous/discrete) or categorical (nominal, ordinal, binary).
Why: Determine the research question: Are you testing for differences between groups or associations between variables?
Study Design: Paired vs. Unpaired
Between-Subject vs. Within-Subject Designs
Study design affects which statistical test is appropriate.
Not Paired (Between-Subject): Each subject provides one response value. Compare group means.
Paired (Within-Subject): Each subject provides two response values (e.g., before and after treatment). Analyze each subject's difference between the two values.
Example Table: Paired vs. Unpaired Design
Design | Data Structure | Analysis |
|---|---|---|
Not Paired | One value per subject | Test difference between group means |
Paired | Two values per subject | Test the within-subject differences between the two responses |
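Because a paired design reduces each subject to a single difference, the paired analysis is a one-sample t statistic computed on those differences. The minimal standard-library sketch below assumes equal-length `before`/`after` lists; the function name `paired_t_statistic` and the numbers are hypothetical, and it returns only the statistic, not a p-value.

```python
import math
import statistics

def paired_t_statistic(before, after):
    """Paired t statistic: a one-sample t statistic on the within-subject differences."""
    if len(before) != len(after):
        raise ValueError("need exactly one 'before' and one 'after' value per subject")
    diffs = [a - b for b, a in zip(before, after)]  # one difference per subject
    n = len(diffs)
    mean_d = statistics.mean(diffs)
    sd_d = statistics.stdev(diffs)  # sample SD, n - 1 in the denominator
    return mean_d / (sd_d / math.sqrt(n))

# Hypothetical body-mass values (kg) for three subjects
before = [80.0, 92.0, 75.0]
after = [78.0, 89.0, 74.0]
t = paired_t_statistic(before, after)  # differences are -2, -3, -1
```

Treating the same six numbers as two independent groups would ignore the pairing and mix between-subject variation into the comparison.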
Types of Variables
Quantitative vs. Categorical
Quantitative:
Continuous (e.g., height, weight)
Discrete (e.g., number of children)
Categorical:
Ordinal (natural ranking, e.g., education level)
Nominal (no natural order, e.g., blood type)
Binary (two categories, e.g., Yes/No)
Parametric vs. Non-Parametric Tests
Choosing Based on Data Characteristics
Parametric Tests:
Rely on the Central Limit Theorem (CLT), which makes the sample mean approximately normal in large enough samples
Assume approximate normality, no outliers, and a sufficient sample size
Examples: t-test, ANOVA
Non-Parametric Tests:
Do not rely on the CLT
Tolerate skewed distributions and outliers
Examples: Wilcoxon rank-sum test, Chi-square test
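Rank-based tests such as the Wilcoxon rank-sum test tolerate outliers because they work with ranks rather than raw values. The sketch below computes the rank-sum statistic W as the sum of the first sample's ranks in the pooled data (one common convention; the Mann-Whitney U differs by n1(n1 + 1)/2), using average ranks for ties. The function names are hypothetical and no p-value is computed.

```python
def midranks(values):
    """Rank values 1..n, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # positions i..j hold ranks i+1..j+1
        i = j + 1
    return ranks

def rank_sum(x, y):
    """Wilcoxon rank-sum statistic W: sum of the ranks of sample x in the pooled data."""
    ranks = midranks(list(x) + list(y))
    return sum(ranks[: len(x)])
```

Replacing the largest value in `y` with an extreme outlier (for example 6 with 600) leaves every rank, and therefore W, unchanged, whereas a difference in means would shift substantially.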
Sample Size Guidelines
Sample Size (n) | Requirements for t-based (parametric) procedures |
|---|---|
n ≤ 15 | Data close to normal (no clear deviation from normality) |
15 < n ≤ 45 | No outliers, not strongly skewed |
n > 45 | No outliers; skewness is tolerated |
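The sample-size rule of thumb can be written as a small check. This sketch assumes the first row applies to small samples (n ≤ 15) and groups n = 45 with the middle row; the function name and its yes/no inputs are hypothetical, and judging whether data are "close to normal" or "strongly skewed" is still done by eye (e.g., from a histogram or normal quantile plot).

```python
def t_procedures_reasonable(n, close_to_normal, has_outliers, strongly_skewed):
    """Rule-of-thumb check for t-based procedures, following the sample-size table."""
    if n <= 15:
        # small samples: data must look close to normal (an outlier is itself a deviation)
        return close_to_normal and not has_outliers
    if n <= 45:
        # moderate samples: mild skew is fine, but not outliers or strong skew
        return not has_outliers and not strongly_skewed
    # large samples: the CLT handles skew, but outliers still distort the mean
    return not has_outliers
```

When the check fails, the notes point toward a non-parametric alternative such as the Wilcoxon rank-sum test.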
Directed vs. Undirected Relationships
Types of Hypotheses
Directed: Hypothesize that X affects Y (causal relationship). Example: Diet and risk of cancer.
Undirected: Hypothesize association between Y1 and Y2 (correlation, co-occurrence). Example: Co-occurrence of two plant species.
Variable Type and Test Selection
Matching Predictor and Response Types
Binary Predictor, Quantitative Response: Use a two-sample t-test.
Categorical Predictor (>2 groups), Quantitative Response: Use one-way ANOVA.
Categorical Predictor, Categorical Response: Use a two-way table and a Chi-square test.
Quantitative Predictor, Quantitative Response: Use regression analysis.
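The four matching rules above amount to a lookup on the predictor and response types. The sketch below encodes them for unpaired designs only; the `VarType` enum, the `suggest_test` name, and the returned labels are hypothetical, and a quantitative predictor with a categorical response is reported as not covered because these rules do not address it.

```python
from enum import Enum

class VarType(Enum):
    BINARY = "binary"
    CATEGORICAL = "categorical"    # nominal or ordinal with more than two levels
    QUANTITATIVE = "quantitative"

def suggest_test(predictor: VarType, response: VarType) -> str:
    """Map predictor/response types to the suggested test (unpaired designs only)."""
    if response is VarType.QUANTITATIVE:
        if predictor is VarType.BINARY:
            return "two-sample t-test"
        if predictor is VarType.CATEGORICAL:
            return "one-way ANOVA"
        return "regression"
    # categorical (including binary) response
    if predictor in (VarType.BINARY, VarType.CATEGORICAL):
        return "chi-square test (two-way table)"
    return "not covered by these rules"
```

For example, `suggest_test(VarType.BINARY, VarType.BINARY)` matches the cleft-palate question below (drug yes/no versus cleft palate yes/no).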
Decision Tree for Statistical Test Selection
First Level: Why?
Aim: What is the research question?
Hypothesis Statement: Is it about differences between groups or association between variables?
Decision Tree Table
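Aim | Null hypothesis | What is compared or assessed |
|---|---|---|
Difference between groups | H0: No difference | Means, distributions, or proportions |
Association between variables | H0: No association | Correlation, rank correlation, regression, or Chi-square test for association |
Multiple y / other aims | Not covered by this tree | e.g., curve fitting |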
Practice Example: QSYMIA Effects and Side Effects
Statistical Questions
Compare body mass in patients before and after one month of taking QSYMIA.
Compare mean weight loss in patients who take QSYMIA to weight loss in control patients.
Compare the proportions of babies born with cleft palates between patients who take QSYMIA and patients not taking this drug.
Test whether weight loss in patients who take QSYMIA depends on age.
Poll Questions: Identifying Variables
Response and Explanatory Variables
Response Variable: Whether the baby has a cleft palate (Yes/No, binary).
Explanatory Variable: Whether the mother was taking QSYMIA (Yes/No, binary).
Experimental Units: Babies.
Key B: Differences in Distributions or Frequencies
Testing Proportions
Use Chi-square tests for differences in proportions or frequencies.
Types: Test of homogeneity, test of independence.
Example Table: Chi-square Test Types
Test Type | Purpose |
|---|---|
Test of Homogeneity | Compare proportions across groups |
Test of Independence | Assess association between categorical variables |
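Both chi-square tests use the same statistic: each cell's expected count is (row total × column total) / grand total, and the statistic sums (O − E)²/E over all cells with (rows − 1)(columns − 1) degrees of freedom. The two tests differ in how the data were sampled and how the result is interpreted, not in the arithmetic. This sketch computes the statistic without Yates' continuity correction; the function names and the 2 × 2 counts are hypothetical.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic for an r x c table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total  # E under H0
            stat += (observed - expected) ** 2 / expected
    return stat

def degrees_of_freedom(table):
    """(rows - 1) * (columns - 1)."""
    return (len(table) - 1) * (len(table[0]) - 1)

# Hypothetical counts: rows = exposed / not exposed, columns = outcome yes / no
counts = [[10, 90],
          [20, 80]]
```

Here every expected count is 15 or 85, so the statistic works out to 200/51 (about 3.92) on 1 degree of freedom.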
Key C: Association Between Two Variables
Types of Association Tests
Nominal Variables: Use Chi-square test for association.
Ordinal Variables: Use rank correlation (e.g., Spearman's rho).
Quantitative Variables: Use correlation or regression analysis.
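Spearman's rho is the Pearson correlation applied to the ranks of each variable, so it measures monotone (not necessarily linear) association and suits ordinal data coded as ordered numbers. A minimal standard-library sketch with hypothetical function names, using average ranks for ties:

```python
import math

def midranks(values):
    """Rank values 1..n, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def pearson_r(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson_r(midranks(x), midranks(y))
```

For x = [1, 2, 3, 4, 5] and y = [2, 4, 8, 16, 1000], the relationship is perfectly monotone, so rho is 1, while Pearson's r on the raw values is only about 0.72.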
Disclaimer: What Is Not Covered in the Tree?
One-sample tests: comparing a mean or median to a fixed value (one-sample t-test, sign test).
Curve fitting: Survival curves, quantile regression.
When to Consult a Statistician
Identify the general type of statistical problem.
If unsure, review worked examples or seek a second opinion.
Answering the decision-tree questions makes it easier to communicate with a statistician and to guide further research.
Practice and Review
Solve practice problems using the decision tree.
Participate in quizzes and discussion boards for additional practice.
Key Formulas
t-test for Two Independent Means
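Used to compare means between two independent groups (unpooled form shown), where $\bar{x}_i$, $s_i^2$, and $n_i$ are the sample mean, sample variance, and sample size of group $i$:
$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$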
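A minimal standard-library sketch of the unpooled (Welch) two-sample t statistic, which divides the difference in sample means by the square root of s1²/n1 + s2²/n2. The name `welch_t` is hypothetical; it returns only the statistic (no degrees of freedom or p-value), and the pooled version would instead use a pooled standard deviation.

```python
import math
import statistics

def welch_t(x1, x2):
    """Unpooled two-sample t statistic: (mean1 - mean2) / sqrt(s1^2/n1 + s2^2/n2)."""
    m1, m2 = statistics.mean(x1), statistics.mean(x2)
    v1, v2 = statistics.variance(x1), statistics.variance(x2)  # sample variances (n - 1)
    se = math.sqrt(v1 / len(x1) + v2 / len(x2))  # standard error of the difference
    return (m1 - m2) / se
```

With x1 = [5, 7, 9] and x2 = [1, 3, 5], both variances equal 4 and the means differ by 4, giving t = 4 / sqrt(8/3) = sqrt(6), about 2.45.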
Chi-square Test Statistic
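Used for categorical data, comparing the observed count $O$ with the expected count $E$ in each cell of the table:
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$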
Pearson Correlation Coefficient
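Measures linear association between two quantitative variables:
$$r = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2 \sum_{i=1}^{n}(y_i - \bar{y})^2}}$$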
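The Pearson formula translates directly into code: center both variables, sum the cross-products, and divide by the square root of the product of the two sums of squares. A minimal sketch; the function name `pearson_r` and the data are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of the same length, at least 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))  # cross-products
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

x = [1, 2, 3, 4]
y = [2, 4, 5, 8]
r = pearson_r(x, y)  # 9.5 / sqrt(5 * 18.75), about 0.981
```

Swapping the arguments gives the same value, which is why correlation fits undirected hypotheses about Y1 and Y2.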
Simple Linear Regression Equation
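Models the relationship between predictor $x$ and response $y$ with a least-squares line:
$$\hat{y} = b_0 + b_1 x, \qquad b_1 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n}(x_i - \bar{x})^2}, \qquad b_0 = \bar{y} - b_1 \bar{x}$$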
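The least-squares slope is the sum of cross-products divided by the sum of squares of the predictor, and the intercept makes the line pass through the point of means. A minimal sketch; the name `least_squares_fit` and the data are hypothetical.

```python
def least_squares_fit(x, y):
    """Return (b0, b1) for the least-squares line y-hat = b0 + b1 * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    b1 = sxy / sxx       # slope
    b0 = my - b1 * mx    # intercept: line passes through (x-bar, y-bar)
    return b0, b1

x = [1, 2, 3, 4]
y = [2, 4, 5, 8]
b0, b1 = least_squares_fit(x, y)  # about (0.0, 1.9)
```

Unlike correlation, regression is not symmetric: `least_squares_fit(y, x)` gives a slope of about 0.507 rather than 1.9, because it minimizes errors in a different variable. This is why regression matches directed hypotheses (X affects Y) and the choice of predictor matters.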
Summary Table: Matching Test to Scenario
Scenario | Test |
|---|---|
Compare means (2 groups, unpaired) | Two-sample t-test |
Compare means (paired) | Paired t-test |
Compare proportions (2 groups) | Chi-square test |
Association between two quantitative variables | Correlation/Regression |
Association between two categorical variables | Chi-square test |