Decision Trees for Choosing Statistical Tests: Study Guide for Statistics Students

Decision Trees in Statistics

Introduction to Decision Trees for Statistical Test Selection

Decision trees are systematic tools used to select appropriate statistical tests based on the research question, variable types, and study design. They help clarify the logic behind test selection and ensure that the chosen method matches the data and hypothesis.

Purpose: To guide researchers in choosing the correct statistical test for their data and hypothesis.
Key Questions: Who are the subjects? What are the variables? Why is the test being performed?

Statistical Emergency: The WWW Approach

Who, What, Why

Before selecting a test, clarify the following:

Who: Identify the sampling units and population. Consider how samples were taken and whether the design is paired or unpaired.
What: Define the response variable (y) and predictor(s) (x). Classify variables as quantitative (continuous/discrete) or categorical (nominal, ordinal, binary).
Why: Determine the research question: Are you testing for differences between groups or associations between variables?

Study Design: Paired vs. Unpaired

Between-Subject vs. Within-Subject Designs

Study design affects which statistical test is appropriate.

Not Paired (Between-Subject): Each subject provides one response value. Compare group means.
Paired (Within-Subject): Each subject provides two response values. Analyze individual differences.

Example Table: Paired vs. Unpaired Design

Design	Data Structure	Analysis
Not Paired	One value per subject	Test difference between group means
Paired	Two values per subject	Test individual differences between responses
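
The contrast in the table can be sketched in a few lines of Python. This is a minimal illustration, not a full analysis: the function names `paired_t` and `unpaired_t` are hypothetical, the unpaired version assumes the unpooled (Welch) standard error, and the body-mass values are invented for the example.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(before, after):
    """t statistic for a paired design: a one-sample t on the individual differences."""
    if len(before) != len(after):
        raise ValueError("paired data need two values per subject")
    diffs = [a - b for a, b in zip(after, before)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))


def unpaired_t(group1, group2):
    """t statistic for an unpaired design: difference between group means (Welch form)."""
    se = sqrt(stdev(group1) ** 2 / len(group1) + stdev(group2) ** 2 / len(group2))
    return (mean(group1) - mean(group2)) / se


# Hypothetical body-mass values in kg, invented for illustration only.
before = [92.0, 88.5, 101.2, 95.4, 90.1]
after = [90.0, 87.0, 98.9, 93.4, 89.0]

print("paired t:  ", round(paired_t(before, after), 2))
print("unpaired t:", round(unpaired_t(after, before), 2))
```

With the same numbers, the paired statistic is much larger in magnitude because subject-to-subject variation cancels out of the differences.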

Types of Variables

Quantitative vs. Categorical

Quantitative:
- Continuous (e.g., height, weight)
- Discrete (e.g., number of children)
Categorical:
- Ordinal (natural ranking, e.g., education level)
- Nominal (no natural order, e.g., blood type)
- Binary (two categories, e.g., Yes/No)

Parametric vs. Non-Parametric Tests

Choosing Based on Data Characteristics

Parametric Tests:
- Rely on the Central Limit Theorem (CLT)
- Assume approximately normal data, no outliers, and a sufficient sample size
- Examples: t-test, ANOVA
Non-Parametric Tests:
- Do not rely on CLT
- Tolerate skewed distributions and outliers
- Examples: Wilcoxon rank-sum test, Chi-square test

Sample Size Guidelines

Sample Size (n)	Assumptions
n ≤ 15	No deviation from normality
15 < n < 45	No outliers, not strongly skewed
n ≥ 45	No outliers
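
The guideline can be written as a small lookup. This is a sketch under stated assumptions: the strictest requirement (no deviation from normality) is taken to apply to the smallest samples, n of 15 or fewer, and n = 45 is assigned to the last row. The function name `parametric_requirement` is hypothetical.

```python
def parametric_requirement(n):
    """Return what the data must satisfy before a parametric test is used.

    Assumption: the smallest samples (n <= 15) get the strictest
    requirement, and n = 45 falls in the last band.
    """
    if n < 2:
        raise ValueError("need at least two observations")
    if n <= 15:
        return "no deviation from normality"
    if n < 45:
        return "no outliers, not strongly skewed"
    return "no outliers"


for n in (10, 30, 60):
    print(n, "->", parametric_requirement(n))
```

If the requirement for a given n is not met, a non-parametric alternative is the safer choice.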

Directed vs. Undirected Relationships

Types of Hypotheses

Directed: Hypothesize that X affects Y (causal relationship). Example: Diet and risk of cancer.
Undirected: Hypothesize association between Y1 and Y2 (correlation, co-occurrence). Example: Co-occurrence of two plant species.

Variable Type and Test Selection

Matching Predictor and Response Types

Binary Predictor, Quantitative Response: Use t-test.
Categorical Predictor (>2 groups), Quantitative Response: Use ANOVA.
Categorical Predictor, Categorical Response: Use two-way tables (Chi-square test).
Quantitative Predictor, Quantitative Response: Use regression analysis.
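
The four rules above can be sketched as a single function. This is an illustrative helper rather than a complete decision tree: the name `choose_test` and its string labels are hypothetical, and any combination the notes do not cover raises an error instead of guessing.

```python
def choose_test(predictor, response, groups=2, paired=False):
    """Suggest a test from variable types.

    predictor, response: "binary", "categorical", or "quantitative".
    groups: number of predictor categories (used for categorical predictors).
    paired: True for a within-subject design with two groups.
    """
    categorical = {"binary", "categorical"}
    if predictor in categorical and response == "quantitative":
        if predictor == "binary" or groups == 2:
            return "paired t-test" if paired else "t-test"
        return "ANOVA"
    if predictor in categorical and response in categorical:
        return "chi-square test"
    if predictor == "quantitative" and response == "quantitative":
        return "regression"
    raise ValueError("combination not covered by these notes")


print(choose_test("binary", "quantitative"))                 # two groups, one mean each
print(choose_test("categorical", "quantitative", groups=3))  # more than two groups
print(choose_test("binary", "binary"))                       # two-way table
print(choose_test("quantitative", "quantitative"))           # e.g., weight loss vs. age
```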

Decision Tree for Statistical Test Selection

First Level: Why?

Aim: What is the research question?
Hypothesis Statement: Is it about differences between groups or association between variables?

Decision Tree Table

Aim	Null Hypothesis (H0)	What Is Compared or Tested
Difference between groups	No difference	Means, distributions, or proportions
Association between variables	No association	Relationship between two variables
Other aims	(not covered)	Multiple y, curve fitting

Practice Example: QSYMIA Effects and Side Effects

Statistical Questions

Compare body mass in patients before and after one month of taking QSYMIA.
Compare mean weight loss in patients who take QSYMIA to weight loss in control patients.
Compare the proportions of babies born with cleft palates between patients who take QSYMIA and patients not taking this drug.
Test whether weight loss in patients who take QSYMIA depends on age.

Poll Questions: Identifying Variables

Response and Explanatory Variables (Cleft Palate Question)

Response Variable: Whether the baby has a cleft palate (Yes/No, binary).
Explanatory Variable: Whether the mother was taking QSYMIA (Yes/No, binary).
Experimental Units: Babies.

Key B: Differences in Distributions or Frequencies

Testing Proportions

Use Chi-square tests for differences in proportions or frequencies.
Types: Test of homogeneity, test of independence.

Example Table: Chi-square Test Types

Test Type	Purpose
Test of Homogeneity	Compare proportions across groups
Test of Independence	Assess association between categorical variables
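
Both test types use the same calculation on a two-way table of counts. The sketch below computes the statistic and degrees of freedom by hand, using only the standard library; the function name `chi_square` is hypothetical and the counts are invented for illustration. Converting the statistic to a p-value requires a chi-square distribution table or a statistics package, which is not shown here.

```python
def chi_square(table):
    """Chi-square statistic and degrees of freedom for a two-way table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if rows and columns were unrelated.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df


# Hypothetical counts: rows are groups, columns are outcome yes/no.
counts = [[10, 20],
          [20, 10]]
stat, df = chi_square(counts)
print(round(stat, 3), df)
```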

Key C: Association Between Two Variables

Types of Association Tests

Nominal Variables: Use Chi-square test for association.
Ordinal Variables: Use rank correlation (e.g., Spearman's rho).
Quantitative Variables: Use correlation or regression analysis.
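
The difference between rank correlation and ordinary correlation can be sketched as follows. This is a minimal standard-library illustration with hypothetical function names: Spearman's rho is computed as the Pearson correlation of the ranks, with tied values sharing their average rank.

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation: strength of the linear association."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def ranks(values):
    """1-based ranks; tied values receive the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation applied to the ranks."""
    return pearson_r(ranks(x), ranks(y))


# A monotone but non-linear relationship (values invented for illustration).
x = [1, 2, 3, 4]
y = [1, 4, 9, 16]
print(round(pearson_r(x, y), 3), round(spearman_rho(x, y), 3))
```

Because the ranks of x and y agree exactly in this example, rho is at its maximum even though the relationship is not a straight line.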

Disclaimer: What Is Not Covered in the Tree?

One-sample tests: Mean (one-sample t-test) or median (sign test).
Curve fitting: Survival curves, quantile regression.

When to Consult a Statistician

Identify the general type of statistical problem.
If unsure, review worked examples or seek a second opinion.
Answering decision tree questions helps communicate with statisticians and guides further research.

Practice and Review

Solve practice problems using the decision tree.
Participate in quizzes and discussion boards for additional practice.

Key Formulas

t-test for Two Independent Means

Used to compare means between two groups:
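
A standard form of the statistic, assuming the unpooled (Welch) standard error rather than the pooled-variance version, is shown here. The symbols are the usual ones: sample means, sample standard deviations, and sample sizes of groups 1 and 2.

```latex
t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}
```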

Chi-square Test Statistic

Used for categorical data:
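
The usual form, with O_i the observed count and E_i the expected count under H0 in cell i, summed over all cells of the table:

```latex
\chi^2 = \sum_{i} \frac{(O_i - E_i)^2}{E_i}
```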

Pearson Correlation Coefficient

Measures linear association between two quantitative variables:
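
The usual sample formula, for n pairs of observations (x_i, y_i) with means x-bar and y-bar:

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \, \sum_{i=1}^{n} (y_i - \bar{y})^2}}
```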

Simple Linear Regression Equation

Models relationship between predictor and response:
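
A standard way to write the model and its least-squares estimates is given here. The notation is an assumption, since the notes do not fix one: the betas are the population intercept and slope, b_0 and b_1 are their estimates, r is the Pearson correlation, and s_x and s_y are the sample standard deviations.

```latex
y = \beta_0 + \beta_1 x + \varepsilon,
\qquad
b_1 = r \, \frac{s_y}{s_x},
\qquad
b_0 = \bar{y} - b_1 \bar{x}
```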

Summary Table: Matching Test to Scenario

Scenario	Test
Compare means (2 groups, unpaired)	t-test
Compare means (paired)	Paired t-test
Compare means (more than 2 groups)	ANOVA
Compare proportions (2 groups)	Chi-square test
Association between two quantitative variables	Correlation/Regression
Association between two ordinal variables	Rank correlation (Spearman's rho)
Association between two categorical variables	Chi-square test
