Decision Trees for Choosing Statistical Tests: Study Guide for Statistics Students

Decision Trees in Statistics

Introduction to Decision Trees for Statistical Test Selection

Decision trees are systematic tools used to select appropriate statistical tests based on the research question, variable types, and study design. They help clarify the logic behind test selection and ensure that the chosen method matches the data and hypothesis.

  • Purpose: To guide researchers in choosing the correct statistical test for their data and hypothesis.

  • Key Questions: Who are the subjects? What are the variables? Why is the test being performed?

Statistical Emergency: The WWW Approach

Who, What, Why

Before selecting a test, clarify the following:

  • Who: Identify the sampling units and population. Consider how samples were taken and whether the design is paired or unpaired.

  • What: Define the response variable (y) and predictor(s) (x). Classify variables as quantitative (continuous/discrete) or categorical (nominal, ordinal, binary).

  • Why: Determine the research question: Are you testing for differences between groups or associations between variables?

Study Design: Paired vs. Unpaired

Between-Subject vs. Within-Subject Designs

Study design affects which statistical test is appropriate.

  • Not Paired (Between-Subject): Each subject provides one response value. Compare group means.

  • Paired (Within-Subject): Each subject provides two response values. Analyze individual differences.

Example Table: Paired vs. Unpaired Design

Design | Data Structure | Analysis
Not Paired | One value per subject | Test difference between group means
Paired | Two values per subject | Test individual differences between responses
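
To make the contrast concrete, here is a minimal sketch of the two test statistics using only the Python standard library. The function names and the body-mass values are hypothetical, and the unpaired version assumes the unpooled (Welch) form of the t statistic.

```python
import math
import statistics

def unpaired_t(group_a, group_b):
    """Welch t statistic for two independent groups (one value per subject)."""
    mean_diff = statistics.mean(group_a) - statistics.mean(group_b)
    se = math.sqrt(statistics.variance(group_a) / len(group_a)
                   + statistics.variance(group_b) / len(group_b))
    return mean_diff / se

def paired_t(before, after):
    """Paired t statistic: a one-sample t on the within-subject differences."""
    diffs = [a - b for a, b in zip(after, before)]
    se = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return statistics.mean(diffs) / se

# Hypothetical body masses (kg) for five subjects, each measured twice.
before = [82.0, 95.0, 77.0, 101.0, 88.0]
after = [80.0, 92.0, 76.0, 97.0, 86.0]

t_paired = paired_t(before, after)      # uses the pairing
t_unpaired = unpaired_t(after, before)  # ignores the pairing
print(round(t_paired, 2), round(t_unpaired, 2))
```

With these made-up numbers the paired statistic is much larger in magnitude than the unpaired one, because pairing removes the large subject-to-subject variation from the comparison.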

Types of Variables

Quantitative vs. Categorical

  • Quantitative:

    • Continuous (e.g., height, weight)

    • Discrete (e.g., number of children)

  • Categorical:

    • Ordinal (natural ranking, e.g., education level)

    • Nominal (no natural order, e.g., blood type)

    • Binary (two categories, e.g., Yes/No)

Parametric vs. Non-Parametric Tests

Choosing Based on Data Characteristics

  • Parametric Tests:

    • Rely on the Central Limit Theorem (CLT) to justify normal-based inference about means

    • Assume approximately normal data (or a sample large enough for the CLT to apply) and no outliers

    • Examples: t-test, ANOVA

  • Non-Parametric Tests:

    • Do not rely on CLT

    • Tolerate skewed distributions and outliers

    • Examples: Wilcoxon rank-sum test, Chi-square test

Sample Size Guidelines

Sample Size (n) | Assumptions
n < 15 | No deviation from normality
15 ≤ n ≤ 45 | No outliers, not strongly skewed
n > 45 | No outliers
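
The rule of thumb above can be written as a small check. This is a sketch only: the function name and its arguments are hypothetical, it assumes the strictest requirement applies to the smallest samples (n below 15), and it assigns the boundary values 15 and 45 to the middle band.

```python
def parametric_ok(n, roughly_normal, has_outliers, strongly_skewed):
    """Return True if the sample-size rule of thumb allows a parametric test.

    All three data checks are judgments the analyst makes from plots;
    this function only encodes how they combine with the sample size.
    """
    if n < 15:
        # Small samples: the data must show no deviation from normality.
        return roughly_normal and not has_outliers and not strongly_skewed
    if n <= 45:
        # Medium samples: mild skew is tolerated, outliers are not.
        return not has_outliers and not strongly_skewed
    # Large samples: only outliers rule out a parametric test.
    return not has_outliers

print(parametric_ok(10, roughly_normal=False, has_outliers=False, strongly_skewed=False))
print(parametric_ok(60, roughly_normal=False, has_outliers=False, strongly_skewed=True))
```

When the function returns False, the notes point toward a non-parametric alternative such as the Wilcoxon rank-sum test.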

Directed vs. Undirected Relationships

Types of Hypotheses

  • Directed: Hypothesize that X affects Y (causal relationship). Example: Diet and risk of cancer.

  • Undirected: Hypothesize association between Y1 and Y2 (correlation, co-occurrence). Example: Co-occurrence of two plant species.

Variable Type and Test Selection

Matching Predictor and Response Types

  • Binary Predictor, Quantitative Response: Use t-test.

  • Categorical Predictor (>2 groups), Quantitative Response: Use ANOVA.

  • Categorical Predictor, Categorical Response: Use two-way tables (Chi-square test).

  • Quantitative Predictor, Quantitative Response: Use regression analysis.
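
The matching rules above amount to a small lookup. The sketch below is one possible encoding; the function name, argument names, and string labels are hypothetical, and combinations the list does not mention are reported as not covered.

```python
def choose_test(predictor, response, n_groups=2):
    """Map variable types to the test family named in the list above.

    predictor, response: "categorical" or "quantitative".
    n_groups: number of levels of a categorical predictor.
    """
    if predictor == "categorical" and response == "quantitative":
        return "t-test" if n_groups == 2 else "ANOVA"
    if predictor == "categorical" and response == "categorical":
        return "chi-square test"
    if predictor == "quantitative" and response == "quantitative":
        return "regression"
    return "not covered by this list"

print(choose_test("categorical", "quantitative"))              # t-test
print(choose_test("categorical", "quantitative", n_groups=4))  # ANOVA
print(choose_test("categorical", "categorical"))               # chi-square test
print(choose_test("quantitative", "quantitative"))             # regression
```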

Decision Tree for Statistical Test Selection

First Level: Why?

  • Aim: What is the research question?

  • Hypothesis Statement: Is it about differences between groups or association between variables?

Decision Tree Table

Aim

H0: No Difference

H0: No Association

Multiple y

Difference between groups

Compare means, distributions, proportions

Association between variables

Other aims (e.g., curve fitting)

Practice Example: QSYMIA Effects and Side Effects

Statistical Questions

  • Compare body mass in patients before and after one month of taking QSYMIA.

  • Compare mean weight loss in patients who take QSYMIA to weight loss in control patients.

  • Compare the proportions of babies born with cleft palates between patients who take QSYMIA and patients not taking this drug.

  • Test whether weight loss in patients who take QSYMIA depends on age.

Poll Questions: Identifying Variables

Response and Explanatory Variables

  • Response Variable: Whether the baby has a cleft palate (Yes/No, binary).

  • Explanatory Variable: Whether the mother was taking QSYMIA (Yes/No, binary).

  • Experimental Units: Babies.

Key B: Differences in Distributions or Frequencies

Testing Proportions

  • Use Chi-square tests for differences in proportions or frequencies.

  • Types: Test of homogeneity, test of independence.

Example Table: Chi-square Test Types

Test Type | Purpose
Test of Homogeneity | Compare proportions across groups
Test of Independence | Assess association between categorical variables
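
Both test types use the same arithmetic on a table of counts. As a minimal sketch, the function below computes the Pearson chi-square statistic for a 2x2 table without a continuity correction. The function name and the counts are hypothetical (illustration only, not real data), and the p-value shortcut is valid only for 1 degree of freedom.

```python
import math

def chi_square_2x2(table):
    """Pearson chi-square statistic and p-value for a 2x2 table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if rows and columns were unrelated.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    # With 1 degree of freedom, P(chi2 > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

# Hypothetical counts. Rows: exposed / not exposed; columns: outcome yes / no.
counts = [[12, 88],
          [5, 95]]
stat, p_value = chi_square_2x2(counts)
print(round(stat, 2), round(p_value, 3))
```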

Key C: Association Between Two Variables

Types of Association Tests

  • Nominal Variables: Use Chi-square test for association.

  • Ordinal Variables: Use rank correlation (e.g., Spearman's rho).

  • Quantitative Variables: Use correlation or regression analysis.
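
To show how the rank-based and linear measures differ, the sketch below computes both with the standard library only. The function names and the data are hypothetical; Spearman's rho is computed here as the Pearson correlation of the ranks, with tied values given their average rank.

```python
import math

def pearson_r(x, y):
    """Pearson correlation: strength of linear association."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """Ranks from 1 to n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result

def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation applied to the ranks."""
    return pearson_r(ranks(x), ranks(y))

# Hypothetical data: y increases with x, but not along a straight line.
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]
print(round(pearson_r(x, y), 3), round(spearman_rho(x, y), 3))
```

Because y always increases with x, the rank correlation is exactly 1, while the Pearson coefficient is slightly below 1 since the relationship is curved.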

Disclaimer: What Is Not Covered in the Tree?

  • One-sample tests: Mean (one-sample t-test) or median (sign test).

  • Curve fitting: Survival curves, quantile regression.

When to Consult a Statistician

  • Identify the general type of statistical problem.

  • If unsure, review worked examples or seek a second opinion.

  • Answering decision tree questions helps communicate with statisticians and guides further research.

Practice and Review

  • Solve practice problems using the decision tree.

  • Participate in quizzes and discussion boards for additional practice.

Key Formulas

t-test for Two Independent Means

Used to compare means between two groups:
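
One standard form of this statistic is the unpooled (Welch) version, shown here as an assumption about which variant is intended. The notation is also assumed: $\bar{x}_i$, $s_i^2$, and $n_i$ are the sample mean, sample variance, and size of group $i$.

```latex
t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}
```

If the two groups are assumed to share a common variance, the denominator is replaced by a pooled standard error instead.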

Chi-square Test Statistic

Used for categorical data:
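
The usual Pearson form of the statistic, with notation assumed here: $O_i$ is the observed count in cell $i$ and $E_i$ is the count expected under the null hypothesis.

```latex
\chi^2 = \sum_{i} \frac{(O_i - E_i)^2}{E_i},
\qquad
E_i = \frac{(\text{row total}) \times (\text{column total})}{\text{grand total}}
\quad \text{for a two-way table}
```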

Pearson Correlation Coefficient

Measures linear association between two quantitative variables:
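
The standard sample formula, with notation assumed here: $(x_i, y_i)$ are the paired observations and $\bar{x}$, $\bar{y}$ are their sample means.

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \, \sum_{i=1}^{n} (y_i - \bar{y})^2}}
```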

Simple Linear Regression Equation

Models relationship between predictor and response:
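
The usual model and its least-squares estimates, with notation assumed here: $\beta_0$ is the intercept, $\beta_1$ the slope, and $\varepsilon_i$ the random error for observation $i$.

```latex
y_i = \beta_0 + \beta_1 x_i + \varepsilon_i,
\qquad
\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2},
\qquad
\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}
```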

Summary Table: Matching Test to Scenario

Scenario | Test
Compare means (2 groups, unpaired) | t-test
Compare means (paired) | Paired t-test
Compare proportions (2 groups) | Chi-square test
Association between two quantitative variables | Correlation/Regression
Association between two categorical variables | Chi-square test

Additional info: These notes expand on the decision tree approach, variable classification, and test selection logic, providing context and examples for exam preparation.
