Regression & ANOVA Wisdom: Distribution, Transformations, and Influential Points

Introduction

This study guide covers advanced topics in regression and Analysis of Variance (ANOVA), focusing on the importance of data distribution, the use of transformations, handling multiple comparisons, checking assumptions, and identifying influential data points. These concepts are essential for accurate statistical inference in biological and other quantitative research.

ANOVA: Accounting for Multiple Tests

Why Not Perform Multiple t-Tests?

  • Each t-test has a Type I error rate (commonly α = 0.05).

  • Performing many tests increases the probability of at least one Type I error.

  • Tests may not be independent, further complicating error rates.
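The inflation described above can be made concrete under the simplifying assumption that the tests are independent, in which case the probability of at least one Type I error across k tests is 1 - (1 - α)^k. The following is a minimal Python sketch; the function name is illustrative and not taken from the original notes.

```python
def familywise_error_rate(alpha: float, n_tests: int) -> float:
    """Probability of at least one Type I error across n_tests independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests


# Comparing 5 groups pairwise requires 5 * 4 / 2 = 10 t-tests.
n_groups = 5
n_pairs = n_groups * (n_groups - 1) // 2
fwer = familywise_error_rate(0.05, n_pairs)
print(f"{n_pairs} tests at alpha = 0.05 -> family-wise error rate = {fwer:.3f}")
```

With ten pairwise tests the family-wise error rate is already about 0.40, which is why a single ANOVA followed by a corrected post-hoc procedure is preferred.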

Options for Multiple Comparisons

  • Bonferroni Correction: Adjusts α by dividing by the number of tests. Simple but can be overly conservative, reducing statistical power.

  • Tukey's Honestly Significant Difference (HSD): Preferred for pairwise comparisons in ANOVA. Accounts for non-independence and allows specification of family-wise confidence level.

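  • False Discovery Rate (FDR): Controls the expected proportion of false positives among the results declared significant, rather than the chance of any false positive at all. Typically used outside ANOVA when many tests are run; less conservative than Bonferroni. In R: p.adjust(p, method="fdr")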
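To show what the Bonferroni and FDR adjustments do to a set of p-values, here is a small sketch in plain Python. It assumes the FDR method meant is Benjamini-Hochberg, which is what method="fdr" selects in R's p.adjust; the function names and the p-values are hypothetical.

```python
def bonferroni(p_values):
    """Bonferroni-adjusted p-values: multiply by the number of tests, cap at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg (FDR) adjusted p-values, returned in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from largest to smallest.
    order = sorted(range(m), key=lambda i: p_values[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for position, i in enumerate(order):
        rank = m - position  # rank of this p-value in ascending order
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


p = [0.001, 0.008, 0.039, 0.041, 0.20]  # hypothetical raw p-values
p_bonf = bonferroni(p)
p_fdr = benjamini_hochberg(p)
print(p_bonf)
print(p_fdr)
```

In this made-up example the Bonferroni-adjusted values are never smaller than the FDR-adjusted ones, which illustrates why Bonferroni is described as the more conservative option.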

ANOVA in a Nutshell

Key Concepts

  • Variance Comparison: ANOVA compares variance between groups to variance within groups.

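  • F-statistic: $F = \frac{MS_{\text{between}}}{MS_{\text{within}}}$, the mean square between groups divided by the mean square within groups.

  • Hypotheses:

    • $H_0$: All means are the same

    • $H_A$: At least one mean is different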

  • Checking Conditions:

    • Are variances similar in all groups?

    • Are residuals nearly normally distributed?

  • If Valid and Significant:

    • Calculate effect size (Eta squared, $\eta^2$)

    • Determine which means differ
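The variance comparison above can be sketched directly from sums of squares. This is an illustrative implementation using only the Python standard library, with hypothetical data; it computes the F-statistic and eta squared (taken here as the between-group sum of squares divided by the total sum of squares) but not the p-value, which requires the F distribution from a statistics package.

```python
from statistics import mean


def one_way_anova(groups):
    """Return (F, df_between, df_within, eta_squared) for a list of samples."""
    all_values = [y for g in groups for y in g]
    grand_mean = mean(all_values)
    k, n = len(groups), len(all_values)

    # Between-group sum of squares: group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: observations around their own group mean.
    ss_within = sum((y - mean(g)) ** 2 for g in groups for y in g)

    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    eta_squared = ss_between / (ss_between + ss_within)
    return f_stat, df_between, df_within, eta_squared


groups = [[4.0, 5.0, 6.0], [6.0, 7.0, 8.0], [8.0, 9.0, 10.0]]  # hypothetical data
f_stat, df1, df2, eta2 = one_way_anova(groups)
print(f"F({df1}, {df2}) = {f_stat:.2f}, eta squared = {eta2:.2f}")
```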

ANOVA Assumptions & Remedies

Assumptions of Standard ANOVA

Assumption | Complications & Remedies
---------- | -----------------------
Nearly normal distribution within groups | Skewed distributions: use transformation
Equal variance between groups | Unequal variances: use transformation
No outliers | Many outliers: use non-parametric test (e.g., Kruskal-Wallis)
Same sample size in groups (balanced design) | Unbalanced design: reduced robustness
Independent samples (SRS or randomized experiment) | Dependent samples: use repeated measures ANOVA

Transformations for Biological Data

Purpose and Types

Transformations are used to normalize distributions, stabilize variances, and meet ANOVA assumptions. Common transformations include:

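Transformation | Condition | Formula | Main Application
-------------- | --------- | ------- | ----------------
Logarithm | $y > 0$ or $y \ge 0$ | $y' = \log(y)$ or $y' = \log(y + 1)$ | Amounts, Concentrations
Square root | $y \ge 0$ | $y' = \sqrt{y}$ | Counts
Arc-sine square-root | $0 \le p \le 1$ | $p' = \arcsin\sqrt{p}$ | Proportions
Inverse | $y \ne 0$ | $y' = 1/y$ | Ratios
Double logarithm | $x > 0$, $y > 0$ | $y' = \log(y)$, $x' = \log(x)$ | Power laws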
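The transformations in the table can be written as small helper functions. The sketch below uses natural logarithms and the standard textbook form of each transformation; the function names and the optional offset parameter are assumptions made for illustration.

```python
import math


def log_transform(y, offset=0.0):
    """Natural log; use offset=1.0 when the data contain zeros."""
    return math.log(y + offset)


def sqrt_transform(count):
    """Square root, commonly applied to counts."""
    return math.sqrt(count)


def arcsine_sqrt_transform(p):
    """Arc-sine square-root of a proportion p in [0, 1], in radians."""
    return math.asin(math.sqrt(p))


def inverse_transform(y):
    """Reciprocal of a non-zero value."""
    return 1.0 / y


def double_log_transform(x, y):
    """Log both variables; a power law y = a * x**b becomes a straight line."""
    return math.log(x), math.log(y)


# A power law y = 2 * x**3 is linear on the log-log scale with slope 3.
(x1, y1), (x2, y2) = double_log_transform(1.0, 2.0), double_log_transform(2.0, 16.0)
slope = (y2 - y1) / (x2 - x1)
print(f"log-log slope = {slope:.2f}")
```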

Example: Log Transformation

  • Original data may be skewed or have unequal variances.

  • Log transformation can normalize data and stabilize variance, making ANOVA assumptions more valid.
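As a rough illustration of variance stabilization, the sketch below compares the ratio of the largest to the smallest group variance before and after a log transformation. The data are invented so that the spread grows in proportion to the mean, which is the situation where a log transformation tends to help; the function name is hypothetical.

```python
import math
from statistics import variance

# Hypothetical groups whose spread grows with their mean.
groups = {
    "low": [1.0, 2.0, 4.0],
    "medium": [10.0, 20.0, 40.0],
    "high": [100.0, 200.0, 400.0],
}


def variance_ratio(samples):
    """Largest group variance divided by the smallest."""
    variances = [variance(s) for s in samples]
    return max(variances) / min(variances)


raw_ratio = variance_ratio(groups.values())
log_ratio = variance_ratio([[math.log10(y) for y in s] for s in groups.values()])
print(f"variance ratio, raw data: {raw_ratio:.0f}")
print(f"variance ratio, log10 data: {log_ratio:.2f}")
```

On the raw scale the group variances differ by a factor of about 10,000; on the log scale they are essentially equal.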

Residual Plots and Checking Assumptions

Using Residual Plots

  • Residual plots help assess normality and variance homogeneity.

  • Q-Q plots and histograms of residuals are used to check for normality.

  • Plots of residuals vs. fitted values check for constant variance.

  • Influential points can be identified visually.
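The quantities behind these plots are straightforward to compute. The following sketch, using hypothetical data and the Python standard library, derives fitted values and residuals for a one-way layout and the coordinate pairs for a normal Q-Q plot; plotting itself is left to whatever graphics package is in use. The plotting positions (i - 0.5) / n are one common convention among several.

```python
from statistics import NormalDist, mean, stdev


def anova_residuals(groups):
    """Fitted values (group means) and residuals for a one-way layout."""
    fitted, residuals = [], []
    for g in groups:
        m = mean(g)
        for y in g:
            fitted.append(m)
            residuals.append(y - m)
    return fitted, residuals


def qq_points(residuals):
    """Pairs (theoretical normal quantile, sorted residual) for a Q-Q plot."""
    n = len(residuals)
    std_normal = NormalDist()
    theoretical = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, sorted(residuals)))


groups = [[4.0, 5.0, 6.0], [6.0, 7.0, 8.0], [8.0, 9.0, 10.0]]  # hypothetical data
fitted, residuals = anova_residuals(groups)
spread_by_group = [stdev(g) for g in groups]  # similar values suggest constant variance
points = qq_points(residuals)
print(spread_by_group)
```

Points that fall close to a straight line in the Q-Q pairs are consistent with nearly normal residuals, and similar per-group standard deviations are consistent with equal variances.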

Example: Logit Transformation

  • Transforming a proportion response with the logit, $\operatorname{logit}(p) = \ln\frac{p}{1 - p}$, can improve the normality and variance consistency of the residuals.
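A minimal sketch of the logit transformation, assuming the response is a proportion strictly between 0 and 1 (values of exactly 0 or 1 have no finite logit and need separate handling). The data are hypothetical.

```python
import math


def logit(p):
    """Log-odds of a proportion p, defined for 0 < p < 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("logit requires 0 < p < 1")
    return math.log(p / (1.0 - p))


proportions = [0.05, 0.25, 0.50, 0.75, 0.95]  # hypothetical proportions
transformed = [logit(p) for p in proportions]
print([round(t, 3) for t in transformed])
```

The transformation stretches values near 0 and 1, so proportions that are bunched against the boundaries become more symmetric around 0 on the logit scale.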

Reporting ANOVA Results

Methods and Results

  • Describe the statistical test used, including any transformations.

  • Report F-statistic, degrees of freedom, p-value, and effect size ($\eta^2$).

  • Note any complications (e.g., unbalanced design, skewed residuals).

  • If non-parametric tests are used, report those results as well.

Example:

  • "One-way ANOVA of log-transformed latency of mating (in seconds) showed a significant effect of type of mating pair ($F_{df_1, df_2} = \ldots$, p-value < 0.001, $\eta^2 = \ldots$)."

Influential Points in Regression and ANOVA

Definition and Identification

  • High leverage points: Points far from the mean of the predictor variable; can strongly affect the slope of regression.

  • Large residuals: Points far above or below the regression line; reduce $R^2$.

  • Influential points: Points that, if omitted, change the regression coefficients or residuals substantially. Identified by Cook's distance ($D_i$).
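Leverage, residuals, and Cook's distance can all be computed by hand for a simple linear regression. The sketch below uses the standard formulas $h_i = 1/n + (x_i - \bar{x})^2 / S_{xx}$ and $D_i = \frac{e_i^2}{p \cdot MSE} \cdot \frac{h_i}{(1 - h_i)^2}$ with $p = 2$ fitted coefficients; the data and function name are hypothetical, and the last point is deliberately placed far from the other x values.

```python
from statistics import mean


def regression_diagnostics(x, y):
    """Leverage, residuals, and Cook's distance for simple linear regression."""
    n, p = len(x), 2  # p = number of fitted coefficients (intercept and slope)
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar

    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    mse = sum(e ** 2 for e in residuals) / (n - p)
    leverage = [1.0 / n + (xi - x_bar) ** 2 / sxx for xi in x]
    cooks_d = [
        (e ** 2 / (p * mse)) * (h / (1.0 - h) ** 2)
        for e, h in zip(residuals, leverage)
    ]
    return leverage, residuals, cooks_d


x = [1.0, 2.0, 3.0, 4.0, 5.0, 12.0]  # hypothetical data; last point has high leverage
y = [1.1, 1.9, 3.2, 3.9, 5.1, 6.0]
leverage, residuals, cooks_d = regression_diagnostics(x, y)
most_influential = max(range(len(x)), key=lambda i: cooks_d[i])
print(f"most influential point: index {most_influential}, D = {cooks_d[most_influential]:.1f}")
```

A frequently quoted rule of thumb treats $D_i > 1$ as worth a closer look, though any cutoff is a convention rather than a formal test.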

Should Outliers Be Removed?

  • Removal requires a biological or scientific justification.

  • Report results both with and without the outlier.
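One way to prepare the two sets of results is to fit the model twice, once on all data and once with the suspect point left out. The sketch below does this for a simple linear regression on hypothetical data; which point counts as the suspect is an assumption of the example, not something the code decides.

```python
from statistics import mean


def fit_line(x, y):
    """Least-squares (intercept, slope) for simple linear regression."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    return y_bar - slope * x_bar, slope


x = [1.0, 2.0, 3.0, 4.0, 5.0, 12.0]  # hypothetical data
y = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
suspect = 5  # index of the candidate outlier

with_point = fit_line(x, y)
without_point = fit_line(
    [xi for i, xi in enumerate(x) if i != suspect],
    [yi for i, yi in enumerate(y) if i != suspect],
)
print(f"slope with the point: {with_point[1]:.2f}")
print(f"slope without the point: {without_point[1]:.2f}")
```

Reporting both fits lets the reader see how much the conclusion depends on a single observation.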

Beware of Confounding Variables

Splitting Data When Relationships Differ

  • If the relationship between variables differs between groups, analyze groups separately.

  • Confounding variables can obscure true relationships and lead to incorrect conclusions.
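The effect of an unaccounted grouping variable can be seen by comparing a pooled fit with per-group fits. In the invented data below, the slope is negative within each group but positive when the groups are pooled, because the groups differ in both x and y. The group labels and values are hypothetical.

```python
from statistics import mean


def slope(x, y):
    """Least-squares slope of y on x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx


# Hypothetical data: two groups measured over different ranges of x.
data = {
    "group_a": ([1.0, 2.0, 3.0], [5.0, 4.0, 3.0]),
    "group_b": ([6.0, 7.0, 8.0], [10.0, 9.0, 8.0]),
}

group_slopes = {name: slope(x, y) for name, (x, y) in data.items()}
pooled_x = [xi for x, _ in data.values() for xi in x]
pooled_y = [yi for _, y in data.values() for yi in y]
pooled_slope = slope(pooled_x, pooled_y)

print(f"per-group slopes: {group_slopes}")
print(f"pooled slope: {pooled_slope:.2f}")
```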

Packing Your Stats Survival Kit

Essential Tools for Statistical Analysis

  • Decision trees and flow charts for choosing appropriate tests.

  • Assignment outlines and rubrics for report writing, paper critique, and research planning.

  • Crib sheets summarizing key formulas and concepts.

  • Confidence in your statistical skills and pride in your accomplishments.

Additional info:

  • Some slides referenced biological applications (e.g., attractiveness of flies), but the statistical principles apply broadly to quantitative data analysis.

  • Transformations and residual analysis are critical for meeting ANOVA assumptions and ensuring valid inference.

  • Influential points and confounding variables must be carefully considered in regression and ANOVA to avoid misleading results.
