Regression & ANOVA Wisdom: Distribution, Transformations, and Influential Points
Study Guide - Smart Notes
Introduction
This study guide covers advanced topics in regression and Analysis of Variance (ANOVA), focusing on the importance of data distribution, the use of transformations, handling multiple comparisons, checking assumptions, and identifying influential data points. These concepts are essential for accurate statistical inference in biological and other quantitative research.
ANOVA: Accounting for Multiple Tests
Why Not Perform Multiple t-Tests?
Each t-test has a Type I error rate (commonly α = 0.05).
Performing many tests increases the probability of at least one Type I error.
Tests may not be independent, further complicating error rates.
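A minimal sketch of how the error rate inflates, assuming the tests are independent (which, as noted above, they often are not, so treat the numbers as a rough guide). The function names are illustrative, not taken from the course materials.

```python
def n_pairwise_comparisons(n_groups: int) -> int:
    """Number of distinct pairs of groups that could be compared with t-tests."""
    return n_groups * (n_groups - 1) // 2


def familywise_error_rate(alpha: float, n_tests: int) -> float:
    """P(at least one Type I error) over n_tests INDEPENDENT tests at level alpha."""
    return 1.0 - (1.0 - alpha) ** n_tests


# How quickly the risk grows as the number of groups increases.
for n_groups in (3, 5, 10):
    m = n_pairwise_comparisons(n_groups)
    rate = familywise_error_rate(0.05, m)
    print(f"{n_groups} groups -> {m} pairwise tests -> family-wise rate {rate:.3f}")
```

With only five groups there are already ten pairwise tests, and the chance of at least one false positive is roughly 40% under the independence assumption.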
Options for Multiple Comparisons
Bonferroni Correction: Adjusts α by dividing by the number of tests. Simple but can be overly conservative, reducing statistical power.
Tukey's Honestly Significant Difference (HSD): Preferred for pairwise comparisons in ANOVA. Accounts for the non-independence of the comparisons and lets you specify a family-wise confidence level.
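False Discovery Rate (FDR): Used outside ANOVA when many tests are performed. Controls the expected proportion of false positives among the significant results rather than the family-wise error rate, so it is less conservative than Bonferroni. In R: p.adjust(p, method="fdr")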
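The two p-value adjustments can be sketched in a few lines of Python. This is an illustrative implementation, not the course's code: the Bonferroni version multiplies each p-value by the number of tests (equivalent to dividing α), and the FDR version follows the Benjamini-Hochberg procedure, which is the method that `method="fdr"` refers to in R's `p.adjust`. The example p-values are made up.

```python
def bonferroni_adjust(p_values):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def benjamini_hochberg_adjust(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    # Walk from the largest p-value to the smallest, keeping a running minimum
    # so that adjusted values never increase as the raw p-values decrease.
    order = sorted(range(m), key=lambda i: p_values[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for steps_from_top, i in enumerate(order):
        rank = m - steps_from_top  # 1-based rank in ascending order
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values from four tests.
raw_p = [0.01, 0.04, 0.03, 0.005]
print("Bonferroni:", bonferroni_adjust(raw_p))
print("BH (FDR):  ", benjamini_hochberg_adjust(raw_p))
```

Comparing the two outputs shows why FDR is described as less conservative: every Benjamini-Hochberg adjusted value is at most as large as its Bonferroni counterpart.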
ANOVA in a Nutshell
Key Concepts
Variance Comparison: ANOVA compares variance between groups to variance within groups.
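F-statistic: $F = \dfrac{MS_{\text{between}}}{MS_{\text{within}}}$, the mean square between groups divided by the mean square within groups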
Hypotheses:
$H_0$: All means are the same
$H_A$: At least one mean is different
Checking Conditions:
Are variances similar in all groups?
Are residuals nearly normally distributed?
If Valid and Significant:
Calculate effect size (Eta squared, $\eta^2$)
Determine which means differ
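The variance comparison above can be made concrete with a short standard-library sketch. It computes the F-statistic as the between-group mean square over the within-group mean square, and eta squared as the share of total variation explained by group membership. The function name and the three small groups are hypothetical; the p-value is not computed because it needs the F distribution, which the standard library does not provide.

```python
from statistics import mean


def one_way_anova(groups):
    """Return F, its degrees of freedom, and eta squared for a list of groups."""
    all_values = [y for g in groups for y in g]
    grand_mean = mean(all_values)
    group_means = [mean(g) for g in groups]

    # Between-group variation: how far each group mean sits from the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Within-group variation: spread of observations around their own group mean.
    ss_within = sum((y - m) ** 2
                    for g, m in zip(groups, group_means) for y in g)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    eta_squared = ss_between / (ss_between + ss_within)
    return {"F": f_stat, "df_between": df_between,
            "df_within": df_within, "eta_squared": eta_squared}


# Hypothetical data: three groups of three observations each.
result = one_way_anova([[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]])
print(result)
```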
ANOVA Assumptions & Remedies
Assumptions of Standard ANOVA
| Assumption | Complications & Remedies |
|---|---|
| Nearly normal distribution within groups | Skewed distributions: use transformation |
| Equal variance between groups | Unequal variances: use transformation |
| No outliers | Many outliers: use non-parametric test (e.g., Kruskal-Wallis) |
| Same sample size in groups (balanced design) | Unbalanced design: reduced robustness |
| Independent samples (SRS or randomized experiment) | Dependent samples: use repeated measures ANOVA |
Transformations for Biological Data
Purpose and Types
Transformations are used to normalize distributions, stabilize variances, and meet ANOVA assumptions. Common transformations include:
| Transformation | Condition | Formula | Main Application |
|---|---|---|---|
| Logarithm | | $\log(Y)$ or $\log(Y + 1)$ | Amounts, Concentrations |
| Square root | | $\sqrt{Y}$ | Counts |
| Arc-sine square-root | | $\arcsin\sqrt{p}$ | Proportions |
| Inverse | | $1/Y$ | Ratios |
| Double logarithm | | $\log(Y)$ against $\log(X)$ | Power laws |
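As a rough sketch, the transformations named in the table can be written as small Python functions. The forms used here are the usual textbook definitions and are an assumption on my part; in particular, the `offset` argument for the logarithm (often 1, to cope with zeros) is a common convention rather than something the notes specify.

```python
import math


def log_transform(y: float, offset: float = 0.0) -> float:
    """Natural log of y; use offset=1 when the data contain zeros."""
    return math.log(y + offset)


def sqrt_transform(count: float) -> float:
    """Square root, typically applied to counts."""
    return math.sqrt(count)


def arcsine_sqrt_transform(p: float) -> float:
    """Arc-sine of the square root of a proportion p in [0, 1], in radians."""
    return math.asin(math.sqrt(p))


def inverse_transform(y: float) -> float:
    """Reciprocal, typically applied to ratios; y must be non-zero."""
    return 1.0 / y


# Hypothetical values, one per transformation.
print(log_transform(0.0, offset=1.0))
print(sqrt_transform(9.0))
print(arcsine_sqrt_transform(0.25))
print(inverse_transform(4.0))
```

For the double logarithm, both the response and the predictor would be passed through `log_transform` before fitting a straight line.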
Example: Log Transformation
Original data may be skewed or have unequal variances.
Log transformation can normalize data and stabilize variance, making ANOVA assumptions more valid.
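A small illustration of variance stabilization, using made-up numbers in which the spread of each group grows in proportion to its mean. The data and the helper name `variance_ratio` are hypothetical; the point is only that the ratio of group variances shrinks toward 1 on the log scale.

```python
import math
from statistics import variance


def variance_ratio(a, b):
    """Larger sample variance divided by the smaller one (1.0 means equal)."""
    va, vb = variance(a), variance(b)
    return max(va, vb) / min(va, vb)


# Hypothetical groups: the second is the first multiplied by 10,
# so its standard deviation is 10 times larger on the raw scale.
low = [1.0, 2.0, 4.0]
high = [10.0, 20.0, 40.0]

raw_ratio = variance_ratio(low, high)
log_ratio = variance_ratio([math.log(y) for y in low],
                           [math.log(y) for y in high])

print(f"variance ratio on raw scale: {raw_ratio:.1f}")
print(f"variance ratio on log scale: {log_ratio:.3f}")
```

Because multiplying by a constant becomes adding a constant after taking logs, the two groups end up with essentially the same variance, which is the situation the equal-variance assumption requires.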
Residual Plots and Checking Assumptions
Using Residual Plots
Residual plots help assess normality and variance homogeneity.
Q-Q plots and histograms of residuals are used to check for normality.
Plots of residuals vs. fitted values check for constant variance.
Influential points can be identified visually.
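The numbers behind these plots can be computed without any plotting library. The sketch below, with hypothetical data and function names, returns (fitted value, residual) pairs for a residuals-vs-fitted plot and (theoretical quantile, sample quantile) pairs for a normal Q-Q plot. The plotting position `(i - 0.5) / n` is one common convention and is an assumption here; statistical software may use a slightly different one.

```python
from statistics import NormalDist, mean


def group_residuals(groups):
    """Return (fitted, residual) pairs, where fitted is the group mean."""
    pairs = []
    for g in groups:
        m = mean(g)
        pairs.extend((m, y - m) for y in g)
    return pairs


def qq_points(residuals):
    """Return (theoretical normal quantile, sorted residual) pairs."""
    n = len(residuals)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    theoretical = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, sorted(residuals)))


# Hypothetical data: the second group is visibly more spread out.
example_groups = [[1.0, 2.0, 3.0], [4.0, 6.0, 8.0]]
fitted_and_residuals = group_residuals(example_groups)
example_residuals = [r for _, r in fitted_and_residuals]
example_qq = qq_points(example_residuals)

for fitted, resid in fitted_and_residuals:
    print(f"fitted={fitted:.1f}  residual={resid:+.1f}")
```

If the Q-Q pairs fall close to a straight line, the residuals are approximately normal; if the residuals fan out as the fitted values increase, the variance is not constant.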
Example: Logit Transformation
Transformed response (e.g., logit) can improve normality and variance consistency.
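For reference, a minimal sketch of the logit, assuming the response is a proportion strictly between 0 and 1. The function name and the sample proportions are illustrative.

```python
import math


def logit(p: float) -> float:
    """Log-odds of a proportion p, defined only for 0 < p < 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("logit is undefined for p <= 0 or p >= 1")
    return math.log(p / (1.0 - p))


# Hypothetical proportions: values near 0 or 1 are stretched out,
# while values near 0.5 change almost linearly.
for p in (0.05, 0.2, 0.5, 0.8, 0.95):
    print(f"p={p:.2f}  logit={logit(p):+.3f}")
```

Proportions of exactly 0 or 1 cannot be transformed this way, so they need separate handling before the analysis.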
Reporting ANOVA Results
Methods and Results
Describe the statistical test used, including any transformations.
Report F-statistic, degrees of freedom, p-value, and effect size ($\eta^2$).
Note any complications (e.g., unbalanced design, skewed residuals).
If non-parametric tests are used, report those results as well.
Example:
"One-way ANOVA of log-transformed latency of mating (in seconds) showed a significant effect of type of mating pair ($F_{df_1, df_2} = \ldots$, p-value < 0.001, $\eta^2 = \ldots$)."
Influential Points in Regression and ANOVA
Definition and Identification
High leverage points: Points far from the mean of the predictor variable; can strongly affect the slope of regression.
Large residuals: Points far above or below the regression line; reduce $R^2$.
Influential points: Points that, if omitted, change regression coefficients or residuals significantly. Identified by Cook's distance ($D$).
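A sketch of how leverage and Cook's distance can be computed for a simple linear regression with one predictor, using the standard closed-form expressions. The data are hypothetical (the last point sits far from the other x-values), and the function name is illustrative.

```python
from statistics import mean


def simple_regression_diagnostics(x, y):
    """Fit y = intercept + slope * x and return leverage and Cook's distance."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar

    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    n_params = 2  # intercept and slope
    mse = sum(e ** 2 for e in residuals) / (n - n_params)

    # Leverage: grows with the distance of x from the mean of the predictor.
    leverage = [1.0 / n + (xi - x_bar) ** 2 / sxx for xi in x]
    # Cook's distance combines residual size and leverage.
    cooks = [(e ** 2 / (n_params * mse)) * h / (1.0 - h) ** 2
             for e, h in zip(residuals, leverage)]
    return {"slope": slope, "intercept": intercept,
            "leverage": leverage, "cooks_distance": cooks}


# Hypothetical data: the fifth point has an extreme x-value.
x_values = [1.0, 2.0, 3.0, 4.0, 10.0]
y_values = [1.1, 1.9, 3.2, 3.9, 2.0]
diagnostics = simple_regression_diagnostics(x_values, y_values)

for i, (h, d) in enumerate(zip(diagnostics["leverage"],
                               diagnostics["cooks_distance"])):
    print(f"point {i}: leverage={h:.2f}  Cook's D={d:.2f}")
```

In this made-up example the extreme point has both the highest leverage and by far the largest Cook's distance, which is the pattern that would prompt a closer look at that observation.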
Should Outliers Be Removed?
Removal requires a biological or scientific justification.
Report results with and without the outlier.
Beware of Confounding Variables
Splitting Data When Relationships Differ
If the relationship between variables differs between groups, analyze groups separately.
Confounding variables can obscure true relationships and lead to incorrect conclusions.
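A toy illustration of how a grouping variable can distort a pooled analysis. In the made-up data below, the relationship is positive within each group, yet a single regression over the pooled data gives a negative slope because the groups differ in both x and y. The data and the function name are hypothetical.

```python
from statistics import mean


def ols_slope(x, y):
    """Least-squares slope of y on x for a single predictor."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx


# Hypothetical groups: same positive trend, but group B has
# larger x-values and smaller y-values overall.
group_a_x, group_a_y = [1.0, 2.0, 3.0], [5.0, 6.0, 7.0]
group_b_x, group_b_y = [6.0, 7.0, 8.0], [1.0, 2.0, 3.0]

slope_a = ols_slope(group_a_x, group_a_y)
slope_b = ols_slope(group_b_x, group_b_y)
pooled_slope = ols_slope(group_a_x + group_b_x, group_a_y + group_b_y)

print(f"slope within group A: {slope_a:+.2f}")
print(f"slope within group B: {slope_b:+.2f}")
print(f"slope with groups pooled: {pooled_slope:+.2f}")
```

Fitting each group separately recovers the within-group trend that the pooled fit hides.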
Packing Your Stats Survival Kit
Essential Tools for Statistical Analysis
Decision trees and flow charts for choosing appropriate tests.
Assignment outlines and rubrics for report writing, paper critique, and research planning.
Crib sheets summarizing key formulas and concepts.
Confidence in your statistical skills and pride in your accomplishments.
Summary Table: Common Transformations in Biology
| Transformation | Condition | Formula | Main Application |
|---|---|---|---|
| Logarithm | | $\log(Y)$ or $\log(Y + 1)$ | Amounts, Concentrations |
| Square root | | $\sqrt{Y}$ | Counts |
| Arc-sine square-root | | $\arcsin\sqrt{p}$ | Proportions |
| Inverse | | $1/Y$ | Ratios |
| Double logarithm | | $\log(Y)$ against $\log(X)$ | Power laws |
Additional info:
Some slides referenced biological applications (e.g., attractiveness of flies), but the statistical principles apply broadly to quantitative data analysis.
Transformations and residual analysis are critical for meeting ANOVA assumptions and ensuring valid inference.
Influential points and confounding variables must be carefully considered in regression and ANOVA to avoid misleading results.