
Chapter 11-A

Study Guide - Smart Notes

Tailored notes based on your materials, expanded with key definitions, examples, and context.

Comparing Several Means: One-Way Analysis of Variance (ANOVA)

Introduction to ANOVA

One-way Analysis of Variance (ANOVA) is a statistical method used to compare the means of three or more independent groups to determine if at least one group mean is significantly different from the others. It is commonly applied when the independent variable is categorical and the dependent variable is numerical.

  • Purpose: To test for differences among group means.

  • Example: Comparing the number of popcorn kernels popped using different types of oil.

Key Concepts and Learning Objectives

  • Understand the relationship between categorical and numerical variables.

  • Test hypotheses for more than two means.

  • Distinguish between the significance level of a single test and the familywise error rate in multiple comparisons.

  • Learn the ANOVA method, its conditions, and the F-statistic.

  • Calculate and interpret ANOVA components.

  • Understand the probability distribution of the F-statistic.

  • Interpret the F-statistic and p-value in hypothesis testing.

  • Perform post-hoc tests to identify which means differ when appropriate.

Section 11.1: General Principles of ANOVA

What is ANOVA?

Analysis of Variance (ANOVA) is a procedure for comparing means from more than two populations. One-way ANOVA focuses on a single numerical variable and a single categorical variable that distinguishes the populations.

  • It is a hypothesis test following the standard four steps: hypothesize, prepare, compute, and interpret.

Data Structure for ANOVA

  • Variables: One categorical (e.g., type of oil), one numerical (e.g., number of kernels popped).

  • Example: Data collected for three levels of oil (None, Medium, Maximum) with summary statistics and boxplots.

Group   | Mean (SD)
None    | 19.75 (11.76)
Medium  | 17.95 (11.31)
Maximum | 13.47 (9.36)

Limitations of Multiple t-Tests

  • Comparing all groups with two-sample t-tests increases the risk of Type I error (false positives).

  • For three groups, three pairwise comparisons are needed, each with its own hypothesis.

Multiple Comparisons and Familywise Error Rate

  • Multiple Comparisons: Testing all pairs increases the chance of incorrectly rejecting at least one true null hypothesis.

  • Familywise Error Rate: The probability of making at least one Type I error across all comparisons. This rate increases with the number of groups.

  • ANOVA controls the familywise error rate by testing all means simultaneously.
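The growth of the familywise error rate described above can be sketched numerically. This is a minimal illustration assuming each pairwise t-test uses α = 0.05 and, for simplicity, that the tests are independent (real pairwise tests share data, so this is an approximation):

```python
# Sketch: how the familywise error rate grows with the number of groups.
# Assumes alpha = 0.05 per test and independent tests (a simplification).
from math import comb

def familywise_error_rate(k, alpha=0.05):
    """Probability of at least one Type I error across all k(k-1)/2 pairwise tests."""
    m = comb(k, 2)  # number of pairwise comparisons among k groups
    return 1 - (1 - alpha) ** m

for k in (3, 4, 5):
    print(k, comb(k, 2), round(familywise_error_rate(k), 4))
```

For three groups (three pairwise tests), the familywise error rate is already about 0.14, nearly three times the nominal 0.05, which is why ANOVA tests all means in a single step.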

Hypotheses in ANOVA

  • In terms of variables:

    • Null hypothesis (H₀): No relationship between the categorical and numerical variable.

    • Alternative hypothesis (Hₐ): At least one group mean differs.

  • In terms of means:

    • H₀: μ₁ = μ₂ = ⋯ = μₖ (all group means are equal).

    • Hₐ: At least one μᵢ is not equal to another.

Conditions for Valid ANOVA

  • Random Samples: Each group is a random sample from its population.

  • Independent Groups: Samples are independent of each other.

  • Equal Variance: Variances are approximately equal across groups (rule of thumb: largest SD ÷ smallest SD < 2).

  • Normality: The numerical variable is approximately normally distributed in each group, or sample size is large (at least 25 per group).
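The equal-variance rule of thumb can be checked directly from the group standard deviations. A minimal sketch using the popcorn example's SDs from the table above:

```python
# Sketch: checking the equal-variance rule of thumb (largest SD / smallest SD < 2)
# using the popcorn example's group standard deviations.
group_sds = {"None": 11.76, "Medium": 11.31, "Maximum": 9.36}

ratio = max(group_sds.values()) / min(group_sds.values())
print(round(ratio, 2), ratio < 2)  # ratio is about 1.26, so the condition is met
```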

Understanding Variation in ANOVA

Visualizing ANOVA

  • ANOVA compares the variation between group means to the variation within groups.

  • If between-group variation is much larger than within-group variation, group means are likely different.

Explained vs. Unexplained Variation

  • Explained Variation: Variation due to differences between group means.

  • Unexplained Variation: Variation within groups (random error).

  • Total Variation: Sum of explained and unexplained variation.

Formula: SST = SSB + SSW (total variation = explained variation + unexplained variation).

Calculating Sums of Squares

  • Total Sum of Squares (SST): Measures total variation from the grand mean.

  • Sum of Squares Between (SSB): Measures variation between group means and the grand mean.

  • Sum of Squares Within (SSW): Measures variation within each group.
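The three sums of squares can be computed by hand. A minimal sketch on a small made-up dataset (three hypothetical groups of four observations; not the popcorn or AQI data), verifying that SST = SSB + SSW:

```python
# Sketch: computing SST, SSB, and SSW for a small, hypothetical dataset.
groups = [
    [18, 22, 20, 24],   # group 1 (mean 21)
    [15, 17, 16, 20],   # group 2 (mean 17)
    [10, 14, 12, 12],   # group 3 (mean 12)
]

all_values = [x for g in groups for x in g]
grand_mean = sum(all_values) / len(all_values)

# SST: total variation of every observation from the grand mean
sst = sum((x - grand_mean) ** 2 for x in all_values)

# SSB: variation of each group mean from the grand mean, weighted by group size
ssb = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)

# SSW: variation of observations from their own group mean
ssw = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)

print(round(sst, 3), round(ssb, 3), round(ssw, 3))
assert abs(sst - (ssb + ssw)) < 1e-9  # SST = SSB + SSW
```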

Degrees of Freedom and Mean Squares

  • Degrees of Freedom (df): between groups, k − 1; within groups, N − k; total, N − 1 (where k is the number of groups and N is the total number of observations).

  • Mean Squares (MS): Sums of squares divided by their respective degrees of freedom.

The F-Statistic

  • The F-statistic is the ratio of the mean square between groups to the mean square within groups: F = MSB / MSW.

  • If H₀ is true, F ≈ 1 (variation between groups is similar to variation within groups).

  • If H₀ is false, F will be much larger than 1 (variation between groups exceeds variation within groups).

ANOVA Table Structure

The ANOVA table summarizes the results of the analysis, including sums of squares, degrees of freedom, mean squares, the F-statistic, and the p-value.

Source  | df    | SS  | MS  | F         | p-Value
Between | k − 1 | SSB | MSB | MSB / MSW | calculated
Within  | N − k | SSW | MSW |           |
Total   | N − 1 | SST |     |           |

Example: ANOVA Table for AQI Readings

Source  | df | SS       | MS       | F     | p-Value
Between | 2  | 2825.252 | 1412.626 | 4.946 | 0.027
Within  | 12 | 3427.200 | 285.600  |       |
Total   | 14 | 6252.452 |          |       |
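The MS and F entries in the AQI table follow directly from its sums of squares and degrees of freedom. A minimal sketch reproducing them:

```python
# Sketch: reproducing the MS and F entries of the AQI ANOVA table
# from its sums of squares and degrees of freedom.
ssb, df_between = 2825.252, 2
ssw, df_within = 3427.200, 12

msb = ssb / df_between   # mean square between = 1412.626
msw = ssw / df_within    # mean square within = 285.600
f_stat = msb / msw       # about 4.946

print(round(msb, 3), round(msw, 3), round(f_stat, 3))
```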

Interpreting the F-Statistic and p-Value

  • If the p-value is less than the significance level (e.g., 0.05), reject H₀ and conclude that at least one group mean is different.

  • If the p-value is greater, do not reject H₀; there is not enough evidence to say the means differ.

Steps in Hypothesis Testing with ANOVA

  1. Hypothesize: State H₀ and Hₐ.

  2. Prepare: Check conditions (random samples, independence, equal variance, normality).

  3. Compute: Calculate sums of squares, mean squares, F-statistic, and complete the ANOVA table.

  4. Interpret: Compare the p-value to the significance level and draw a conclusion.

Probability Distribution of the F-Statistic

  • The F-statistic follows an F-distribution under the null hypothesis, with k − 1 degrees of freedom in the numerator and N − k in the denominator.

  • The F-distribution is right-skewed and only takes positive values.
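The shape of this distribution can be illustrated by simulation: draw groups with equal true means, compute F each time, and see where the observed statistic falls. A minimal Monte Carlo sketch using k = 3 groups of n = 5 (matching the AQI example's df of 2 and 12); this approximates, not replaces, the exact F test:

```python
# Sketch: simulating the sampling distribution of the F-statistic under H0
# (three groups of five standard-normal observations, i.e., equal true means),
# then estimating the p-value for the AQI example's observed F = 4.946.
import random

random.seed(1)

def f_statistic(groups):
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    ssb = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    ssw = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)
    return (ssb / (k - 1)) / (ssw / (n_total - k))

sims = [f_statistic([[random.gauss(0, 1) for _ in range(5)] for _ in range(3)])
        for _ in range(20000)]

# All simulated values are positive and the distribution is right-skewed;
# the estimated p-value should land near the table's 0.027.
p_value = sum(f >= 4.946 for f in sims) / len(sims)
print(round(p_value, 3))
```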

Summary Table: ANOVA Components

Component                    | Formula                                  | Description
Total Sum of Squares (SST)   | Σ(x − x̄)² over all observations          | Total variation from the grand mean
Sum of Squares Between (SSB) | Σ nᵢ(x̄ᵢ − x̄)² over the groups            | Variation between group means
Sum of Squares Within (SSW)  | Σ(x − x̄ᵢ)² within each group, summed     | Variation within groups
Mean Square Between (MSB)    | SSB / (k − 1)                            | SSB divided by its degrees of freedom
Mean Square Within (MSW)     | SSW / (N − k)                            | SSW divided by its degrees of freedom
F-statistic                  | MSB / MSW                                | Test statistic for ANOVA

Post-Hoc Tests

  • If ANOVA indicates significant differences, post-hoc tests (e.g., Tukey's HSD) can identify which means differ.

Applications of ANOVA

  • Comparing means across multiple treatments or groups in experimental and observational studies.

  • Examples: Comparing average test scores across different teaching methods, or mean blood pressure across different diets.

Additional info: Post-hoc tests are only performed if the overall ANOVA is significant. Equal variance can be checked using Levene's test or by comparing sample standard deviations.
