
Power, Sample Size, and Effect Size in Statistical Analysis

Study Guide - Smart Notes

Tailored notes based on your materials, expanded with key definitions, examples, and context.

Lecture 11: Power & Sample Size

Introduction to Power and Sample Size

Statistical analysis often involves determining whether observed effects are meaningful and whether sample sizes are adequate. This lecture explores the concepts of statistical power, sample size, and effect size, and discusses the potential pitfalls of using samples that are too large or too small.

  • Statistical Power: The probability that a test will correctly reject a false null hypothesis (i.e., detect a true effect).

  • Sample Size: The number of observations or subjects included in a study. Larger samples generally increase power but may also lead to detection of trivial effects.

  • Effect Size: A quantitative measure of the magnitude of the experimental effect.

Example: The introductory graph shows two distributions (null and alternative hypotheses) and highlights the effect size as the separation between their means.
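The effect size pictured in that graph can be computed directly. The sketch below, using made-up sample data (not from the lecture), shows Cohen's d: the difference between two group means divided by their pooled standard deviation.

```python
import math
import statistics

def cohens_d(sample1, sample2):
    """Cohen's d: difference in means divided by the pooled standard deviation."""
    n1, n2 = len(sample1), len(sample2)
    m1, m2 = statistics.fmean(sample1), statistics.fmean(sample2)
    v1, v2 = statistics.variance(sample1), statistics.variance(sample2)
    pooled_sd = math.sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2))
    return (m1 - m2) / pooled_sd

# Hypothetical control/treatment measurements:
control = [4.8, 5.1, 5.0, 4.9, 5.2, 5.0]
treated = [5.9, 6.1, 5.8, 6.2, 6.0, 6.1]
print(round(cohens_d(treated, control), 2))  # well above 0.8: a large effect
```

Because d is standardized, it can be compared across studies that measure the same construct on different scales.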

Association Between Categorical Variables

Chi-Square Test and Cramer's V

When analyzing categorical data, the chi-square test is commonly used to assess whether there is an association between two variables. Cramer's V is a measure of the strength of association for categorical variables.

  • Chi-Square Test: Tests the null hypothesis that there is no association between two categorical variables.

  • Cramer's V: Ranges from 0 (no association) to 1 (perfect association). It is not affected by sample size.

Formula for Cramer's V:

V = √( χ² / (N × (k − 1)) ), where χ² is the chi-square statistic, N is the total sample size, and k is the smaller of the number of rows and columns.

  • Interpretation: V ≈ 0.1 indicates a small effect, V ≈ 0.3 a medium effect, and V ≈ 0.5 a large effect (see Cohen's criteria below).

Example: In a study of lionesses' hunting roles, the association between position and role was tested. The null hypothesis of no association was rejected, but the effect size (V = 0.245) was small.
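The chi-square statistic and Cramer's V can be computed from a contingency table in a few lines. The sketch below uses hypothetical 2×2 counts, not the lionesses data from the lecture.

```python
import math

def chi_square_and_cramers_v(table):
    """Pearson chi-square statistic and Cramer's V for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(table), len(table[0]))   # smaller table dimension
    v = math.sqrt(chi2 / (n * (k - 1)))
    return chi2, v

# Hypothetical counts (e.g., position x role):
table = [[30, 10],
         [15, 25]]
chi2, v = chi_square_and_cramers_v(table)
print(round(chi2, 2), round(v, 3))
```

Note how V rescales χ² by sample size: doubling every cell count doubles χ² but leaves V unchanged, which is exactly why V is the better measure of association strength.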

Statistical Significance vs. Practical Relevance

Understanding the Difference

Statistical significance indicates that an observed effect is unlikely to be due to chance, but does not guarantee that the effect is meaningful in practice. Large samples can make even trivial effects statistically significant.

  • Statistical Significance: Determined by the p-value; if p < α (commonly 0.05), the effect is considered statistically significant.

  • Practical Relevance: Refers to whether the effect size is large enough to be meaningful in the real world.

Example: A cartoon illustrates that a statistically significant link between carrots and intelligence may have little practical significance (e.g., a 0.00001 improvement).

Effect Size: Definitions and Criteria

Cohen's Criteria for Effect Size

Effect size quantifies the magnitude of a result. Cohen (1988) proposed generic criteria for small, medium, and large effects for various statistical tests.

Test                      Small   Medium   Large
t-Test (Cohen's d)        0.2     0.5      0.8
Chi-square (Cramer's V)   0.1     0.3      0.5
Odds Ratio (OR)           1.5     2.3      3.5

Additional info: These thresholds are guidelines; context and field-specific standards may vary.
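As a minimal sketch, Cohen's thresholds can be encoded as a lookup so that a computed effect size is labeled automatically. The function and dictionary names here are illustrative, not from the lecture.

```python
def label_effect(value, thresholds):
    """Map an effect-size value onto Cohen's small/medium/large labels.

    thresholds: (small, medium, large) cutoffs from the table above."""
    small, medium, large = thresholds
    if value < small:
        return "negligible"
    if value < medium:
        return "small"
    if value < large:
        return "medium"
    return "large"

COHEN = {
    "cohens_d":   (0.2, 0.5, 0.8),
    "cramers_v":  (0.1, 0.3, 0.5),
    "odds_ratio": (1.5, 2.3, 3.5),
}

# The lionesses example: V = 0.245 falls between 0.1 and 0.3.
print(label_effect(0.245, COHEN["cramers_v"]))  # "small"
```

Remember the caveat above: these cutoffs are generic guidelines, and field-specific conventions may override them.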

Sample Size: Can Samples Be Too Big?

Problems with Large Samples

While increasing sample size generally increases statistical power, excessively large samples can lead to detection of effects that are statistically significant but not practically relevant.

  • 10% Condition: For some tests (e.g., z-test for proportions), the sample should not exceed 10% of the population.

  • Census Data: Statistical inference is unnecessary for census data, as the entire population is measured.

  • High Power: Large samples can make very small effects statistically significant, even if they are not meaningful.

Example: In epidemiological studies, large samples may reveal statistically significant but trivial associations.
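This effect of sample size on significance is easy to demonstrate. The sketch below (hypothetical proportions, equal group sizes) computes a two-proportion z statistic for the same tiny difference, 0.51 vs. 0.50, at two sample sizes: it is nowhere near significant at n = 1,000 per group, but overwhelmingly "significant" at n = 1,000,000.

```python
import math

def two_proportion_z(p1, p2, n):
    """z statistic for comparing two observed proportions, n subjects per group."""
    p_pool = (p1 + p2) / 2                       # pooled proportion under H0
    se = math.sqrt(p_pool * (1 - p_pool) * (2 / n))
    return (p1 - p2) / se

# The same tiny difference at two sample sizes:
for n in (1_000, 1_000_000):
    z = two_proportion_z(0.51, 0.50, n)
    print(n, round(z, 2))  # |z| > 1.96 means "significant" at alpha = 0.05
```

The effect size (a one-percentage-point difference) is identical in both cases; only the p-value changes, which is why effect size must be reported alongside significance.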

Power of a Statistical Test

Factors Affecting Power

Power is the probability of correctly rejecting a false null hypothesis. It depends on several factors:

  • Sample Size (N): Larger samples increase power.

  • Effect Size: Larger effects are easier to detect.

  • Significance Level (α): Lower α reduces power.

  • Statistical Test Used: Some tests are more powerful than others for certain data types.

Formula for Power (general):

Power = 1 − β, where β is the probability of a Type II error (failing to reject a false null hypothesis).

Example: In a toast experiment with n = 70, the power to detect a true proportion of 0.62 was assessed; the closest answer option to the computed power was 0.84.
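A power calculation of this kind can be done exactly for a binomial test. The sketch below assumes a null of p₀ = 0.5 and a one-sided α = 0.05; the lecture's exact test setup and α are not stated here, so the computed value need not match the quoted 0.84.

```python
import math

def power_one_sided_proportion(n, p0, p_true, alpha=0.05):
    """Exact power of a one-sided binomial test of H0: p = p0 vs H1: p > p0."""
    def upper_tail(k, p):  # P(X >= k) for X ~ Binomial(n, p)
        return sum(math.comb(n, x) * p**x * (1 - p)**(n - x)
                   for x in range(k, n + 1))
    # Smallest rejection threshold keeping the type I error at or below alpha:
    k_crit = next(k for k in range(n + 1) if upper_tail(k, p0) <= alpha)
    # Power = probability of landing in the rejection region when p = p_true:
    return upper_tail(k_crit, p_true)

print(round(power_one_sided_proportion(70, 0.5, 0.62), 3))
```

The same function also shows the sample-size lever directly: increasing n while holding p₀, p_true, and α fixed raises the returned power.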

Judging the Strength of Evidence

Three Aspects to Report

When reporting results, always check and report:

  • Direction of Effect: Is the effect in the expected direction?

  • Biological/Practical Relevance: Is the effect size large enough to matter?

  • Statistical Significance: Is the effect unlikely to be due to chance?

Additional info: For categorical data, report which cells are over- or under-represented, and use measures like Cramer's V for strength of association.
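Those over- and under-represented cells can be identified with Pearson standardized residuals, (observed − expected) / √expected, computed per cell. The counts below are hypothetical, not the lecture's data.

```python
import math

def standardized_residuals(table):
    """Pearson standardized residuals for each cell of a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [
        [(obs - row_totals[i] * col_totals[j] / n)
         / math.sqrt(row_totals[i] * col_totals[j] / n)
         for j, obs in enumerate(row)]
        for i, row in enumerate(table)
    ]

# Positive residuals mark over-represented cells, negative ones under-represented:
for row in standardized_residuals([[30, 10], [15, 25]]):
    print([round(r, 2) for r in row])
```

Reporting the sign and size of these residuals gives the direction of the effect that χ² and Cramer's V alone do not convey.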

Summary Table: Effect Size, Sample Size, and Power

Aspect                              Influenced by Sample Size?   Interpretation
Test Statistic (e.g., chi-square)   Yes                          Increases with sample size
Degrees of Freedom                  No                           Depends on number of categories
p-value                             Yes                          Smaller with larger samples (for the same effect)
Standardized Residuals              No                           Indicate direction of effect
Cramer's V                          No                           Measures strength of association

Best Practices in Reporting Statistical Results

Recommendations

  • Always report effect size, sample size, and statistical significance.

  • Interpret results in terms of both statistical and practical relevance.

  • Be cautious when interpreting results from very large samples.

  • Use appropriate measures for categorical and quantitative data.

Additional info: For rare categories or uneven marginal distributions, Cramer's V may underestimate association strength; consider alternative measures.
