Control family-wise error rate when conducting multiple statistical tests by adjusting the significance level.
Last updated: April 2026
Example output for α = 0.05 and m = 10: corrected α (individual threshold) = 0.005000, so reject the null if p < 0.005000.
| Number of Tests (m) | FWER Without Correction (α = 0.05) | Bonferroni α' | Consequence |
|---|---|---|---|
| 5 | 22.6% | 0.01000 | Over a 1 in 5 chance of at least one false positive without correction |
| 10 | 40.1% | 0.00500 | About a 2 in 5 chance of at least one false positive without correction |
| 20 | 64.2% | 0.00250 | Stricter threshold, reduces power |
| 100 | 99.4% | 0.00050 | Very conservative, hard to reject |
FWER = 1 - (1 - α)^m for independent tests run without correction. Bonferroni keeps the FWER at or below α (not exactly at α) by setting α' = α/m.
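The figures in the table can be reproduced with a few lines of Python. This is a minimal sketch that assumes independent tests and α = 0.05; the function names are illustrative, not part of any library.

```python
def fwer_uncorrected(alpha: float, m: int) -> float:
    """Chance of at least one false positive across m independent tests,
    each run at level alpha, when every null hypothesis is true."""
    return 1.0 - (1.0 - alpha) ** m


def bonferroni_alpha(alpha: float, m: int) -> float:
    """Per-test threshold alpha' = alpha / m."""
    return alpha / m


def fwer_bonferroni(alpha: float, m: int) -> float:
    """FWER when each of m independent tests is run at alpha / m."""
    return fwer_uncorrected(bonferroni_alpha(alpha, m), m)


alpha = 0.05
for m in (5, 10, 20, 100):
    print(
        f"m={m:>3}  uncorrected FWER={fwer_uncorrected(alpha, m):6.1%}  "
        f"alpha'={bonferroni_alpha(alpha, m):.5f}  "
        f"corrected FWER={fwer_bonferroni(alpha, m):.4f}"
    )
```

Under independence the corrected FWER comes out slightly below α for every m, which is why the correction is described as conservative.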
Bonferroni correction is a statistical method that adjusts significance levels when conducting multiple hypothesis tests. Without correction, the probability of making at least one Type I error (false positive) increases dramatically as you perform more tests.
The Problem: If you conduct 10 independent tests at α = 0.05, the family-wise error rate (probability of at least one false positive) exceeds 40%. The Solution: Bonferroni divides the significance level by the number of tests: α' = α / m.
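One way to see why dividing by m is enough is Boole's inequality (the union bound). In the sketch below, $V$ (the number of false rejections), $I_0$ (the set of true null hypotheses), and $p_i$ (the p-value of test $i$) are notation introduced here for illustration.

```latex
\mathrm{FWER} = P(V \ge 1)
  = P\Bigl(\,\bigcup_{i \in I_0} \{\, p_i < \alpha/m \,\}\Bigr)
  \le \sum_{i \in I_0} P\bigl(p_i < \alpha/m\bigr)
  \le |I_0| \cdot \frac{\alpha}{m}
  \le \alpha
```

The last step uses $|I_0| \le m$. No step relies on how the tests relate to one another, so the bound holds whatever the dependence between them.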
Scenario: Testing 10 genes for differential expression at α = 0.05
Setup:
Results for 5 genes:
Without correction, we'd declare 3/5 tests significant. With Bonferroni, only 2 pass the stricter threshold, reducing false discoveries.
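The comparison can be sketched in Python. The p-values and gene labels below are hypothetical, chosen only so that the counts match the description above (3 of 5 significant uncorrected, 2 of 5 after Bonferroni); they are not the original results.

```python
ALPHA = 0.05
M = 10  # size of the test family: the scenario tests 10 genes

# Hypothetical p-values for five of the genes (illustrative only).
p_values = {
    "gene_A": 0.001,
    "gene_B": 0.004,
    "gene_C": 0.030,
    "gene_D": 0.200,
    "gene_E": 0.600,
}

threshold = ALPHA / M  # Bonferroni per-test threshold

uncorrected = [gene for gene, p in p_values.items() if p < ALPHA]
corrected = [gene for gene, p in p_values.items() if p < threshold]

print(f"Bonferroni threshold: {threshold:.4f}")
print(f"Significant without correction: {uncorrected}")
print(f"Significant with Bonferroni:    {corrected}")
```

Note that the divisor is the full family size (10), not the number of results shown (5).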
Type I vs. Family-Wise Error?
Type I error: a false positive in a single test. Family-wise error rate: the probability of at least one false positive across all tests in the family. Bonferroni controls the latter, which is the stricter criterion.
Is Bonferroni too conservative?
Yes, often. With many tests, the threshold becomes very stringent (e.g., 0.05/1000 = 0.00005), reducing statistical power. Consider Holm or FDR instead.
How to count the number of tests?
Count every statistical test in the family: t-tests, correlations, ANOVAs, and all pairwise comparisons. Counting generously keeps the error-rate guarantee intact; undercounting makes the threshold too lenient and inflates false positives.
Use Bonferroni always?
Not always. Use for confirmatory studies with specific hypotheses. For exploratory analysis on many variables, FDR (False Discovery Rate) is often better.
What if tests aren't independent?
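Bonferroni does not require independence; it controls the family-wise error rate under any dependence structure. With strongly correlated tests, however, it becomes overly conservative. Holm-Bonferroni or permutation methods may be better.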
Can adjusted p-values exceed 1?
No. The raw product p × m can exceed 1, so the adjusted p-value is capped at 1.0: adjusted p = min(1, p × m). A product above 1 means the original p-value is greater than 1/m, far above the corrected threshold α/m.
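The capping rule can be sketched as follows. The helper name `bonferroni_adjust` and the sample p-values are illustrative assumptions.

```python
def bonferroni_adjust(p_values):
    """Bonferroni-adjusted p-values: multiply by the number of tests, cap at 1.0."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


raw = [0.001, 0.02, 0.3, 0.9]  # illustrative p-values, m = 4
adjusted = bonferroni_adjust(raw)
for p, p_adj in zip(raw, adjusted):
    print(f"raw p = {p:.3f}  ->  adjusted p = {p_adj:.3f}")
```

Comparing an adjusted p-value against α gives the same decision as comparing the raw p-value against α/m.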
Bonferroni vs. FDR?
Bonferroni: controls family-wise error (stricter, fewer false positives). FDR: controls proportion of false discoveries (less strict, more power).
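To make the difference in power concrete, here is a minimal sketch that runs both approaches on the same p-values. It assumes the Benjamini-Hochberg step-up procedure as the FDR method (one common choice, not named in the text above), and the p-values are illustrative.

```python
def bonferroni_reject(p_values, alpha=0.05):
    """Reject each test whose p-value is below alpha / m."""
    m = len(p_values)
    return [p < alpha / m for p in p_values]


def benjamini_hochberg_reject(p_values, q=0.05):
    """Benjamini-Hochberg step-up rule at FDR level q.

    Finds the largest rank k (1-based, ascending p-values) with
    p_(k) <= (k / m) * q and rejects the k smallest p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            k_max = rank
    reject = [False] * m
    for idx in order[:k_max]:
        reject[idx] = True
    return reject


p_values = [0.001, 0.008, 0.012, 0.030, 0.200]  # illustrative only
print("Bonferroni rejections:", sum(bonferroni_reject(p_values)))
print("BH rejections:        ", sum(benjamini_hochberg_reject(p_values)))
```

On this input Bonferroni rejects fewer hypotheses than Benjamini-Hochberg, which reflects the trade-off between strictness and power.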
What's Holm-Bonferroni?
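An improvement: sort the p-values from smallest to largest, then compare them in turn to α/m, α/(m-1), α/(m-2), and so on. Stop at the first p-value that fails its threshold and retain that hypothesis and all later ones. It is less conservative than Bonferroni while still controlling the family-wise error rate.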
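A minimal sketch of the step-down procedure, assuming the same strict "p < threshold" convention used for Bonferroni on this page. The function name and sample p-values are illustrative.

```python
def holm_bonferroni_reject(p_values, alpha=0.05):
    """Holm-Bonferroni step-down: returns reject flags in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices, smallest p first
    reject = [False] * m
    for step, idx in enumerate(order):
        # Thresholds are alpha/m, alpha/(m-1), ..., alpha/1.
        if p_values[idx] < alpha / (m - step):
            reject[idx] = True
        else:
            break  # first failure: retain this and all larger p-values
    return reject


p_values = [0.012, 0.001, 0.200, 0.030, 0.011]  # illustrative only
print(holm_bonferroni_reject(p_values))
```

With these values plain Bonferroni (threshold 0.05/5 = 0.01) would reject only the test with p = 0.001, while Holm also rejects the tests with p = 0.011 and p = 0.012.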