Estimate the required sample size for a two-group comparison of means using Cohen's d, a chosen significance level, and desired statistical power.
Last updated: March 2026
Significance level (α): the Type I error rate, commonly 0.05
Power (1−β): the probability of detecting the target effect if it truly exists
Effect size (Cohen's d): small ≈ 0.2, medium ≈ 0.5, large ≈ 0.8
Two-tailed tests are usually used unless direction is justified in advance
Power analysis is used to estimate the sample size needed to detect an effect of a chosen size with a chosen probability. Statistical power (1−β) is the probability of detecting the target effect if that effect truly exists.
In this calculator, the study design is a two-group comparison of means with equal group sizes, and the effect size is expressed as Cohen's d. The key ingredients are the significance level (α), desired power (1−β), effect size (d), and the required sample size per group.
Higher desired power, smaller effect sizes, and more stringent significance thresholds all push the required sample size upward.
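The relationship above can be sketched with the standard normal-approximation formula for two equal groups, n per group ≈ 2 × ((z for α + z for power) / d)². This is an illustration only: the function name `n_per_group` and its defaults are assumptions, and the calculator itself may use a different method (an exact t-based calculation typically gives a result one or two participants larger per group).

```python
from math import ceil
from statistics import NormalDist


def n_per_group(d, alpha=0.05, power=0.80, two_tailed=True):
    """Approximate per-group sample size for comparing two means (equal groups).

    Normal approximation; hypothetical helper, not the calculator's own code.
    """
    if d == 0:
        raise ValueError("effect size d must be non-zero")
    z = NormalDist().inv_cdf
    # A two-tailed test splits alpha across both tails.
    tail_area = alpha / 2 if two_tailed else alpha
    z_alpha = z(1 - tail_area)
    z_beta = z(power)
    return ceil(2 * ((z_alpha + z_beta) / abs(d)) ** 2)


baseline = n_per_group(0.5)                    # medium effect, alpha 0.05, power 0.80
higher_power = n_per_group(0.5, power=0.90)    # higher desired power
smaller_effect = n_per_group(0.2)              # smaller effect size
stricter_alpha = n_per_group(0.5, alpha=0.01)  # more stringent threshold

print("baseline:", baseline)
print("power 0.90:", higher_power)
print("d = 0.2:", smaller_effect)
print("alpha 0.01:", stricter_alpha)
```

Each of the three changes produces a larger per-group requirement than the baseline, which is the pattern described above.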
Planning a study to compare two treatments:
A power target of 80% is a common compromise between detecting meaningful effects and keeping studies feasible. Some settings use higher targets, such as 90%, when missing an effect would be especially costly.
Use prior studies, pilot data, or the smallest effect that would be practically important. If uncertain, trying a range of plausible values is often more informative than relying on a single guess.
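Trying a range of plausible values could look like the sketch below. It assumes α = 0.05, 80% power, a two-tailed test, and the normal approximation; the candidate effect sizes and the helper name `n_per_group` are illustrative choices, not values from any particular study.

```python
from math import ceil
from statistics import NormalDist

Z = NormalDist().inv_cdf
ALPHA, POWER = 0.05, 0.80  # assumed planning values


def n_per_group(d):
    """Per-group n under the normal approximation (two-tailed, equal groups)."""
    return ceil(2 * ((Z(1 - ALPHA / 2) + Z(POWER)) / d) ** 2)


plausible_d = [0.3, 0.4, 0.5, 0.6]  # illustrative range of Cohen's d
table = {d: n_per_group(d) for d in plausible_d}

for d, n in table.items():
    print(f"d = {d:.1f}: {n} per group, {2 * n} total")
```

Seeing the whole range at once shows how sensitive the plan is to the effect-size assumption, which a single guess hides.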
Use two-tailed tests unless a one-direction hypothesis is fully justified before data collection and an opposite-direction effect would not count as evidence of interest.
α is the false-positive risk threshold. β is the false-negative probability for the target effect size. Power equals 1 − β.
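To make the link between α, β, and power concrete, the sketch below approximates the power of a two-tailed, two-group comparison for a given per-group sample size. It uses the normal approximation and ignores the negligible chance of rejecting in the wrong direction; `approx_power` is a hypothetical helper name.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(d, n_per_group, alpha=0.05):
    """Approximate power for a two-tailed comparison of two equal groups."""
    norm = NormalDist()
    z_alpha = norm.inv_cdf(1 - alpha / 2)  # critical value set by alpha
    # Expected test statistic under the target effect, minus the critical value.
    return norm.cdf(abs(d) * sqrt(n_per_group / 2) - z_alpha)


power = approx_power(d=0.5, n_per_group=63)
beta = 1 - power  # false-negative probability for the target effect

print(f"power = {power:.3f}, beta = {beta:.3f}")
```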
Prospective power analysis is usually more useful. After data collection, confidence intervals, observed effect sizes, and precision are often more informative than post-hoc power.
Divide the required final sample size by the expected retention proportion. For example, if you need 126 total and expect 85% retention, recruit 126 / 0.85 ≈ 148.2, which rounds up to 149, or 150 to keep the two groups equal.
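The retention adjustment can be written as a small helper. This is a sketch: `recruitment_target` and its `groups` parameter are assumed names, and rounding up to a multiple of the number of groups is one reasonable convention for keeping group sizes equal.

```python
from math import ceil


def recruitment_target(required_n, retention, groups=2):
    """Inflate a required final sample size for expected dropout."""
    if not 0 < retention <= 1:
        raise ValueError("retention must be a proportion in (0, 1]")
    # Round to 9 decimals first so floating-point noise cannot push ceil up by one.
    inflated = ceil(round(required_n / retention, 9))
    # Round up again so the total divides evenly across the groups.
    return ceil(inflated / groups) * groups


total = recruitment_target(126, 0.85)
print(total, "total,", total // 2, "per group")
```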
Smaller effects are harder to distinguish from random variation, so more data are needed to detect them reliably.
A sample that is too small can leave the study underpowered. Much larger samples improve precision, but they can also make very small effects statistically detectable even when those effects are not practically important.
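One way to see this is to invert the sample-size relationship and ask what effect size a given per-group n can detect. The sketch below assumes α = 0.05, 80% power, a two-tailed test, and the normal approximation; `min_detectable_d` is a hypothetical helper and the sample sizes are illustrative.

```python
from math import sqrt
from statistics import NormalDist


def min_detectable_d(n_per_group, alpha=0.05, power=0.80):
    """Smallest Cohen's d detectable with the given power (normal approximation)."""
    z = NormalDist().inv_cdf
    return (z(1 - alpha / 2) + z(power)) * sqrt(2 / n_per_group)


sizes = [20, 63, 1000, 10000]  # illustrative per-group sample sizes
detectable = {n: min_detectable_d(n) for n in sizes}

for n, d in detectable.items():
    print(f"n = {n:>5} per group: detectable d ≈ {d:.2f}")
```

With thousands of participants per group, the detectable effect falls well below the conventional "small" benchmark of 0.2, so statistical significance alone says little about practical importance.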