Power Analysis Calculator

Estimate the required sample size for a two-group comparison of means using Cohen's d, a chosen significance level, and desired statistical power.

Last updated: March 2026

Study Parameters

Significance (α): Type I error rate, commonly 0.05

Power (1−β): probability of detecting the target effect if it truly exists

Effect Size (d): Small ≈ 0.2, Medium ≈ 0.5, Large ≈ 0.8

Test: two-tailed tests are usually used unless a direction is justified in advance

Required Sample Size (per group)
63 participants needed in each group
Significance (α): 0.05
Power (1−β): 0.8
Effect Size (d): 0.5
Total N (both groups): 126
Recommendation: You need at least 63 participants per group (total N = 126) to detect an effect size of 0.5 with 80% power at α = 0.05.

What is Power Analysis?

Power analysis is used to estimate the sample size needed to detect an effect of a chosen size with a chosen probability. Statistical power (1−β) is the probability of detecting the target effect if that effect truly exists.

In this calculator, the study design is a two-group comparison of means with equal group sizes, and the effect size is expressed as Cohen's d. The key ingredients are the significance level (α), desired power (1−β), effect size (d), and the required sample size per group.

Higher desired power, smaller effect sizes, and more stringent significance thresholds all push the required sample size upward.

How to Conduct Power Analysis

The Formula

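n per group = 2 × [(Zα + Zβ) / d]²
For two-tailed tests, use Zα/2 in place of Zα
n = required sample size per group, rounded up to the next whole number
Zα (or Zα/2) = standard normal critical value for the chosen significance level
Zβ = standard normal quantile for the desired power (1−β), e.g. 0.84 for 80% power
d = Cohen's d standardized effect size
This is a normal approximation. Calculations based on the t distribution typically give a slightly larger n, often by about one participant per group.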
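
The formula above can be sketched in Python with the standard library alone. The function name sample_size_per_group and its parameters are illustrative assumptions, not this calculator's actual code, and the sketch uses exact normal quantiles rather than the rounded values 1.96 and 0.84.

```python
import math
from statistics import NormalDist


def sample_size_per_group(alpha=0.05, power=0.80, d=0.5, two_tailed=True):
    """Normal-approximation sample size per group for comparing two means."""
    if not (0 < alpha < 1 and 0 < power < 1):
        raise ValueError("alpha and power must be strictly between 0 and 1")
    if d == 0:
        raise ValueError("effect size d must be non-zero")

    std_normal = NormalDist()
    # Two-tailed tests split alpha across both tails (Z_alpha/2).
    tail_area = alpha / 2 if two_tailed else alpha
    z_alpha = std_normal.inv_cdf(1 - tail_area)
    # Z_beta is the quantile matching the desired power (1 - beta).
    z_beta = std_normal.inv_cdf(power)

    n = 2 * ((z_alpha + z_beta) / d) ** 2
    return math.ceil(n)  # always round up to a whole participant


print(sample_size_per_group())                  # 63
print(sample_size_per_group(two_tailed=False))  # 50
```

Rounding up rather than to the nearest integer keeps the achieved power at or above the target.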

Step-by-Step Process

Step 1: Choose α, often 0.05
Step 2: Choose desired power, often 0.80 or 0.90
Step 3: Choose an effect size d from prior evidence or practical importance
Step 4: Choose one-tailed or two-tailed testing
Step 5: Compute required n per group
Step 6: Inflate for expected attrition if needed

Cohen's d Effect Size Guidelines

Small (d ≈ 0.2): subtle effect, usually needs a larger sample
Medium (d ≈ 0.5): moderate effect
Large (d ≈ 0.8): larger effect, usually detectable with fewer participants
d = (μ₁ − μ₂) / σ, where μ₁ and μ₂ are the group means and σ is the common (pooled) standard deviation
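
When pilot data are available, d can be estimated from the two samples. The sketch below assumes σ is estimated by the pooled sample standard deviation, which is a common choice but not the only one; the function name cohens_d and the data values are made up for illustration.

```python
import math
from statistics import mean, variance


def cohens_d(group_1, group_2):
    """Estimate Cohen's d using the pooled sample standard deviation."""
    n1, n2 = len(group_1), len(group_2)
    if n1 < 2 or n2 < 2:
        raise ValueError("each group needs at least two observations")

    # Weight each sample variance by its degrees of freedom.
    pooled_variance = (
        (n1 - 1) * variance(group_1) + (n2 - 1) * variance(group_2)
    ) / (n1 + n2 - 2)
    if pooled_variance == 0:
        raise ValueError("pooled variance is zero; d is undefined")

    return (mean(group_1) - mean(group_2)) / math.sqrt(pooled_variance)


# Hypothetical pilot measurements, for illustration only.
treatment = [2, 4, 6]
control = [1, 3, 5]
print(cohens_d(treatment, control))  # 0.5
```

Estimates of d from small pilot samples are noisy, so they are best treated as one input among several when choosing the planning value.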

Example: Designing a Clinical Trial

Planning a study to compare two treatments:

Given:
α = 0.05
Power = 0.80
Effect size d = 0.5
Test = Two-tailed
Step 1: Find critical values
Zα/2 = 1.96
Zβ = 0.84
Step 2: Calculate sample size per group
n = 2 × [(1.96 + 0.84) / 0.5]²
n = 2 × [2.80 / 0.5]²
n = 2 × [5.60]²
n = 2 × 31.36
n = 62.72 → 63 per group
Conclusion:
• Required: 63 participants per group
• Total N = 126 participants
• With 15% attrition (85% retention), recruit 63 / 0.85 ≈ 74.1, rounded up to 75 per group (150 total)
• This targets 80% power to detect d = 0.5 at α = 0.05
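
As a cross-check on the rounding in this example, the same normal approximation can be inverted to estimate the power achieved at a given n. This is a sketch under that approximation; approx_power is a hypothetical helper name, and it ignores the negligible chance of rejecting in the wrong direction.

```python
import math
from statistics import NormalDist


def approx_power(n_per_group, d, alpha=0.05):
    """Approximate two-tailed power for a two-group comparison of means."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # Expected value of the test statistic when the true effect is d.
    expected_z = abs(d) * math.sqrt(n_per_group / 2)
    return std_normal.cdf(expected_z - z_crit)


# 62 per group falls just short of 80% power; 63 just reaches it.
for n in (62, 63):
    print(n, round(approx_power(n, d=0.5), 3))
```

This is why 62.72 is rounded up to 63 rather than down to 62.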

Frequently Asked Questions

Why is 80% power common?

It is a common compromise between detecting meaningful effects and keeping studies feasible. Some settings use higher targets, such as 90%, when missing an effect would be especially costly.

What if I don't know the effect size?

Use prior studies, pilot data, or the smallest effect that would be practically important. If uncertain, trying a range of plausible values is often more informative than relying on a single guess.
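
One way to try a range of plausible values is to tabulate the required sample size for each candidate d. The sketch below assumes a two-tailed test and the normal-approximation formula used on this page; n_per_group and the list of effect sizes are illustrative choices.

```python
import math
from statistics import NormalDist


def n_per_group(d, alpha=0.05, power=0.80):
    """Two-tailed, normal-approximation sample size per group."""
    std_normal = NormalDist()
    z_sum = std_normal.inv_cdf(1 - alpha / 2) + std_normal.inv_cdf(power)
    return math.ceil(2 * (z_sum / d) ** 2)


# Candidate effect sizes spanning pessimistic to optimistic guesses.
plausible_effects = [0.2, 0.3, 0.4, 0.5, 0.8]
for d in plausible_effects:
    n = n_per_group(d)
    print(f"d = {d:.1f}: {n} per group, {2 * n} total")
```

Because n scales with 1/d², halving the assumed effect size roughly quadruples the required sample, which makes the pessimistic end of the range the most important one to examine.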

Should I use one-tailed or two-tailed tests?

Use two-tailed tests unless a one-direction hypothesis is fully justified before data collection and an opposite-direction effect would not count as evidence of interest.

What's the difference between α and β?

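α is the false-positive probability you are willing to accept: the chance of rejecting the null hypothesis when it is true. β is the false-negative probability: the chance of missing the target effect when it exists at the assumed size. Power equals 1 − β.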

Can I do power analysis after data collection?

Prospective power analysis is usually more useful. After data collection, confidence intervals, observed effect sizes, and precision are often more informative than post-hoc power.

How do I account for dropouts?

Divide the required final sample size by the expected retention proportion and round up. For example, if you need 126 total and expect 85% retention, 126 / 0.85 ≈ 148.2, so recruit at least 149; rounding up to 150 keeps the two groups equal at 75 each.
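
The dropout adjustment can be sketched as a small helper. The name recruitment_target is hypothetical, and the sketch assumes the same retention rate in every group; it rounds up per group so that group sizes stay equal.

```python
import math


def recruitment_target(n_per_group, retention, groups=2):
    """Return (per-group, total) recruitment needed to end with n_per_group."""
    if not 0 < retention <= 1:
        raise ValueError("retention must be in the interval (0, 1]")
    per_group = math.ceil(n_per_group / retention)
    return per_group, per_group * groups


print(recruitment_target(63, 0.85))  # (75, 150)
```

Rounding each group separately is what turns the total of 149 into 150.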

Why does a smaller effect size need a larger sample?

Smaller effects are harder to distinguish from random variation, so more data are needed to detect them reliably.

What if my sample is much larger or smaller than planned?

A sample that is much smaller than planned can leave the study underpowered. A much larger sample improves precision, but it can also make very small effects statistically detectable even when they are not practically important.
