Calculate sensitivity, specificity, PPV, NPV, likelihood ratios, accuracy, F1 score, and prevalence from a 2×2 confusion matrix. Built for diagnostic tests, screening programs, medical research, and any binary classification task.
Last updated: March 2026
Sensitivity and specificity are the core measures used to evaluate diagnostic tests and binary classification systems. Sensitivity measures how well a test detects people who have a condition, while specificity measures how well it correctly excludes people who do not have the condition.
These metrics come from a 2×2 confusion matrix consisting of true positives (TP), false positives (FP), false negatives (FN), and true negatives (TN). From these four values, all major performance metrics can be calculated.
Sensitivity = TP / (TP + FN): of all people with the condition, what proportion tested positive?
Specificity = TN / (FP + TN): of all people without the condition, what proportion tested negative?
TP: Has condition and tests positive
FP: No condition but tests positive
FN: Has condition but tests negative
TN: No condition and tests negative
Example: Diagnostic Test on 200 People
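As a minimal sketch of such an example, the Python below computes the main metrics from four counts. It is an illustration rather than the calculator's actual implementation: the helper names `safe_div` and `diagnostic_metrics` are hypothetical, and the counts (TP = 45, FP = 15, FN = 5, TN = 135, totalling 200 people) are invented to keep the arithmetic easy to follow.

```python
def safe_div(numerator, denominator):
    """Return numerator / denominator, or None when the denominator is zero."""
    return numerator / denominator if denominator else None


def diagnostic_metrics(tp, fp, fn, tn):
    """Compute common 2x2 metrics from raw counts (hypothetical helper)."""
    for name, value in (("TP", tp), ("FP", fp), ("FN", fn), ("TN", tn)):
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, int) or value < 0:
            raise ValueError(f"{name} must be a non-negative whole number")

    total = tp + fp + fn + tn
    return {
        "sensitivity": safe_div(tp, tp + fn),
        "specificity": safe_div(tn, fp + tn),
        "ppv": safe_div(tp, tp + fp),
        "npv": safe_div(tn, tn + fn),
        "accuracy": safe_div(tp + tn, total),
        "f1": safe_div(2 * tp, 2 * tp + fp + fn),
        "prevalence": safe_div(tp + fn, total),
    }


# Hypothetical counts for 200 people; not taken from any real study.
metrics = diagnostic_metrics(tp=45, fp=15, fn=5, tn=135)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these assumed counts, sensitivity, specificity, and accuracy all come out at 0.900, PPV at 0.750, NPV at about 0.964, F1 at about 0.818, and prevalence at 0.250. Returning `None` for a zero denominator is one possible design choice; a real tool might instead show "undefined" for that metric.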
High sensitivity minimizes missed cases and is especially useful for screening. High specificity minimizes false positives and is especially useful for confirmation. Most real-world tests involve a tradeoff between the two.
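The tradeoff can be illustrated with a short Python sketch. It assumes a test that produces a numeric score and calls a result positive when the score meets a cutoff; the function name `sens_spec_at_threshold`, the scores, and the cutoffs are all hypothetical.

```python
def sens_spec_at_threshold(scores, labels, threshold):
    """Call a result positive when score >= threshold.

    Returns (sensitivity, specificity). Assumes both classes are present.
    """
    tp = fn = fp = tn = 0
    for score, has_condition in zip(scores, labels):
        predicted_positive = score >= threshold
        if has_condition and predicted_positive:
            tp += 1
        elif has_condition:
            fn += 1
        elif predicted_positive:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (fp + tn)


# Hypothetical scores: the first five people have the condition, the last five do not.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.5, 0.3, 0.3, 0.2, 0.1]
labels = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]

for threshold in (0.35, 0.45, 0.55):
    sens, spec = sens_spec_at_threshold(scores, labels, threshold)
    print(f"threshold {threshold:.2f}: sensitivity {sens:.2f}, specificity {spec:.2f}")
```

On this made-up data, lowering the cutoff to 0.35 catches every case (sensitivity 1.00) at the cost of one false positive (specificity 0.80), while raising it to 0.55 removes the false positive (specificity 1.00) but misses one case (sensitivity 0.80).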
Predictive values, PPV = TP / (TP + FP) and NPV = TN / (FN + TN), change depending on how common the condition is. Even a highly accurate test can have a low PPV when the condition is rare, because false positives may outnumber true positives.
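The dependence on prevalence follows from Bayes' theorem and can be sketched in Python. The function name `predictive_values` is hypothetical, and the 99% sensitivity and 99% specificity are assumed figures for an imaginary test, not properties of any real one.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) via Bayes' theorem.

    All inputs are proportions between 0 and 1. A value is None when
    no results of that kind (positive or negative) would occur.
    """
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence

    positives = true_pos + false_pos
    negatives = true_neg + false_neg
    ppv = true_pos / positives if positives else None
    npv = true_neg / negatives if negatives else None
    return ppv, npv


# Assumed test characteristics, evaluated at several hypothetical prevalences.
for prevalence in (0.001, 0.01, 0.10, 0.50):
    ppv, npv = predictive_values(0.99, 0.99, prevalence)
    print(f"prevalence {prevalence:.1%}: PPV {ppv:.1%}, NPV {npv:.3%}")
```

Under these assumptions, PPV is only about 9% at a prevalence of 0.1%, 50% at 1%, and roughly 92% at 10%, even though the test's sensitivity and specificity never change.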
Accuracy can look impressive in imbalanced populations while still hiding poor case detection. Always interpret accuracy alongside sensitivity and specificity rather than using it alone.
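A small numeric sketch in Python makes the point concrete. The counts are hypothetical: 1,000 people, of whom only 10 have the condition, tested with an imaginary test that misses most of them.

```python
# Hypothetical imbalanced population: 10 people with the condition, 990 without.
tp, fn = 2, 8      # only 2 of the 10 true cases are detected
fp, tn = 5, 985    # almost everyone without the condition tests negative

accuracy = (tp + tn) / (tp + fp + fn + tn)
sensitivity = tp / (tp + fn)
specificity = tn / (fp + tn)

print(f"accuracy:    {accuracy:.3f}")
print(f"sensitivity: {sensitivity:.3f}")
print(f"specificity: {specificity:.3f}")
```

Here accuracy is 0.987 while sensitivity is only 0.200: the headline figure is driven almost entirely by the large number of true negatives.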
LR+ = sensitivity / (1 − specificity) shows how much a positive result increases the odds of disease, while LR− = (1 − sensitivity) / specificity shows how much a negative result decreases them. These ratios are especially useful when moving from test performance to clinical decision-making.
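One way to see likelihood ratios in use is to convert a pre-test probability to odds, multiply by the ratio, and convert back. The Python sketch below does this under stated assumptions: the function names are hypothetical, the 90% sensitivity, 90% specificity, and 25% pre-test probability are invented inputs, and the edge cases where a ratio would be undefined are simply rejected.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-). This sketch assumes 0 < specificity < 1."""
    if not 0 < specificity < 1:
        raise ValueError("this sketch assumes 0 < specificity < 1")
    lr_pos = sensitivity / (1 - specificity)
    lr_neg = (1 - sensitivity) / specificity
    return lr_pos, lr_neg


def post_test_probability(pre_test_probability, likelihood_ratio):
    """Update a probability with a likelihood ratio via odds.

    Assumes 0 <= pre_test_probability < 1.
    """
    if not 0 <= pre_test_probability < 1:
        raise ValueError("this sketch assumes 0 <= pre-test probability < 1")
    pre_odds = pre_test_probability / (1 - pre_test_probability)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1 + post_odds)


# Assumed inputs for illustration only.
lr_pos, lr_neg = likelihood_ratios(sensitivity=0.90, specificity=0.90)
after_positive = post_test_probability(0.25, lr_pos)
after_negative = post_test_probability(0.25, lr_neg)

print(f"LR+ = {lr_pos:.2f}, LR- = {lr_neg:.3f}")
print(f"probability after a positive result: {after_positive:.3f}")
print(f"probability after a negative result: {after_negative:.3f}")
```

With these inputs LR+ is about 9 and LR− about 0.111, so a 25% pre-test probability rises to about 75% after a positive result and falls to about 3.6% after a negative one.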
Sensitivity measures how well a test detects disease among people who truly have it. PPV measures how likely a positive result is to be correct among all positive results.
A test can be both highly sensitive and highly specific. Some tests perform strongly on both measures, although changing the test threshold often creates a tradeoff between catching more true cases and avoiding more false positives.
PPV and NPV change with prevalence because predictive values depend on how many true cases exist in the tested population. As prevalence changes, the balance between true and false results changes too.
Accuracy can be misleading when one class is much more common than the other. A test may look accurate overall while still missing many real cases.
Likelihood ratios show how much a positive or negative result changes the odds of disease. They are especially useful for translating test performance into clinical interpretation.
High sensitivity matters most when missing a case is costly, such as in early screening. High specificity matters more when false positives would lead to unnecessary follow-up or treatment.