Confusion Matrix Calculator

Evaluate classification model performance with comprehensive metrics.

Last updated: March 2026

Confusion Matrix Input

| | Predicted + | Predicted − |
|---|---|---|
| Actual + | TP (Correct) | FN (Miss) |
| Actual − | FP (False Alarm) | TN (Correct) |

Key Performance Metrics

Accuracy: 93.81%
Precision (PPV): 85.00%
Recall (Sensitivity): 62.96%
Specificity (TNR): 98.36%
F1 Score: 0.7234
MCC: 0.6993

Additional Metrics

Total Samples: 1,050
Balanced Accuracy: 80.66%
NPV: 94.74%
Prevalence: 12.86%
FPR (Type I): 1.64%
FNR (Type II): 37.04%

What is a Confusion Matrix?

A confusion matrix is a table showing how well a classification model performs. It compares predicted labels against actual labels across all samples. The matrix reveals not just whether predictions are right or wrong, but what types of errors the model makes.

The four cells represent: **True Positives** (correctly predicted positive), **True Negatives** (correctly predicted negative), **False Positives** (incorrectly predicted positive), and **False Negatives** (incorrectly predicted negative). Every metric on this page is derived from these four values.

Confusion matrices are essential for understanding model behavior beyond simple accuracy, especially in imbalanced datasets or when false positives and false negatives have different real-world consequences (e.g., disease diagnosis).
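As a minimal sketch of how the four cells are filled in, the snippet below counts them from paired lists of actual and predicted labels. The function name `confusion_counts`, the `positive` parameter, and the sample labels are hypothetical and chosen only for illustration.

```python
def confusion_counts(actual, predicted, positive=1):
    """Count TP, FN, FP, TN for a binary problem from paired label lists."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    tp = fn = fp = tn = 0
    for a, p in zip(actual, predicted):
        if a == positive and p == positive:
            tp += 1  # actual positive, predicted positive
        elif a == positive:
            fn += 1  # actual positive, predicted negative (miss)
        elif p == positive:
            fp += 1  # actual negative, predicted positive (false alarm)
        else:
            tn += 1  # actual negative, predicted negative
    return {"TP": tp, "FN": fn, "FP": fp, "TN": tn}


# Hypothetical labels: 1 = positive, 0 = negative
actual = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [1, 0, 1, 0, 1, 0, 0, 1]
counts = confusion_counts(actual, predicted)
print(counts)  # {'TP': 3, 'FN': 1, 'FP': 1, 'TN': 3}
```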

Key Metrics Explained

Primary Metrics

Accuracy
(TP + TN) / Total: the overall share of correct predictions. High accuracy does not guarantee a useful model: on a rare disease, a model that predicts "no disease" for everyone still scores high accuracy.
Precision
TP / (TP + FP): of the predicted positives, how many were correct? High precision means few false alarms. It matters most when false positives are costly (e.g., unnecessary treatment).
Recall
TP / (TP + FN): of the actual positives, how many were found? High recall means few missed cases. It matters most when false negatives are costly (e.g., a missed disease).
F1 Score
2 × (Precision × Recall) / (Precision + Recall): the harmonic mean of precision and recall. Use it when both matter and you want a single number.
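The four formulas above can be sketched in a few lines of Python. This is an illustration, not the calculator's actual implementation: the helper names `safe_div` and `primary_metrics` are hypothetical, and returning 0.0 for a zero denominator is an assumed convention. The counts are those of the disease screening example on this page.

```python
def safe_div(num, den):
    """Return num / den, or 0.0 when den is zero (an assumed convention)."""
    return num / den if den else 0.0


def primary_metrics(tp, fn, fp, tn):
    """Compute accuracy, precision, recall and F1 from the four cells."""
    accuracy = safe_div(tp + tn, tp + fn + fp + tn)
    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)
    f1 = safe_div(2 * precision * recall, precision + recall)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


m = primary_metrics(tp=85, fn=50, fp=15, tn=900)
for name, value in m.items():
    print(f"{name}: {value:.4f}")
# accuracy: 0.9381
# precision: 0.8500
# recall: 0.6296
# f1: 0.7234
```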

Advanced Metrics

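Specificity
TN / (TN + FP): of the actual negatives, how many were correctly identified? High specificity means few false positives.
MCC
Matthews Correlation Coefficient = (TP × TN − FP × FN) / √((TP + FP)(TP + FN)(TN + FP)(TN + FN)): ranges from -1 to +1 and stays informative on imbalanced datasets because it uses all four cells. +1 is perfect prediction, 0 is no better than random, and -1 is total disagreement between predictions and actual labels.
NPV
Negative Predictive Value = TN / (TN + FN): of the predicted negatives, how many were correct?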
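A short sketch of the three advanced metrics, using the standard MCC formula (TP × TN − FP × FN) / √((TP + FP)(TP + FN)(TN + FP)(TN + FN)). The function name `advanced_metrics` is hypothetical, and returning 0.0 when a denominator is zero is an assumption of this sketch. The counts are taken from the disease screening example on this page.

```python
import math


def advanced_metrics(tp, fn, fp, tn):
    """Compute specificity, NPV and MCC from the four cells."""
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    npv = tn / (tn + fn) if (tn + fn) else 0.0
    # MCC denominator: square root of the product of the four marginal totals
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"specificity": specificity, "npv": npv, "mcc": mcc}


adv = advanced_metrics(tp=85, fn=50, fp=15, tn=900)
print(f"specificity: {adv['specificity']:.4f}")  # 0.9836
print(f"npv: {adv['npv']:.4f}")                  # 0.9474
print(f"mcc: {adv['mcc']:.4f}")                  # 0.6993
```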

Example: Disease Screening Model

A model screens 1050 patients for a disease:

TP = 85 (correctly diagnosed with disease)
TN = 900 (correctly identified as healthy)
FP = 15 (false alarms — healthy but flagged)
FN = 50 (missed cases — has disease but not detected)
Accuracy:
(85 + 900) / 1050 = 93.8%. Looks good, but tells an incomplete story.
Recall:
85 / (85 + 50) = 63%. The model misses 37% of diseased patients (dangerous!).
Precision:
85 / (85 + 15) = 85%. Of flagged patients, 85% actually have the disease.
Insight:
High accuracy but low recall means the model is missing actual cases. In medical diagnosis, this is unacceptable: recall matters more than accuracy here.

Frequently Asked Questions

Precision vs. Recall — which matters more?

It depends on context. Favor high precision for spam filters, where a false positive sends a legitimate email to the spam folder. Favor high recall for disease detection, where a false negative is a missed case. In practice you often optimize one while holding the other above a minimum acceptable level.

Why is accuracy misleading?

On imbalanced data, a model that always predicts the majority class scores high accuracy. Example: if a disease is present in 1% of the data, a model that always says 'no disease' reaches 99% accuracy but never detects a single case. Use precision, recall, or MCC instead.
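The effect is easy to reproduce. The sketch below assumes a hypothetical dataset of 1,000 samples with 1% positives and a model that always predicts the negative class. Treating an undefined MCC as 0.0 is a common convention and an assumption here.

```python
import math

# Hypothetical imbalanced dataset: 1,000 samples, 10 of them positive.
# A model that always predicts "negative" produces these counts.
tp, fn, fp, tn = 0, 10, 0, 990

accuracy = (tp + tn) / (tp + fn + fp + tn)
recall = tp / (tp + fn)

# MCC is undefined when a marginal total is zero; use 0.0 in that case.
denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
mcc = (tp * tn - fp * fn) / denom if denom else 0.0

print(f"accuracy = {accuracy:.2%}")  # accuracy = 99.00%
print(f"recall   = {recall:.2%}")    # recall   = 0.00%
print(f"MCC      = {mcc:.4f}")       # MCC      = 0.0000
```

Accuracy alone looks excellent, while recall and MCC both show that the model has learned nothing about the positive class.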

What's a good F1 score?

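F1 ranges from 0 to 1. As a rough rule of thumb, above 0.7 is good, 0.8 and above is excellent, and below 0.5 is poor, though what counts as acceptable depends on the problem. F1 weights precision and recall equally, so it fits best when both kinds of error matter about the same; in many domains one of the two deserves more weight.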
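One common way to express unequal weights is the F-beta score, a generalization of F1 that is not part of this calculator's output. In the sketch below, the function name `f_beta` is hypothetical; beta > 1 favors recall, beta < 1 favors precision, and beta = 1 reproduces F1.

```python
def f_beta(precision, recall, beta=1.0):
    """F-beta score: (1 + beta^2) * P * R / (beta^2 * P + R)."""
    b2 = beta * beta
    denom = b2 * precision + recall
    # Returning 0.0 for a zero denominator is an assumed convention.
    return (1 + b2) * precision * recall / denom if denom else 0.0


# Precision and recall from the disease screening example on this page
precision, recall = 85 / 100, 85 / 135

f1 = f_beta(precision, recall, beta=1.0)
f2 = f_beta(precision, recall, beta=2.0)    # weights recall more heavily
f05 = f_beta(precision, recall, beta=0.5)   # weights precision more heavily
print(f"F1 = {f1:.4f}")  # F1 = 0.7234
print(f"F2 = {f2:.4f}, F0.5 = {f05:.4f}")
```

Because recall is the weaker of the two in this example, F2 comes out lower than F1 and F0.5 comes out higher.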

When should I use MCC instead of accuracy?

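MCC is better than accuracy for imbalanced datasets: it accounts for all four cells and is symmetric across classes, so swapping the positive and negative labels leaves it unchanged. A score near 0 means no better than random, near +1 means excellent agreement, and near -1 means predictions are systematically inverted.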

What does 'balanced accuracy' mean?

Balanced Accuracy = (Recall + Specificity) / 2. It averages performance on both classes, preventing bias toward the majority class. Good for imbalanced datasets.
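As a quick check of the formula, the sketch below computes balanced accuracy for the disease screening example on this page. The function name `balanced_accuracy` is hypothetical, and it assumes both classes have at least one sample.

```python
def balanced_accuracy(tp, fn, fp, tn):
    """Average of recall (positive class) and specificity (negative class)."""
    recall = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (recall + specificity) / 2


bal_acc = balanced_accuracy(tp=85, fn=50, fp=15, tn=900)
plain_acc = (85 + 900) / 1050
print(f"balanced accuracy = {bal_acc:.2%}")  # balanced accuracy = 80.66%
print(f"plain accuracy    = {plain_acc:.2%}")  # plain accuracy    = 93.81%
```

The gap between the two numbers reflects the low recall on the minority (positive) class, which plain accuracy hides.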

How do I read off error rates?

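FPR (False Positive Rate) = FP / (FP + TN) = false alarms among actual negatives. FNR (False Negative Rate) = FN / (FN + TP) = misses among actual positives. The two rates use different denominators, so they do not need to sum to 100%. Instead, FPR = 1 − Specificity and FNR = 1 − Recall.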
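The sketch below computes both error rates for the disease screening example on this page and checks how each relates to specificity and recall. The variable names are illustrative only.

```python
import math

tp, fn, fp, tn = 85, 50, 15, 900

fpr = fp / (fp + tn)          # false alarms among actual negatives
fnr = fn / (fn + tp)          # misses among actual positives
specificity = tn / (tn + fp)
recall = tp / (tp + fn)

print(f"FPR = {fpr:.2%}, FNR = {fnr:.2%}")  # FPR = 1.64%, FNR = 37.04%
print(f"FPR + FNR = {fpr + fnr:.2%}")       # FPR + FNR = 38.68%

# Each error rate is the complement of the matching "correct" rate.
print(math.isclose(fpr + specificity, 1.0))  # True
print(math.isclose(fnr + recall, 1.0))       # True
```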
