Evaluate classification model performance with comprehensive metrics.
Last updated: March 2026
A confusion matrix is a table showing how well a classification model performs. It compares predicted labels against actual labels across all samples. The matrix reveals not just whether predictions are right or wrong, but what types of errors the model makes.
For a binary classifier, the four cells represent: **True Positives** (correctly predicted positive), **True Negatives** (correctly predicted negative), **False Positives** (incorrectly predicted positive), and **False Negatives** (incorrectly predicted negative). The standard classification metrics, including accuracy, precision, recall, F1, and MCC, are all derived from these four values.
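To make the four cells concrete, here is a minimal Python sketch that tallies them from paired lists of actual and predicted labels. It assumes binary labels encoded as integers, with `1` as the positive class. The function `confusion_counts` and the sample labels are hypothetical and exist only for illustration.

```python
from typing import Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> dict[str, int]:
    """Count TP, TN, FP, FN for a binary problem (hypothetical helper).

    A label equal to `positive` is treated as the positive class;
    every other label is treated as negative.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    counts = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
    for actual, predicted in zip(y_true, y_pred):
        actual_pos = actual == positive
        predicted_pos = predicted == positive
        if actual_pos and predicted_pos:
            counts["TP"] += 1  # correctly predicted positive
        elif not actual_pos and not predicted_pos:
            counts["TN"] += 1  # correctly predicted negative
        elif predicted_pos:
            counts["FP"] += 1  # predicted positive, actually negative
        else:
            counts["FN"] += 1  # predicted negative, actually positive
    return counts


# Made-up labels for illustration.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]
print(confusion_counts(y_true, y_pred))  # {'TP': 3, 'TN': 3, 'FP': 1, 'FN': 1}
```

Every sample lands in exactly one cell, so the four counts always sum to the number of samples.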
Confusion matrices are essential for understanding model behavior beyond simple accuracy, especially in imbalanced datasets or when false positives and false negatives have different real-world consequences (e.g., disease diagnosis).
A model screens 1050 patients for a disease:
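Whether precision or recall matters more depends on context. Favor high precision for spam filters, where false positives (legitimate mail sent to spam) are costly. Favor high recall for disease detection, where false negatives (missed cases) are costly. In practice, you often optimize one metric while keeping the other above a minimum acceptable level.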
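The idea of optimizing one metric while holding the other at a floor can be sketched in Python. This is a minimal illustration with several assumptions:

- the model outputs a score per sample;
- a sample is predicted positive when its score is at or above a threshold;
- the helper names, the toy scores and labels, and the 0.75 recall floor are all hypothetical.

```python
def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Precision = TP / (TP + FP), Recall = TP / (TP + FN); 0.0 for an empty denominator."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def best_threshold_for_precision(scores, labels, min_recall):
    """Return (threshold, precision, recall) with the highest precision whose
    recall is at least `min_recall`, or None if no threshold qualifies."""
    best = None
    for t in sorted(set(scores)):
        # Predict positive when the score is at or above the threshold.
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        fn = sum(1 for s, y in zip(scores, labels) if s < t and y == 1)
        p, r = precision_recall(tp, fp, fn)
        if r >= min_recall and (best is None or p > best[1]):
            best = (t, p, r)
    return best


# Toy scores and true labels (1 = positive), made up for illustration.
scores = [0.95, 0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 0, 1, 0, 0]
print(best_threshold_for_precision(scores, labels, min_recall=0.75))  # (0.7, 0.75, 0.75)
```

Raising the recall floor to 1.0 forces a lower threshold (0.4 here), which admits more false positives and lowers precision. That is the trade-off described above.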
On imbalanced data, a model that always predicts the majority class can score high accuracy while being useless. For example, if a disease is present in 1% of samples, a model that always answers 'no disease' is 99% accurate yet never detects a single case (recall = 0). Use precision, recall, or MCC instead.
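The 1% example can be reproduced in a few lines of Python. This sketch assumes 1,000 samples (10 positive, 990 negative), a size chosen only for illustration. The helper functions are hypothetical.

```python
def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Share of all samples that were classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


def recall(tp: int, fn: int) -> float:
    """Share of actual positives that were detected; 0.0 if there are none."""
    return tp / (tp + fn) if tp + fn else 0.0


# Assumed population: 1,000 samples at 1% prevalence.
n_pos, n_neg = 10, 990
# An always-"no disease" model gets every negative right and misses every positive.
tp, fn = 0, n_pos
tn, fp = n_neg, 0
print(accuracy(tp, tn, fp, fn))  # 0.99
print(recall(tp, fn))            # 0.0
```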
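F1 ranges from 0 to 1. As a rough rule of thumb, a score above 0.7 is good, 0.8 or higher is excellent, and below 0.5 is poor, though what counts as acceptable depends on the domain. F1 is the harmonic mean of precision and recall, so it weights the two equally. It fits best when both matter about the same; domains where one matters more call for a different weighting.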
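F1 can be computed either from precision and recall or directly from the confusion-matrix cells; the two forms are algebraically equivalent. This is a minimal Python sketch. The function names are hypothetical, and returning 0.0 when the inputs are all zero is an assumed convention.

```python
def f1_score(precision: float, recall: float) -> float:
    """F1 = 2PR / (P + R), the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0  # assumed convention: F1 is 0 when both inputs are 0
    return 2 * precision * recall / (precision + recall)


def f1_from_counts(tp: int, fp: int, fn: int) -> float:
    """Equivalent form written in confusion-matrix cells: 2TP / (2TP + FP + FN)."""
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


print(f1_score(0.75, 0.75))     # 0.75
print(f1_from_counts(3, 1, 1))  # 0.75
# With unbalanced inputs, the harmonic mean is pulled toward the weaker score,
# landing well below the arithmetic mean of 0.6.
print(f1_score(0.9, 0.3))
```

The harmonic mean is why a model cannot reach a high F1 by excelling at only one of the two metrics.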
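MCC (Matthews Correlation Coefficient) is more informative than accuracy on imbalanced datasets. It uses all four cells and is symmetric across classes, so swapping which class is called positive does not change its value. It ranges from -1 to +1:

- near 0 means no better than random guessing;
- +1 means perfect prediction;
- -1 means complete disagreement (every prediction inverted).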
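Here is a direct Python translation of the standard formula, MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). The function name and the example counts are hypothetical. When any row or column total is zero, the coefficient is undefined; returning 0.0 in that case is a common convention, not a universal rule.

```python
import math


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews Correlation Coefficient from the four confusion-matrix cells."""
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Undefined when a marginal total is zero; fall back to 0.0 by convention.
    return numerator / denominator if denominator else 0.0


print(mcc(3, 3, 1, 1))     # 0.5
print(mcc(0, 990, 0, 10))  # 0.0  (always-negative model)
print(mcc(10, 990, 0, 0))  # 1.0  (perfect prediction)
print(mcc(0, 0, 5, 5))     # -1.0 (every prediction inverted)
```

Swapping TP with TN and FP with FN leaves both the numerator and the denominator unchanged, which is the class symmetry mentioned above.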
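Balanced Accuracy = (Recall + Specificity) / 2. Averaging the per-class rates gives each class equal weight regardless of its size. As a result, a model that always predicts the majority class scores 0.5 rather than looking deceptively strong, which makes balanced accuracy a good choice for imbalanced datasets.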
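This minimal Python sketch shows balanced accuracy on the always-"no disease" model from the 1% example. It assumes 10 positives and 990 negatives, chosen only for illustration. The function name is hypothetical.

```python
def balanced_accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Mean of recall (true positive rate) and specificity (true negative rate)."""
    recall = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return (recall + specificity) / 2


# Always-negative model: perfect specificity, zero recall.
print(balanced_accuracy(tp=0, tn=990, fp=0, fn=10))  # 0.5
print(balanced_accuracy(tp=3, tn=3, fp=1, fn=1))     # 0.75
```

Plain accuracy for the same always-negative model would be 0.99, so the gap between the two numbers exposes the majority-class bias.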
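FPR (False Positive Rate) = FP / (FP + TN), the share of actual negatives wrongly flagged as positive (false alarms). FNR (False Negative Rate) = FN / (FN + TP), the share of actual positives that are missed. The two rates have different denominators, so they generally do not sum to 100%. Instead, FPR = 1 - Specificity and FNR = 1 - Recall.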
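This short Python sketch computes both error rates from hypothetical counts (10 actual positives, 100 actual negatives). It also shows that each rate is the complement of a different metric: FPR = 1 - Specificity and FNR = 1 - Recall. The function name and counts are assumptions made for illustration.

```python
def error_rates(tp: int, tn: int, fp: int, fn: int) -> tuple[float, float]:
    """Return (FPR, FNR); an empty denominator yields 0.0."""
    fpr = fp / (fp + tn) if fp + tn else 0.0  # false alarms among actual negatives
    fnr = fn / (fn + tp) if fn + tp else 0.0  # misses among actual positives
    return fpr, fnr


# Hypothetical counts: 10 actual positives, 100 actual negatives.
fpr, fnr = error_rates(tp=8, tn=90, fp=10, fn=2)
print(fpr, fnr)  # 0.1 0.2
specificity = 1 - fpr  # true negative rate, TN / (TN + FP)
recall = 1 - fnr       # true positive rate, TP / (TP + FN)
```

Here the two rates sum to 0.3 rather than 1. FPR and FNR are measured over different groups of samples, so their sum has no fixed value.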