Measure reliability and agreement between multiple raters evaluating the same subjects.
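The intraclass correlation coefficient (ICC) measures the proportion of total variance in measurements that is attributable to differences between subjects, rather than to differences between raters or to random error. It quantifies inter-rater reliability: how consistently multiple raters assess the same subjects on a continuous scale. ICC values typically range from 0 (no reliability: all variation comes from raters and error) to 1 (perfect agreement). Sample estimates can even fall below 0 when raters disagree more within subjects than the subjects differ from one another.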
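The variance-ratio definition can be sketched in Python with NumPy. This is a minimal sketch of the one-way random-effects, single-rater form (often written ICC(1,1)). It assumes a complete table in which every rater scores every subject with no missing values. The function name `icc_oneway` is hypothetical, and other ICC forms (two-way models, consistency vs. absolute agreement) are not covered.

```python
import numpy as np

def icc_oneway(ratings):
    """One-way random-effects ICC(1,1) from an (n_subjects x k_raters) table.

    Hypothetical helper; assumes a complete table with no missing ratings.
    """
    x = np.asarray(ratings, dtype=float)
    n, k = x.shape
    grand_mean = x.mean()
    subject_means = x.mean(axis=1)

    # Between-subjects mean square: spread of subject means around the grand mean
    ss_between = k * np.sum((subject_means - grand_mean) ** 2)
    bms = ss_between / (n - 1)

    # Within-subjects mean square: spread of each rating around its subject's mean
    ss_within = np.sum((x - subject_means[:, None]) ** 2)
    wms = ss_within / (n * (k - 1))

    return (bms - wms) / (bms + (k - 1) * wms)

print(icc_oneway([[9, 10, 11], [19, 20, 21], [29, 30, 31]]))  # ≈ 0.990
print(icc_oneway([[1, 3], [3, 1]]))                           # -1.0
```

In the second call the two subjects have identical means while the raters disagree on each one, so the between-subjects mean square is 0 and the estimate drops to -1.0, illustrating why sample ICCs can be negative.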
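Three radiologists independently assess the same 30 CT scans on a severity scale (0–100). A one-way ANOVA gives a between-subjects mean square BMS = 12.5 (differences between scans) and a within-subjects mean square WMS = 2.1 (disagreement among radiologists plus random error).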
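With k = 3 raters, the one-way single-rater formula gives ICC = (BMS − WMS) / (BMS + (k − 1) × WMS) = (12.5 − 2.1) / (12.5 + 2 × 2.1) = 10.4 / 16.7 ≈ 0.623. This moderate value suggests the raters agree reasonably well but some variability remains, so the radiologists might benefit from standardized scoring guidelines before their assessments are averaged for clinical decisions.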
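The worked example can be reproduced directly from the two mean squares. This is a minimal sketch assuming the one-way random-effects model, which is the form that yields 0.623 from these numbers. The "moderate" label is attached using the Koo & Li (2016) bands (below 0.5 poor, 0.5 to 0.75 moderate, 0.75 to 0.9 good, above 0.9 excellent); this page may follow a different convention, and both function names are hypothetical. The sketch also reports ICC(1,3), the reliability of the mean of all three raters, which is the relevant figure if the scores are averaged.

```python
def icc_from_mean_squares(bms, wms, k):
    """One-way ICCs from ANOVA mean squares for k raters (hypothetical helper).

    Returns (single-rater ICC(1,1), average-of-k-raters ICC(1,k)).
    """
    single = (bms - wms) / (bms + (k - 1) * wms)
    average = (bms - wms) / bms
    return single, average

def koo_li_label(icc):
    """Koo & Li (2016) interpretation bands; one common convention among several."""
    if icc < 0.5:
        return "poor"
    if icc < 0.75:
        return "moderate"
    if icc <= 0.9:
        return "good"
    return "excellent"

# Values from the radiology example: 30 scans, 3 radiologists
single, average = icc_from_mean_squares(bms=12.5, wms=2.1, k=3)
print(f"ICC(1,1) = {single:.3f} ({koo_li_label(single)})")   # ICC(1,1) = 0.623 (moderate)
print(f"ICC(1,3) = {average:.3f} ({koo_li_label(average)})")  # ICC(1,3) = 0.832 (good)
```

Averaging raters raises reliability because rater error partly cancels in the mean, which is why the average-measures ICC is higher than the single-rater value.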