Position: Stop Chasing the C-index when Evaluating Survival Analysis Models
Abstract
The current state of evaluation in survival analysis is plagued by the persistent use of evaluation metrics in ways that are misaligned with the stated modeling objective. Moreover, many such evaluations rest on censoring assumptions that are left implicit or unjustified. As a result, reported performance can be misleading and may fail to answer the scientific or modeling question the evaluation was intended to address. In this position paper, we present a critical analysis of evaluation practices in survival analysis and highlight why evaluation in survival analysis fundamentally differs from standard regression or classification due to censoring. We place particular focus on concordance-based measures, such as the C-index, which our findings indicate are heavily overused in the literature. To help identify appropriate metrics, we propose a set of key desiderata and introduce a double-helix ladder, in which valid evaluation requires alignment between the chosen metric and the modeling assumptions, and we provide empirical evidence for its effectiveness. We conclude with practical guidance on how to evaluate a survival model.