A Machine-Learned Comorbidity Index
Abstract
Traditional comorbidity scores (e.g., the Charlson and Elixhauser indices) are widely used for risk adjustment and patient stratification, but they have two key limitations: they are largely mortality-centric and do not align well with other outcomes, and their linear, rule-based structure cannot capture nonlinear, outcome-specific risk relationships. We propose a Machine-Learned Comorbidity Index (MLCI) that maps diagnosis codes to a single scalar score by maximizing the normalized Hilbert–Schmidt Independence Criterion (nHSIC) between that score and multiple clinical outcomes. MLCI captures nonlinear dependence between risk and outcomes and is supported by new theory characterizing when a single, informative patient ordering can be achieved across outcomes. Experiments on multiple benchmark electronic health record (EHR) datasets show that MLCI outperforms strong single-index baselines on multiple evaluation metrics.
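The abstract does not specify the estimator, so the following is only a minimal sketch of how an nHSIC objective between a scalar score and several outcomes could be evaluated. It assumes Gaussian (RBF) kernels with a fixed, hypothetical `bandwidth`, the standard biased HSIC estimate tr(K H L H) / (n - 1)^2, and normalization as HSIC(K, L) / sqrt(HSIC(K, K) * HSIC(L, L)). The function names and the averaging over outcomes in `multi_outcome_objective` are illustrative assumptions, not the paper's stated method. The sketch only scores a given index; learning the mapping from diagnosis codes to the score is not shown.

```python
import math


def rbf_gram(values, bandwidth=1.0):
    """Gaussian (RBF) kernel matrix for a list of scalars (assumed kernel choice)."""
    return [[math.exp(-((a - b) ** 2) / (2.0 * bandwidth ** 2)) for b in values]
            for a in values]


def center(gram):
    """Double-center a kernel matrix, i.e. compute H K H with H = I - (1/n) 11^T."""
    n = len(gram)
    row_means = [sum(row) / n for row in gram]
    col_means = [sum(gram[i][j] for i in range(n)) / n for j in range(n)]
    grand = sum(row_means) / n
    return [[gram[i][j] - row_means[i] - col_means[j] + grand for j in range(n)]
            for i in range(n)]


def hsic(k, l):
    """Biased HSIC estimate tr(K H L H) / (n - 1)^2 (requires n >= 2)."""
    n = len(k)
    kc, lc = center(k), center(l)
    # Both centered matrices are symmetric, so tr(Kc Lc) is the elementwise sum.
    return sum(kc[i][j] * lc[i][j] for i in range(n) for j in range(n)) / (n - 1) ** 2


def nhsic(x, y, bandwidth=1.0):
    """Normalized HSIC in [0, 1]; returns 0.0 if either variable is constant."""
    k, l = rbf_gram(x, bandwidth), rbf_gram(y, bandwidth)
    denom = hsic(k, k) * hsic(l, l)
    if denom <= 0.0:
        return 0.0
    return hsic(k, l) / math.sqrt(denom)


def multi_outcome_objective(score, outcomes, bandwidth=1.0):
    """Hypothetical aggregate: mean nHSIC between one score and each outcome."""
    return sum(nhsic(score, y, bandwidth) for y in outcomes) / len(outcomes)


# Toy usage with made-up values (not data from the paper).
score = [0.2, 0.4, 1.1, 1.8, 2.5]
mortality = [0, 0, 0, 1, 1]
readmission = [0, 1, 0, 1, 1]
print(round(multi_outcome_objective(score, [mortality, readmission]), 3))
```

Averaging is just one way to combine per-outcome dependence into a single objective; a learned index would adjust the score to increase this value, which a pure evaluation function like this one does not do by itself.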