Position: Medical AI Neglects Real Treatment Outcomes
Abstract
Medical AI has rapidly improved at the diagnostic and prognostic tasks that lead to treatment decisions. But its understanding of treatment itself is still inadequately trained and evaluated: both rely on human opinions and syntheses (especially texts such as biomedical publications and clinical practice guidelines) rather than the underlying data on treatment outcomes. This position paper argues that this neglect seriously limits the long-term potential of medical AI and is already causing deficiencies in both frontier models and major benchmarks. Real treatment outcomes, drawn from sources such as observational databases and randomized experiments, should be substantially incorporated into both training and evaluation. Improving these outcomes should be reemphasized as the goal of all medical AI.