ECG-R1: Protocol-Guided and Modality-Agnostic MLLM for Reliable ECG Interpretation
Abstract
Electrocardiography (ECG) is an indispensable diagnostic tool in clinical practice, yet existing multimodal large language models (MLLMs) remain unreliable for ECG interpretation, often producing plausible but clinically incorrect analyses. To address this, we propose ECG-R1, the first reasoning MLLM designed for reliable ECG interpretation, built on three innovations. First, we construct the interpretation corpus using \textit{Protocol-Guided Instruction Data Generation}, which grounds each interpretation in measurable ECG features and in the quantitative thresholds and diagnostic logic defined in monographs. Second, we present a modality-decoupled architecture with \textit{Interleaved Modality Dropout} to improve robustness and cross-modal consistency when either the ECG signal or the ECG image is missing. Third, we propose \textit{Reinforcement Learning with ECG Diagnostic Evidence Rewards} to explicitly supervise diagnostic evidence and strengthen reasoning quality. Additionally, we systematically evaluate the ECG interpretation capabilities of proprietary, open-source, and medical MLLMs, and provide the first quantitative evidence that severe hallucinations are pervasive in these MLLMs, suggesting that the public should not rely on their outputs. Code will be released upon acceptance.
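As a rough illustration of the general idea behind modality dropout, the sketch below randomly hides one of two modality embeddings per training sample while never hiding both. It is a generic sketch under stated assumptions, not the ECG-R1 implementation: the abstract does not specify how the dropout is interleaved, and the function name `modality_dropout`, the parameter `p_drop`, and the use of `None` to mark a missing modality are all hypothetical choices made here.

```python
import random


def modality_dropout(signal_emb, image_emb, p_drop=0.3, rng=random):
    """Hide at most one modality for a training sample.

    Hypothetical helper, not taken from the paper. With probability
    `p_drop` the signal embedding is replaced by None, with the same
    probability the image embedding is replaced by None, and otherwise
    both are kept. Both modalities are never dropped together.
    """
    if not 0.0 <= p_drop <= 0.5:
        raise ValueError("p_drop must lie in [0, 0.5]")
    r = rng.random()  # uniform draw in [0.0, 1.0)
    if r < p_drop:
        return None, image_emb       # signal missing, image kept
    if r < 2 * p_drop:
        return signal_emb, None      # image missing, signal kept
    return signal_emb, image_emb     # both modalities kept


if __name__ == "__main__":
    rng = random.Random(0)
    # Placeholder embeddings standing in for encoder outputs.
    batch = [([0.1, 0.2], [0.3, 0.4]) for _ in range(5)]
    for signal_emb, image_emb in batch:
        print(modality_dropout(signal_emb, image_emb, p_drop=0.3, rng=rng))
```

Training on a mix of signal-only, image-only, and paired samples is one plausible way for a model to stay usable when a modality is absent at inference time; how ECG-R1 schedules or interleaves these cases is left to the full paper.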