Salus: Strategic Diagnostic Testing for Complex Diagnosis via Multi-Agent Reinforcement Learning
Shuohao Gao ⋅ Xuanzhong Chen ⋅ Lingxiao Luo ⋅ Zilin Ding ⋅ Rong Han ⋅ Rui Jiang ⋅ Ting Chen
Abstract
Diagnosing complex diseases is inherently a sequential and iterative medical investigation process, in which a clinician strategically requests multiple rounds of diagnostic tests to differentiate among similar diseases until reaching a definitive diagnosis. Although large language models show great potential as clinical assistants, they often struggle to navigate this complex interactive process, suffering from premature diagnostic closure. Furthermore, optimizing LLMs for such multi-round environments is frequently hindered by the challenges of reward sparsity and reward hacking. In this paper, we introduce $\textbf{CompDiag-Bench}$, a benchmark that formalizes diagnosis as a sequential decision-making process in which a clinician must strategically request diagnostic tests from a dynamic environment to reach a definitive diagnosis. To address this task, we propose $\texttt{Salus}$, a multi-agent framework that decouples diagnostic reasoning into three specialized functional roles: a Differential Reasoner, a Strategic Controller, and a Workup Proposer. $\texttt{Salus}$ is optimized via multi-agent reinforcement learning with structured rewards that calibrate strategic diagnostic behavior. Specifically, we leverage an LLM-as-a-Judge reward mechanism to provide dense, semantically grounded feedback, designed to penalize premature closure and incentivize accurate differential diagnoses. Experimental results show that our model, $\texttt{Salus-7B}$, attains state-of-the-art Top-1 accuracy of $83.64\%$ on complex cases, outperforming DeepSeek-V3.2 ($71.38\%$) and achieving performance on par with GPT-5.2 ($80.30\%$).