Dynamic Decision Learning: Test-Time Evolution for Abnormality Grounding in Rare Diseases
Abstract
Clinical abnormality grounding for rare diseases is often hindered by data scarcity, which makes supervised fine-tuning infeasible and single-pass inference highly unstable. We therefore propose Dynamic Decision Learning (DDL), a framework that enables frozen large vision-language models (LVLMs) to refine their decisions in both the language and visual spaces by optimizing instructions and consolidating predictions made under visual perturbations. This improves localization quality and produces a consensus-based reliability score that quantifies the model's confidence. On brain-imaging benchmarks, including a rare-disease dataset with 281 pathology types, and across models from 3B to 72B parameters, DDL improves mAP@75 by up to 105% on rare-disease cases and outperforms both adaptation baselines and supervised fine-tuning. Moreover, we show that DDL yields stronger calibration between consensus-based reliability scores and localization accuracy under severe distribution shifts and as task difficulty increases. The code will be open-sourced.
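The abstract does not specify how predictions are consolidated or how the consensus-based reliability score is computed. The sketch below shows one plausible instantiation, and it is an assumption rather than the authors' method. It assumes each perturbed view yields one bounding box already mapped back to the original image frame. It then picks the box with the highest total IoU to the others (a medoid) and uses mean pairwise IoU as the reliability score. The names `iou` and `consensus` are hypothetical. For reference, mAP@75 counts a prediction as correct only when its IoU with the ground truth is at least 0.75.

```python
from itertools import combinations
from typing import List, Tuple

# Axis-aligned box (x1, y1, x2, y2) in the ORIGINAL image frame (assumption:
# predictions from geometrically perturbed views have been mapped back).
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = max(0.0, a[2] - a[0]) * max(0.0, a[3] - a[1])
    area_b = max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def consensus(boxes: List[Box]) -> Tuple[Box, float]:
    """Hypothetical consolidation over predictions from perturbed views.

    Returns the medoid box (largest summed IoU to all other boxes) and a
    reliability score defined here as the mean pairwise IoU (assumption).
    """
    if not boxes:
        raise ValueError("need at least one prediction")
    if len(boxes) == 1:
        return boxes[0], 1.0
    pairs = list(combinations(range(len(boxes)), 2))
    reliability = sum(iou(boxes[i], boxes[j]) for i, j in pairs) / len(pairs)

    def support(k: int) -> float:
        # Agreement of box k with every other prediction.
        return sum(iou(boxes[k], boxes[m]) for m in range(len(boxes)) if m != k)

    best = max(range(len(boxes)), key=support)
    return boxes[best], reliability


# Usage: three perturbed views, one of which disagrees.
views = [(0, 0, 10, 10), (0, 0, 10, 10), (50, 50, 60, 60)]
box, score = consensus(views)
print(box, round(score, 3))
```

Under this definition, agreement across views raises the score, and an outlier view lowers it without being selected as the final box.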