Difference-Aware Decision Learning for Multimodal Image Fusion
Abstract
Multimodal image fusion integrates complementary information from different modalities. However, large cross-modal discrepancies and local conflicts often introduce uncertainty into fusion decisions. This uncertainty can bias modality allocation in inconsistent regions, leading to information loss or the propagation of artifacts. We address this problem by formally casting image fusion as an integrated probabilistic decision system that couples prior decision-making with posterior risk minimization. Based on this view, we propose a dIfference-aware Decision-lEArning muLtimodal image fusion paradigm (IDEAL). It treats cross-modal differences as decision triggers and learns modality contribution policies conditioned on local evidence. Specifically, a difference-attention module generates multi-scale difference maps that serve as spatial decision conditions. We also obtain spectral conditions by projecting features into the frequency domain, where power-spectrum energy, complementary spectra, and spectral-entropy reliability characterize cross-modal discrepancy and reliability. We then employ a symmetric Beta prior to map these decision conditions to gating weights, yielding explicit and interpretable modality contribution policies. To improve robustness, we introduce an uncertainty modulation mechanism that reverts the policy to conservative mixing when the decision conditions are insufficient. Extensive experiments demonstrate that IDEAL achieves stable and competitive performance.
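To make the gating mechanism concrete, the following is a minimal mathematical sketch of one way a symmetric Beta prior can map decision conditions to gating weights with uncertainty modulation; all symbols below (c, alpha_0, f_theta, g_theta, I_1, I_2, F) are illustrative assumptions, not notation taken from the paper.

\[
w(x) \sim \mathrm{Beta}\big(a(x),\, b(x)\big), \qquad
a(x) = \alpha_0 + f_\theta\big(c(x)\big), \qquad
b(x) = \alpha_0 + g_\theta\big(c(x)\big),
\]
\[
\bar{w}(x) = \frac{a(x)}{a(x) + b(x)}, \qquad
F(x) = \bar{w}(x)\, I_1(x) + \big(1 - \bar{w}(x)\big)\, I_2(x),
\]

where c(x) stacks the local spatial difference maps and spectral conditions at pixel x, f_theta and g_theta are learned non-negative evidence terms, and alpha_0 > 0 defines the symmetric prior a = b = alpha_0 with mean 1/2. Under this sketch, when the evidence at x is weak, both f_theta and g_theta remain small, the posterior-mean gate \(\bar{w}(x)\) reverts toward 1/2, and the fusion degrades gracefully to conservative equal mixing of the two source images, consistent with the uncertainty modulation described above.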