SPR: A Structured Prompt Refinement Network for Missing Modalities
Abstract
Prompt learning has recently emerged as a novel, parameter-efficient paradigm for tackling the missing-modality challenge. However, existing prompting methods often overlook the internal structural information within prompt vectors, limiting their effectiveness in guiding frozen backbone models under diverse missing-modality scenarios. To address this limitation, we propose a Structured Prompt Refinement (SPR) network that refines the internal structure of prompt vectors along three dimensions: (1) a Global Interaction Fusion Module captures bidirectional interactions across prompt layers, mitigating the sub-optimal adaptation caused by inconsistent guidance under missing modalities; (2) a Local Feature Refinement Module organizes adjacent prompt vectors into coherent semantic units, leveraging local contextual relationships to preserve semantic integrity when a modality is absent; and (3) a Channel Feature Selection Module applies point-wise gating to adaptively suppress noisy channels and enhance critical ones according to the specific missing modality. With only 0.8% of parameters trainable, SPR achieves significant improvements on three mainstream multimodal classification datasets. Notably, it surpasses the state of the art by 3.8% in F1-Macro on the MM-IMDB dataset, even at a 90% modality missing rate. Extensive experiments and in-depth ablations validate SPR's effectiveness and robustness under various missing-modality conditions.
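To make the point-wise gating idea concrete, the following is a minimal sketch of per-channel sigmoid gating applied to a prompt vector. The function name, the per-channel weight/bias parameterization, and the use of a plain sigmoid are illustrative assumptions, not the paper's actual implementation:

```python
import math

def channel_gate(x, w, b):
    """Point-wise channel gating sketch (assumed form, not SPR's exact module).

    Each channel i of the input vector x is scaled by a learned gate
    sigmoid(w[i] * x[i] + b[i]), so uninformative channels can be
    suppressed (gate near 0) and critical channels passed through
    (gate near 1).
    """
    gated = []
    for xi, wi, bi in zip(x, w, b):
        gate = 1.0 / (1.0 + math.exp(-(wi * xi + bi)))  # sigmoid gate in (0, 1)
        gated.append(gate * xi)
    return gated

# With a strongly positive pre-activation the gate opens (~1), so the
# channel passes through; with a strongly negative one it closes (~0).
out = channel_gate([1.0, -2.0], [10.0, 10.0], [0.0, 0.0])
```

In the full module, `w` and `b` would be trained jointly with the prompts so the gating pattern adapts to which modality is missing.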