FedSSM: State Space Model-based Proactive Inference for Heterogeneous Multimodal Federated Learning
Abstract
Multimodal Federated Learning (MMFL) addresses collaborative training across clients with heterogeneous modality configurations, where effective client selection becomes critical under the compounded challenges of modality, distribution, and quantity heterogeneity. Existing selection methods operate within a reactive paradigm, responding to current observations without anticipating how decisions influence future optimization trajectories. This myopic approach leads to suboptimal convergence when training dynamics shift rapidly under severe heterogeneity. We propose FedSSM, which reconceptualizes client selection as a proactive decision-making process by predicting training dynamics through decision-aware state space models. The prediction error yields a \emph{surprise} signal that quantifies uncertainty and governs adaptive participation budgets and exploration-exploitation trade-offs via counterfactual reasoning over candidate actions. For aggregation, we introduce trust-weighted fusion with modality-specific routing, where surprise calibrates sensitivity to client anomalies. Experiments on four multimodal benchmarks demonstrate that FedSSM achieves 2.5--4.5\% accuracy improvements over state-of-the-art methods while reducing communication rounds by over 30\%.
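To make the surprise mechanism concrete, the following is a minimal sketch of the idea described above: a state space model predicts a client's training dynamics, the prediction error serves as a surprise signal, and higher surprise widens the participation budget toward exploration. All names (`surprise`, `participation_budget`), the scalar linear SSM, and the constants are illustrative assumptions, not FedSSM's actual learned model or parameters.

```python
# Hypothetical scalar linear SSM for one client's training dynamics:
#   next_state = A * state + B * action;  observed_loss ~ C * next_state.
# A, B, C are illustrative constants, not FedSSM's learned parameters.
A, B, C = 0.9, 0.05, 1.0

def surprise(state: float, action: float, observed_loss: float) -> float:
    """Prediction error of the SSM acts as the 'surprise' signal."""
    predicted = C * (A * state + B * action)
    return abs(observed_loss - predicted)

def participation_budget(s: float, s_max: float = 1.0,
                         k_min: int = 2, k_max: int = 10) -> int:
    """High surprise -> larger budget (explore); low surprise -> exploit."""
    frac = min(s / s_max, 1.0)
    return int(round(k_min + frac * (k_max - k_min)))

# Toy round: the state summarizes a client's recent dynamics and the
# action encodes whether the client was selected last round.
state, action = 1.0, 1.0
observed = 0.95 * state            # observed loss this round
s = surprise(state, action, observed)
budget = participation_budget(s)   # small surprise -> small budget
```

In this toy round the SSM prediction matches the observation exactly, so surprise is zero and the budget stays at its exploitation floor; a large mismatch would instead push the budget toward `k_max`, mirroring the exploration-exploitation trade-off the abstract describes.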