Structured Multi-modal Graph Disentanglement for Psychiatric Diagnosis
Abstract
Multi-modal neuroimaging diagnosis must integrate cross-modal agreement with modality-specific complementarity, yet in real multi-site cohorts these signals are frequently entangled with site- and cohort-dependent correlations, yielding shortcut-driven predictions, fragile transfer, and limited interpretability. We propose Structured Multi-modal Graph Disentanglement (SMGD), which explicitly factorizes multi-modal graph representations into four components with distinct roles: shared diagnostic evidence, complementary diagnostic evidence, incidental cross-modal agreement, and modality-specific non-robust correlations. SMGD is realized as geometry-driven structure learning: under a mild distributional assumption, we develop mini-batch estimable surrogate regularizers that shape subspace organization and cross-modal relations, enforcing semantic consistency through relational geometry rather than centroid coincidence while suppressing confounded dependencies. Experiments on large multi-site datasets (ABIDE-I, SRPBS) show improved in-domain diagnosis and more reliable cross-dataset generalization under a modality gap, without expert-crafted features. Code is available at \url{https://anonymous.4open.science/r/anonymousICML2026/README.md}.