Robust Multi-View Fusion via Prototype-Anchored Unbalanced Optimal Transport
Abstract
Multi-view classifiers typically fuse all observed views into a single representation, a strategy that becomes fragile when some views are missing or corrupted. We propose a prototype-anchored fusion module based on an entropically regularized unbalanced optimal transport (UOT) barycenter. Each view is summarized into a small set of learned atoms and matched to a shared prototype support; fusion outputs a probability measure over the prototypes with fixed dimension. By relaxing the marginal constraints with a generalized KL penalty, the UOT objective can leave a fraction of a view's mass unmatched when matching it is geometrically costly, yielding a simple, differentiable trimming mechanism without hand-tuned thresholds. We provide a basic theoretical result showing that discarding an arbitrary subset of atom mass incurs a penalty bounded by its total mass, independent of transport distances. Experiments on multi-view action recognition benchmarks under simulated missing views, missing-rate shift, and feature-space corruption show consistently improved stability under severe missingness, with modest overhead on top of strong backbones.
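To make the relaxed-marginal mechanism concrete, the following is a minimal sketch, not the paper's implementation, of an entropically regularized UOT solve between one view's atoms and the shared prototype support. The function name, hyperparameter values, and NumPy backend are assumptions; the updates follow the standard generalized Sinkhorn scaling for UOT with KL-relaxed marginals.

```python
# Sketch only: entropic UOT between one view's atom masses `a` and shared
# prototype masses `b`. Names and hyperparameters are illustrative assumptions.
import numpy as np

def uot_sinkhorn(C, a, b, eps=0.05, rho=1.0, n_iters=200):
    """Return a transport plan P whose marginals only softly match a and b.

    C   : (n, m) cost matrix between view atoms and prototypes
    a   : (n,)   atom masses for one view
    b   : (m,)   prototype masses on the shared support
    eps : entropic regularization strength
    rho : weight of the generalized KL penalty on both marginals
    """
    K = np.exp(-C / eps)                     # Gibbs kernel
    u = np.ones_like(a)
    v = np.ones_like(b)
    exponent = rho / (rho + eps)             # softens the usual Sinkhorn update
    for _ in range(n_iters):
        u = (a / (K @ v + 1e-30)) ** exponent
        v = (b / (K.T @ u + 1e-30)) ** exponent
    return u[:, None] * K * v[None, :]

# Toy usage: an atom that sits far from every prototype transports only part
# of its mass, i.e. P.sum(axis=1) drops below `a` for that atom (the
# differentiable "trimming" effect described in the abstract).
rng = np.random.default_rng(0)
atoms = rng.normal(size=(4, 8))              # 4 atoms from one view
protos = rng.normal(size=(6, 8))             # 6 shared prototypes
atoms[0] += 10.0                             # make one atom geometrically far
C = ((atoms[:, None, :] - protos[None, :, :]) ** 2).sum(-1)
a = np.full(4, 1 / 4)
b = np.full(6, 1 / 6)
P = uot_sinkhorn(C, a, b)
print(P.sum(axis=1))                         # transported mass per atom
```

In this sketch the outlying atom's row of P carries little mass, so no hand-tuned threshold is needed to discard it; the fused output would then be obtained by renormalizing the prototype-side marginal P.sum(axis=0).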