Transport or Discard: Robust Unbalanced Optimal Transport for Cross-Domain Policy Adaptation
Abstract
Cross-domain offline reinforcement learning leverages a source dataset to improve policy learning in a data-scarce target domain, but dynamics mismatch makes many source transitions kinematically infeasible and can cause negative transfer. Recent non-parametric geometric methods (e.g., standard optimal transport and k-nearest neighbors) avoid overfitting yet often yield only relative rankings under an implicit matching or retrieval budget, so their performance is sensitive to hand-tuned thresholds when the true cross-domain overlap is unknown. We formulate availability estimation as soft subset selection, learning a source reweighting that geometrically aligns the reweighted source distribution with the target. We propose Robust Offline unbalanced Optimal Transport (ROOT): (i) a robust ambiguity set that accounts for uncertainty under limited target samples, and (ii) an unbalanced transport objective that penalizes marginal mass deviation, enabling a principled transport-or-discard mechanism. ROOT thus down-weights or discards high-cost source samples rather than forcing them onto the target support. Moreover, the induced weights decay exponentially with transport cost, guaranteeing outlier suppression. On D4RL dynamics-shift benchmarks, ROOT improves downstream offline RL and outperforms strong baselines on most tasks without task-specific threshold tuning.
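To make the transport-or-discard intuition concrete, the sketch below implements a generic one-sided unbalanced entropic OT (Sinkhorn-style scaling with a KL penalty on the source marginal), not the paper's full ROOT objective: the ambiguity set, hyperparameters (`eps`, `tau`), and the toy data are illustrative assumptions. Because the scaling kernel is `exp(-C / eps)`, the induced source weights decay exponentially with transport cost, so far-away (infeasible) source samples receive near-zero mass instead of being forced onto the target support.

```python
import numpy as np

def unbalanced_sinkhorn(C, a, b, eps=0.5, tau=1.0, n_iters=500):
    """One-sided unbalanced entropic OT (illustrative sketch).

    C   : (n_src, n_tgt) transport cost matrix
    a   : (n_src,) source reference marginal (relaxed via a KL penalty, strength tau)
    b   : (n_tgt,) target marginal (enforced exactly)
    Returns the transport plan P; P.sum(axis=1) gives soft source weights.
    """
    K = np.exp(-C / eps)          # Gibbs kernel: weights decay exponentially in cost
    u = np.ones_like(a)
    v = np.ones_like(b)
    fi = tau / (tau + eps)        # KL-relaxation exponent on the source side
    for _ in range(n_iters):
        u = (a / (K @ v)) ** fi   # soft source marginal: high-cost rows shed mass
        v = b / (K.T @ u)         # target marginal matched exactly
    return u[:, None] * K * v[None, :]

# Toy example: 4 source samples overlap the target, 1 is a far outlier.
rng = np.random.default_rng(0)
target = rng.normal(0.0, 0.1, size=(6, 2))
source = np.vstack([rng.normal(0.0, 0.1, size=(4, 2)),
                    [[3.0, 3.0]]])            # outlier source sample
C = ((source[:, None, :] - target[None, :, :]) ** 2).sum(-1)
a = np.full(5, 1 / 5)
b = np.full(6, 1 / 6)
P = unbalanced_sinkhorn(C, a, b)
weights = P.sum(axis=1)                        # soft "transport or discard" weights
```

In this toy run the outlier's weight is orders of magnitude below the inliers' weights, i.e. it is effectively discarded, while the target marginal is still matched exactly.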