SOLAR for Offline MARL: Plateau-Triggered Potential Shaping under World-Model Uncertainty
Jusheng Zhang ⋅ Yijia Fan ⋅ Ruiqi Chen ⋅ Jing Yang ⋅ Ziliang Chen ⋅ Yongsen Zheng ⋅ Yanxi Chen ⋅ Jian Wang ⋅ Kwok Yan Lam ⋅ Liang Lin ⋅ Keze Wang
Abstract
Reward shaping can accelerate reinforcement learning, but in sparse-reward \emph{offline} multi-agent RL it is often brittle: dense intrinsic rewards may alter the underlying Markov game, while world-model guidance can amplify model bias. We find that shaping becomes reliable when it is (i) activated only after \emph{statistically validated} learning plateaus and (ii) constrained to \emph{potential-based} shaping, which preserves the task optimum. Motivated by this, we propose \textsc{SOLAR}, a simulate--evaluate--shape framework. A learned world model enables low-cost rollouts that test for plateaus; once a plateau is detected, we inject shaping of the form $r+\gamma\Phi(s')-\Phi(s)$ with adaptively updated potentials; and uncertainty-aware throttling attenuates the shaping signal in regions where the model is unreliable. We provide theoretical analysis of policy invariance and of the deviation of plateau decisions under model error, and establish stability for the resulting two-timescale adaptation. Experiments on sparse-reward offline MARL benchmarks show consistent gains in stability and final performance across dataset qualities.
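The two mechanisms in the abstract can be sketched concretely. Below is a minimal, hypothetical illustration (not the paper's implementation): a potential-based shaping term $r+\gamma\Phi(s')-\Phi(s)$, which is the form that preserves the optimal policy, and a toy plateau check that compares two recent windows of episode returns with a simple z-style statistic. The function names, the window size, and the threshold are illustrative assumptions.

```python
import math
from statistics import mean, stdev

def shaped_reward(r, s, s_next, phi, gamma=0.99):
    """Potential-based shaping: r + gamma * Phi(s') - Phi(s).
    Adding this term leaves the optimal policy of the task unchanged."""
    return r + gamma * phi(s_next) - phi(s)

def plateau_detected(returns, window=20, z_thresh=1.0):
    """Toy plateau test (illustrative, not the paper's validated test):
    compare the mean return of the two most recent windows; declare a
    plateau when the improvement is small relative to its noise scale."""
    if len(returns) < 2 * window:
        return False  # not enough history to compare two windows
    old, new = returns[-2 * window:-window], returns[-window:]
    diff = mean(new) - mean(old)
    # Pooled standard error of the difference; guard against zero variance.
    pooled = math.sqrt((stdev(old) ** 2 + stdev(new) ** 2) / window) or 1e-8
    return diff / pooled < z_thresh
```

With an identity potential `phi(s) = s` and `gamma = 0.5`, a transition from state 1.0 to 2.0 with zero raw reward yields a shaped reward of `0.5 * 2.0 - 1.0 = 0.0`; flat return histories trigger the plateau check while steadily rising ones do not.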