HyMTRL: A Hybrid Multi-Task Reinforcement Learning Framework via Phased Policy Evolution
Abstract
Multi-task reinforcement learning (MTRL) aims to improve sample efficiency by sharing knowledge across related tasks, but it often suffers from asynchronous learning progress caused by inherent differences in task difficulty. This imbalance places substantial representational strain on the shared critic network, which emerges as a primary performance bottleneck. To address this issue, we propose Hybrid Multi-Task Reinforcement Learning (HyMTRL), a framework that alleviates critic overload through a phased policy evolution strategy. HyMTRL divides the learning of each task into a reinforcement exploration phase and an imitation refinement phase. Once a task is mastered, it transitions from reinforcement learning–based policy optimization to imitation learning–based behavior consolidation and is removed from the critic’s optimization objective, effectively reducing representational strain. In addition, a critic reset mechanism restores network capacity while preserving learned policies and historical experience. HyMTRL is a general framework that integrates easily with a wide range of existing MTRL methods. Empirical evaluations on the MetaWorld benchmark demonstrate that combining HyMTRL with representative baselines yields significant improvements in both learning efficiency and final performance.
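The phase transition described above can be sketched as follows. This is a minimal illustrative sketch, not the paper's actual implementation: the mastery threshold, the `Task` structure, and all function names are assumptions introduced here for clarity. The key idea it shows is that only tasks still in the reinforcement exploration phase contribute to the shared critic's objective, while mastered tasks are handled by imitation and skip the critic; the critic can then be reinitialized without touching the policy or the replay buffer.

```python
from dataclasses import dataclass

# Assumed success-rate cutoff for declaring a task "mastered"
# (the actual criterion in HyMTRL may differ).
MASTERY_THRESHOLD = 0.9


@dataclass
class Task:
    name: str
    success_rate: float  # rolling evaluation success rate


def split_phases(tasks, threshold=MASTERY_THRESHOLD):
    """Partition tasks into the reinforcement exploration phase and
    the imitation refinement phase based on current success rate."""
    exploring = [t for t in tasks if t.success_rate < threshold]
    mastered = [t for t in tasks if t.success_rate >= threshold]
    return exploring, mastered


def critic_task_names(tasks):
    """Only exploring tasks remain in the shared critic's optimization
    objective; mastered tasks train the policy by imitation instead,
    reducing representational strain on the critic."""
    exploring, _ = split_phases(tasks)
    return [t.name for t in exploring]


def reset_critic(critic_params, init_fn):
    """Critic reset: reinitialize critic parameters to restore network
    capacity, while the policy network and stored experience are
    deliberately left untouched."""
    return {name: init_fn(name) for name in critic_params}
```

For example, with tasks `reach` (0.95), `push` (0.4), and `pick-place` (0.2), only `push` and `pick-place` would still feed the critic, while `reach` would be consolidated by imitation on its own successful trajectories.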