Adaptive Policy Backbone via Shared Network
Abstract
Reinforcement learning (RL) has achieved impressive results across diverse domains, yet the resulting policies often fail to generalize beyond the specific tasks encountered during training. This lack of robustness limits their deployment in real-world settings, where task demands are diverse and unpredictable. We propose the Adaptive Policy Backbone (APB), a transferable policy backbone whose meta-learned initialization provides a highly generalizable representation. APB pairs a frozen, meta-trained backbone with lightweight task-specific linear layers that are learned from scratch for each new environment. Our results demonstrate that training only these lightweight task-specific linear layers is sufficient to match the performance of standard RL, surprisingly even when the backbone is randomly initialized. Furthermore, we find that this structural constraint inherently improves the generalization of the resulting policies. This advantage extends even to out-of-distribution tasks, where existing meta-RL methods typically fail.
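The following is a minimal sketch, not the authors' implementation, of the structure the abstract describes: a frozen shared backbone surrounded by task-specific linear layers that are the only trainable parameters. The class and parameter names (APBPolicy, feat_dim, etc.) and the choice of an MLP backbone in PyTorch are illustrative assumptions.

import torch
import torch.nn as nn

class APBPolicy(nn.Module):
    """Frozen shared backbone + lightweight task-specific linear layers (illustrative sketch)."""

    def __init__(self, obs_dim: int, act_dim: int, feat_dim: int = 256):
        super().__init__()
        # Task-specific linear layers, learned from scratch for each new task.
        self.input_layer = nn.Linear(obs_dim, feat_dim)
        self.head = nn.Linear(feat_dim, act_dim)
        # Shared backbone: meta-trained once (or, per the abstract's finding,
        # even left at random initialization), then frozen for all tasks.
        self.backbone = nn.Sequential(
            nn.Linear(feat_dim, feat_dim), nn.ReLU(),
            nn.Linear(feat_dim, feat_dim), nn.ReLU(),
        )
        for p in self.backbone.parameters():
            p.requires_grad = False  # gradients still flow through to input_layer

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.head(self.backbone(self.input_layer(obs)))

# Per-task adaptation: only the linear layers reach the optimizer.
policy = APBPolicy(obs_dim=17, act_dim=6)
task_params = list(policy.input_layer.parameters()) + list(policy.head.parameters())
optimizer = torch.optim.Adam(task_params, lr=3e-4)

Keeping the backbone's parameters frozen while letting gradients pass through it is what confines per-task learning to the linear layers, mirroring the structural constraint the abstract credits for the generalization gains.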