HTAC: Hierarchical Task-Aware Composition for Continual Offline Reinforcement Learning
Abstract
Continual Offline Reinforcement Learning (CORL) enables building long-term autonomous agents from static datasets. However, CORL must contend with heterogeneity in environment dynamics, reward functions, and behavior policies across tasks. Combined with the inherent distribution shift of offline learning, this heterogeneity requires agents to selectively reuse shared knowledge during transfer while isolating task-specific features. The flat knowledge-sharing mechanisms employed by existing methods struggle to capture this distinction, limiting cross-task generalization. To address this, we propose Hierarchical Task-Aware Composition (HTAC), which balances plasticity and stability through dual-level task encoding and soft composition mechanisms. HTAC comprises four modules: (1) a Hierarchical Semantic Task Representation that decomposes tasks into domain-level and task-level embeddings; (2) a Dual-level Expert Network that creates domain and task experts on demand for parameter-efficient knowledge isolation; (3) an Adaptive Knowledge Composition module that integrates historical expert outputs via an attention mechanism for knowledge reuse; and (4) Task Adapters that preserve historical routing weights to prevent forgetting. Experiments on Offline Continual World show that HTAC outperforms existing baselines, demonstrating stronger knowledge reuse and transfer capabilities.
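To make the attention-based composition idea concrete, the sketch below shows one plausible way a current task embedding could weight the outputs of frozen historical experts. This is an illustrative reconstruction, not HTAC's actual implementation: the function `compose_experts` and the names `task_emb`, `expert_keys`, and `expert_outs` are hypothetical, and the abstract does not specify the attention variant used.

```python
# Hedged sketch: dot-product attention over historical expert outputs.
# All names here are illustrative; the paper's abstract only states that
# expert outputs are integrated "via an attention mechanism".
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def compose_experts(task_emb, expert_keys, expert_outs):
    """Weight each (frozen) historical expert's output by the similarity
    of the current task embedding to that expert's key, then mix them."""
    # Dot-product similarity between the task embedding and each expert key.
    scores = [sum(t * k for t, k in zip(task_emb, key)) for key in expert_keys]
    weights = softmax(scores)
    # Convex combination of expert outputs under the attention weights.
    dim = len(expert_outs[0])
    mixed = [sum(w * out[d] for w, out in zip(weights, expert_outs))
             for d in range(dim)]
    return weights, mixed

# Toy usage: the task embedding aligns with the first expert's key,
# so that expert should dominate the composed output.
task_emb = [1.0, 0.0]
expert_keys = [[1.0, 0.0], [0.0, 1.0]]
expert_outs = [[2.0, 2.0], [0.0, 0.0]]
weights, mixed = compose_experts(task_emb, expert_keys, expert_outs)
```

Under this toy setup the weights form a probability distribution that favors the expert whose key matches the current task, which is the soft, non-destructive reuse behavior the abstract attributes to the Adaptive Knowledge Composition module.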