Poster Tue, Jul 7, 2026 • 10:30 PM – 12:15 AM PDT HALL A #502

Structured Expert Routing with Multi-View Task Priors for Offline Meta-Reinforcement Learning

Yisen Zhao ⋅ Peixi Peng ⋅ Xinyu Hu ⋅ Cong Li ⋅ Zhan Su ⋅ Zhuojian Li

Abstract

Offline meta-reinforcement learning requires agents to generalize to unseen tasks from fixed datasets, yet existing sequence-based and MoE-based methods rely on implicit or token-level routing signals that fail to capture task-level structure. We propose the Task-Guided Router (TGR), a structured expert-routing framework that explicitly models inter-task relationships via multi-view task representations that combine semantic descriptors, behavioral summaries, and latent dynamics features. Using structure-guided routing, TGR assigns experts based on global task compatibility rather than local trajectory fragments, enabling stable specialization and effective knowledge transfer across tasks. Extensive experiments on continuous-control benchmarks demonstrate that TGR consistently outperforms state-of-the-art offline meta-RL methods in few-shot generalization, particularly under sparse data and heterogeneous dynamics. Our results highlight the importance of task-level priors for robust offline meta-reinforcement learning.
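
To make the idea of task-level routing concrete, here is a minimal illustrative sketch, not the authors' implementation. It assumes the three views are fixed-length vectors, that they are fused by concatenation followed by a linear projection, and that each expert has a learned key vector scored against the fused task vector. The abstract specifies none of these choices, and the names `fuse_views`, `route_task`, `proj`, and `expert_keys` are hypothetical.

```python
import math
import random


def fuse_views(semantic, behavioral, dynamics, proj):
    """Fuse three per-task views into one unit-length task vector.

    Assumption: fusion is concatenation followed by a linear projection.
    """
    views = list(semantic) + list(behavioral) + list(dynamics)
    z = [sum(w * v for w, v in zip(row, views)) for row in proj]
    norm = math.sqrt(sum(x * x for x in z)) or 1.0
    return [x / norm for x in z]


def route_task(task_vec, expert_keys, top_k=2, temperature=1.0):
    """Select experts once per task from task-to-expert compatibility scores.

    Assumption: compatibility is a dot product with a per-expert key,
    and the selected experts are mixed with softmax weights.
    """
    scores = [
        sum(k * t for k, t in zip(key, task_vec)) / temperature
        for key in expert_keys
    ]
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:top_k]
    peak = max(scores[i] for i in top)
    exps = [math.exp(scores[i] - peak) for i in top]  # numerically stable softmax
    total = sum(exps)
    return top, [e / total for e in exps]


# Toy data standing in for learned parameters and task features (all hypothetical).
rng = random.Random(0)
d_sem, d_beh, d_dyn, d_task, n_experts = 8, 6, 4, 5, 4


def rand_vec(n):
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


proj = [rand_vec(d_sem + d_beh + d_dyn) for _ in range(d_task)]
expert_keys = [rand_vec(d_task) for _ in range(n_experts)]

task_vec = fuse_views(rand_vec(d_sem), rand_vec(d_beh), rand_vec(d_dyn), proj)
experts, weights = route_task(task_vec, expert_keys, top_k=2)
print("selected experts:", experts, "mixing weights:", weights)
```

In this sketch the routing decision is computed once from the whole-task vector and then reused for every trajectory segment of that task, which is the contrast with token-level routing that the abstract draws.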

Lay Summary

Artificial intelligence systems often need to apply past experience to new situations. A major challenge is determining which previous experiences are most relevant when facing an unfamiliar task. We propose a new method that helps AI make this choice by building an internal understanding of how different environments behave and evolve over time. The system combines information about task descriptions, observed behavior, and learned environment dynamics to form a richer picture of each task. It then uses this information to select the most suitable specialized knowledge for solving the problem at hand. Unlike existing approaches that mainly rely on short-term observations, our method explicitly captures underlying patterns of environmental change. This allows the system to better recognize similarities between tasks, even when they appear different on the surface. Across a range of decision-making benchmarks, our approach consistently improves the ability to adapt to previously unseen tasks using only limited data. The work contributes toward more adaptable AI systems that can transfer knowledge more effectively and operate reliably in new environments.