Spotlight Poster
The Synergy of LLMs & RL Unlocks Offline Learning of Generalizable Language-Conditioned Policies with Low-fidelity Data
Thomas Pouplin · Katarzyna Kobalczyk · Hao Sun · Mihaela van der Schaar
East Exhibition Hall A-B #E-2402
Training AI agents to follow natural language instructions and complete complex tasks remains a major challenge, especially when large labelled datasets or real-time trial-and-error exploration are unavailable. Many current reinforcement learning (RL) methods struggle to handle new goals or previously unseen states, making them hard to apply in the real world.

In this paper, we present TEDUO, a new training pipeline that teaches agents to follow language instructions using only pre-recorded, unlabelled datasets. TEDUO employs large language models (LLMs) in two ways: first, to label and enrich the offline data, and second, to act as agents that interpret and carry out instructions. This combination helps the system learn more efficiently and generalise better to new tasks.

Our results show that TEDUO solves tasks that neither standalone offline RL methods nor out-of-the-box LLMs can handle, offering a more practical path toward building capable, instruction-following agents.
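To make the two-stage idea concrete, below is a minimal Python sketch of an offline pipeline in the spirit of the abstract: an LLM first hindsight-labels unlabelled trajectories with natural-language goals, and the labelled data are then distilled into a language-conditioned policy. All names, the stubbed LLM call, and the lookup-table "policy" are illustrative assumptions, not the paper's implementation.

```python
# Hypothetical sketch of a two-stage offline pipeline (illustrative only;
# function names and details are assumptions, not the authors' code).

from dataclasses import dataclass
from typing import Callable
import random


@dataclass
class Transition:
    state: str       # serialisable state description
    action: int
    next_state: str


def llm_hindsight_label(trajectory: list[Transition]) -> str:
    """Stage 1 (assumed): ask an LLM to describe, in natural language, the goal
    the trajectory appears to achieve. Stubbed here; in practice this would be
    an LLM call over the serialised trajectory."""
    return f"reach state: {trajectory[-1].next_state}"


def label_offline_dataset(
    trajectories: list[list[Transition]],
) -> list[tuple[str, list[Transition]]]:
    """Enrich the unlabelled offline dataset with LLM-generated instructions."""
    return [(llm_hindsight_label(traj), traj) for traj in trajectories]


def train_language_conditioned_policy(
    labelled: list[tuple[str, list[Transition]]],
) -> Callable[[str, str], int]:
    """Stage 2 (assumed): distil the labelled data into an instruction-conditioned
    policy. A trivial lookup table stands in for offline RL / behaviour cloning."""
    table: dict[tuple[str, str], int] = {}
    for instruction, traj in labelled:
        for t in traj:
            table[(instruction, t.state)] = t.action
    # Fall back to a random action for unseen (instruction, state) pairs.
    return lambda instruction, state: table.get((instruction, state), random.randrange(4))


if __name__ == "__main__":
    # Toy unlabelled dataset: two short trajectories in a gridworld-like environment.
    data = [
        [Transition("s0", 1, "s1"), Transition("s1", 2, "door")],
        [Transition("s0", 0, "s2"), Transition("s2", 3, "key")],
    ]
    policy = train_language_conditioned_policy(label_offline_dataset(data))
    print(policy("reach state: door", "s1"))  # -> 2
```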