Spotlight Poster
The Synergy of LLMs & RL Unlocks Offline Learning of Generalizable Language-Conditioned Policies with Low-fidelity Data
Thomas Pouplin · Katarzyna Kobalczyk · Hao Sun · Mihaela van der Schaar
East Exhibition Hall A-B #E-2402
Training AI agents to follow natural language instructions and complete complex tasks remains a major challenge, especially when large labelled datasets or real-time trial-and-error exploration are unavailable. Many current reinforcement learning (RL) methods struggle to handle new goals or previously unseen states, making them hard to apply in the real world.

In this paper, we present TEDUO, a new training pipeline that teaches agents to follow language instructions using only pre-recorded, unlabelled datasets. TEDUO employs large language models (LLMs) in two ways: first, to label and enrich the offline data, and second, to act as agents that interpret and carry out instructions. This combination helps the system learn more efficiently and generalise better to new tasks.

Our results show that TEDUO solves tasks that neither standalone offline RL methods nor out-of-the-box LLMs can handle, offering a more practical path toward building capable, instruction-following agents.
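To make the two-stage idea concrete, below is a minimal Python sketch of an offline pipeline in the spirit of the abstract: an LLM first hindsight-labels unlabelled trajectories with natural-language goals, and the labelled data are then distilled into a language-conditioned policy. All names, the stubbed LLM call, and the lookup-table "policy" are illustrative assumptions, not the paper's implementation.

```python
# Hypothetical sketch of a two-stage offline pipeline (illustrative only;
# function names and details are assumptions, not the authors' code).

from dataclasses import dataclass
from typing import Callable
import random


@dataclass
class Transition:
    state: str       # serialisable state description
    action: int
    next_state: str


def llm_hindsight_label(trajectory: list[Transition]) -> str:
    """Stage 1 (assumed): ask an LLM to describe, in natural language, the goal
    the trajectory appears to achieve. Stubbed here; in practice this would be
    an LLM call over the serialised trajectory."""
    return f"reach state: {trajectory[-1].next_state}"


def label_offline_dataset(
    trajectories: list[list[Transition]],
) -> list[tuple[str, list[Transition]]]:
    """Enrich the unlabelled offline dataset with LLM-generated instructions."""
    return [(llm_hindsight_label(traj), traj) for traj in trajectories]


def train_language_conditioned_policy(
    labelled: list[tuple[str, list[Transition]]],
) -> Callable[[str, str], int]:
    """Stage 2 (assumed): distil the labelled data into an instruction-conditioned
    policy. A trivial lookup table stands in for offline RL / behaviour cloning."""
    table: dict[tuple[str, str], int] = {}
    for instruction, traj in labelled:
        for t in traj:
            table[(instruction, t.state)] = t.action
    # Fall back to a random action for unseen (instruction, state) pairs.
    return lambda instruction, state: table.get((instruction, state), random.randrange(4))


if __name__ == "__main__":
    # Toy unlabelled dataset: two short trajectories in a gridworld-like environment.
    data = [
        [Transition("s0", 1, "s1"), Transition("s1", 2, "door")],
        [Transition("s0", 0, "s2"), Transition("s2", 3, "key")],
    ]
    policy = train_language_conditioned_policy(label_offline_dataset(data))
    print(policy("reach state: door", "s1"))  # -> 2
```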