Divide and Conquer: Grounding LLMs as Efficient Decision-Making Agents via Offline Hierarchical Reinforcement Learning
Abstract
While showing sophisticated reasoning abilities, large language models (LLMs) still struggle with long-horizon decision-making tasks due to deficient exploration and long-term credit assignment, especially in sparse-reward scenarios. Inspired by the divide-and-conquer principle, we propose an innovative framework GLIDER (Grounding Language Models as EffIcient Decision-Making Agents via Offline HiErarchical Reinforcement Learning) that introduces a parameter-efficient and generally applicable hierarchy to LLM policies. We develop a scheme in which the low-level controller is supervised with abstract, step-by-step plans that are learned by and issued from the high-level policy. This design decomposes complicated problems into a series of coherent chain-of-thought reasoning sub-tasks, providing flexible temporal abstraction that significantly enhances exploration and learning for long-horizon tasks. Furthermore, GLIDER facilitates fast online adaptation to non-stationary environments owing to the strong transferability of its task-agnostic low-level skills. Experiments on the ScienceWorld and ALFWorld benchmarks show that GLIDER achieves consistent performance gains, along with enhanced generalization capabilities.
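The hierarchical rollout described in the abstract can be sketched as follows. This is a minimal illustrative loop, not the paper's implementation: the names `high_policy`, `low_policy`, the environment API, and the step budgets are all assumptions introduced for clarity.

```python
# Hypothetical sketch of GLIDER-style hierarchical control.
# All names (high_policy, low_policy, env.reset/step) are illustrative
# assumptions, not the authors' actual code.

def run_episode(high_policy, low_policy, env, max_subtasks=10, max_steps=20):
    """Divide-and-conquer rollout: the high-level policy proposes an
    abstract sub-task; the low-level policy grounds it into primitive
    actions until the episode ends or a per-sub-task step budget is hit."""
    obs = env.reset()
    completed = []  # history of finished sub-tasks
    for _ in range(max_subtasks):
        # High level: emit an abstract plan step (temporal abstraction).
        subtask = high_policy(obs, completed)
        for _ in range(max_steps):
            # Low level: choose a primitive action conditioned on the sub-task.
            action = low_policy(obs, subtask)
            obs, reward, done = env.step(action)
            if done:
                return reward  # sparse terminal reward
        completed.append(subtask)
    return 0.0
```

The key design point is that the low-level policy only conditions on the current sub-task, which is what makes its skills task-agnostic and reusable when the high-level plan changes.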
Lay Summary
Large language models (LLMs) have difficulty handling complex decision-making tasks, especially when feedback is limited. They often get lost in long-term planning and struggle to explore effectively, like a chess player who can't think multiple moves ahead. We developed GLIDER, a framework that breaks down complex tasks into smaller, manageable steps. Like a skilled manager delegating tasks, GLIDER uses a two-level system where high-level planning guides step-by-step execution. This approach helps LLMs tackle challenging tasks more efficiently and adapt to new situations better, showing significant improvements in virtual environments that test reasoning and problem-solving abilities.