Poster in Workshop: ICML 2024 Workshop on Foundation Models in the Wild
Language Model-In-The-Loop: Data Optimal Approach to Recommend Actions in Text Games
Arjun V SS · Prasanna Parthasarathi · Janarthanan Rajendran · Sarath Chandar
Keywords: [ Text-Based Games ] [ Reinforcement Learning ] [ Natural Language Processing ]
Large Language Models (LLMs) have demonstrated superior performance on language understanding benchmarks. A recent use case for LLMs involves training decision-making agents over textual information. The existing approach leverages an LLM's linguistic priors to recommend action candidates in text games, i.e., to operate without environment-provided actions. However, adapting LLMs to specific games or tasks requires a massive amount of human-annotated gameplay. Moreover, in the existing approach, the language model is kept frozen during the agent's training, which prevents it from learning in-game knowledge about the world. Hence, we explore strategies to adapt the language model for candidate recommendation using in-game transitions in an online learning fashion, mitigating the reliance on human-annotated gameplays, which are costly to acquire. In this paper, we propose in-game transition selection methods to adapt the LLM in the loop, reducing the dependency on human-annotated gameplays while improving performance and convergence. Our method demonstrates a 53% relative improvement in average game score over the previous state-of-the-art model and converges more than twice as fast in the fully annotated dataset setting. Furthermore, with only 10% of the human annotations, it surpasses the performance of the state-of-the-art model trained on 100% of the annotations.
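The abstract describes the general loop: an LM proposes action candidates for the agent, the agent interacts with the text game, and selected in-game transitions are used to adapt the LM online. The following Python sketch illustrates that loop in toy form; it is not the authors' code, and every name in it (ToyTextGame, CandidateLM, select_transitions, the constants, and the reward-based selection heuristic) is an illustrative assumption rather than the paper's actual interface or selection method.

```python
"""Minimal sketch (assumptions, not the paper's implementation) of a
language-model-in-the-loop agent for text games: the LM recommends action
candidates, the agent acts, and a selected subset of in-game transitions
is used to adapt the LM online, reducing reliance on human annotations."""
import random
from collections import deque

REPLAY_CAPACITY = 1000  # assumed transition buffer size
ADAPT_EVERY = 20        # assumed LM adaptation frequency (in env steps)
TOP_K = 4               # assumed number of action candidates per state


class ToyTextGame:
    """Stand-in for a text game: string observations, scalar reward."""
    def reset(self):
        return "You are in a kitchen. There is an apple on the table."

    def step(self, action):
        reward = 1.0 if "take" in action else 0.0
        obs = "You took the apple." if reward else "Nothing happens."
        return obs, reward, reward > 0


class CandidateLM:
    """Stand-in for an LLM that proposes action candidates."""
    VOCAB = ["take apple", "open fridge", "go north", "look"]

    def __init__(self):
        self.weights = {a: 0.0 for a in self.VOCAB}

    def propose(self, obs, k=TOP_K):
        # Rank a fixed action vocabulary by learned preference; a real LM
        # would generate candidates conditioned on the observation text.
        return sorted(self.VOCAB, key=lambda a: -self.weights[a])[:k]

    def adapt(self, transitions):
        # Toy "fine-tuning": upweight actions that led to positive reward.
        for obs, action, reward, next_obs in transitions:
            self.weights[action] += 0.1 * reward


def select_transitions(buffer, n=16):
    """Transition selection (assumed heuristic): prefer rewarding
    transitions; the paper studies principled selection strategies."""
    rewarding = [t for t in buffer if t[2] > 0]
    rest = [t for t in buffer if t[2] <= 0]
    batch = rewarding[:n] + rest[: max(0, n - len(rewarding))]
    random.shuffle(batch)
    return batch


def run(episodes=5):
    env, lm = ToyTextGame(), CandidateLM()
    buffer, step = deque(maxlen=REPLAY_CAPACITY), 0
    for _ in range(episodes):
        obs, done = env.reset(), False
        while not done:
            candidates = lm.propose(obs)        # LM recommends candidates
            action = random.choice(candidates)  # agent policy (here: random)
            next_obs, reward, done = env.step(action)
            buffer.append((obs, action, reward, next_obs))
            obs, step = next_obs, step + 1
            if step % ADAPT_EVERY == 0 and buffer:
                lm.adapt(select_transitions(buffer))  # in-the-loop update


if __name__ == "__main__":
    run()
```

In this sketch the agent's policy is a placeholder (uniform choice over candidates); in the paper's setting the candidates would feed an RL agent, and the selection step is where the proposed methods differ from simply replaying all transitions.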