Poster in Workshop: Structured Probabilistic Inference and Generative Modeling
Recursive Introspection: Teaching LLM Agents How to Self-Improve
Yuxiao Qu · Tianjun Zhang · Naman Garg · Aviral Kumar
Keywords: Self-Improvement, Reinforcement Learning, Large Language Model
Abstract:
A central piece in enabling intelligent agentic behavior in foundation models is making them capable of introspecting upon their behavior, reasoning about it, and correcting their mistakes. However, powerful proprietary large language models (LLMs) lack the ability to sequentially improve their responses, even when explicitly informed of their mistakes. In this paper, we develop RISE (Recursive IntroSpEction), an approach for fine-tuning LLMs to instill this ability. RISE prescribes an iterative fine-tuning procedure, which attempts to teach the model how to alter its response after seeing its previous unsuccessful attempts at a problem, together with additional environment feedback. RISE poses fine-tuning for a single-turn problem as solving a multi-turn Markov decision process (MDP) whose initial state is the prompt. Inspired by principles of online imitation learning, we derive effective strategies for multi-turn data collection and training that imbue an LLM with the capability to recursively detect and correct its previous mistakes in subsequent iterations. Our experiments show that RISE enables 7B Llama2 and Mistral models to improve themselves over more turns on math reasoning tasks, outperforming several single-turn strategies given an equal amount of inference-time computation. Our analysis shows that RISE makes meaningful improvements to responses that lead to the correct solution on challenging prompts, without disrupting one-turn abilities.
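To make the multi-turn MDP framing concrete, here is a toy sketch, not the authors' implementation. The state is the prompt plus all earlier (attempt, feedback) pairs, and each failed turn appends feedback before the next attempt. For data collection, every state the learner visits is relabeled with an improved response. Several pieces are assumptions and do not come from the abstract: the feedback string, the exact-match answer checker, the best-of-N relabeling rule, the use of the reward as a training weight, and all names (`State`, `rollout`, `build_training_data`, `learner`, `improver`). In practice the callables would be LLM sampling calls, and the resulting (context, target, weight) triples would feed a fine-tuning step.

```python
# Toy sketch: all names, the feedback text, and the best-of-N relabeling rule are
# hypothetical assumptions, not taken from the RISE code.
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical feedback text: the abstract only says "additional environment feedback".
FEEDBACK = "Your previous answer may be incorrect. Please try again."


@dataclass
class State:
    """MDP state: the original prompt plus every (attempt, feedback) pair so far."""
    prompt: str
    history: List[Tuple[str, str]] = field(default_factory=list)

    def render(self) -> str:
        # Flatten the state into the context string a model would condition on.
        parts = [self.prompt]
        for attempt, feedback in self.history:
            parts += [attempt, feedback]
        return "\n".join(parts)


Policy = Callable[[State], str]
Step = Tuple[State, str, float]  # (state, action, reward)


def rollout(policy: Policy, prompt: str,
            is_correct: Callable[[str], bool], max_turns: int) -> List[Step]:
    """Run one episode: start from the prompt, append feedback after each miss."""
    state = State(prompt)
    trajectory: List[Step] = []
    for _ in range(max_turns):
        attempt = policy(state)
        reward = 1.0 if is_correct(attempt) else 0.0
        # Snapshot the state so later turns do not mutate earlier records.
        trajectory.append((State(prompt, list(state.history)), attempt, reward))
        if reward == 1.0:
            break
        state.history.append((attempt, FEEDBACK))
    return trajectory


def build_training_data(trajectory: List[Step], improver: Policy,
                        is_correct: Callable[[str], bool],
                        n_samples: int) -> List[Tuple[str, str, float]]:
    """Relabel each learner-visited state with the best of n_samples improver outputs
    (plus the learner's own attempt), scored by the checker (assumed rule)."""
    data = []
    for state, attempt, _ in trajectory:
        candidates = [improver(state) for _ in range(n_samples)] + [attempt]
        best = max(candidates, key=lambda c: float(is_correct(c)))
        # The reward doubles as a weight: a reward-weighted loss ignores failed targets.
        data.append((state.render(), best, float(is_correct(best))))
    return data


# Toy usage with hypothetical stand-ins for the LLM and the answer checker.
def learner(state: State) -> str:
    return f"answer: {3 + len(state.history)}"  # wrong on turn 1, right on turn 2


def improver(state: State) -> str:
    return "answer: 4"  # stands in for a stronger model or the learner's own samples


def is_correct(answer: str) -> bool:
    return answer.strip() == "answer: 4"


traj = rollout(learner, "What is 2 + 2?", is_correct, max_turns=3)
data = build_training_data(traj, improver, is_correct, n_samples=2)
for context, target, weight in data:
    print(repr(context), "->", target, weight)
```

In the toy run, the first-turn state (where the learner answered 3) is relabeled with the improver's correct answer. Fine-tuning on `data` would therefore push the learner toward fixing that mistake from that context. Gathering states from the learner's own rollouts, rather than only from expert demonstrations, follows the DAgger style of online imitation learning. That is one plausible reading of the abstract's appeal to "principles of online imitation learning", not a confirmed description of RISE.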