Poster in Workshop: Automated Reinforcement Learning: Exploring Meta-Learning, AutoML, and LLMs
Scalable and Provable Exploration via HyperAgent for Foundation Model Decision-making
Yingru Li · Jiawei Xu · Zhi-Quan Luo
Abstract:
Foundation models pretrained on diverse datasets excel at a wide range of tasks but face challenges in real-world applications, particularly in sequential decision-making under uncertainty. Integrating these models with reinforcement learning exacerbates scalability issues due to enlarged state-action spaces and growing volumes of interaction data, necessitating bounded per-step computational complexity and data efficiency. This work introduces GPT-HyperAgent, which synergizes foundation models with HyperAgent, an approximate Thompson sampling (TS) algorithm, for decision-making tasks, focusing on contextual bandits with natural language input. Theoretically, we establish a general regret analysis framework, leading to a distribution-dependent bound. This explains why HyperAgent typically performs better than its special case, ensemble sampling. In the linear bandit setting, HyperAgent's regret matches that of exact TS with $\tilde{O}(\log T)$ per-step computational complexity, where $T$ is the total number of interactions, closing the theoretical gap for scalable randomized exploration. Practically, we show that the perturbation and update distributions in HyperAgent can be chosen separately, providing computational benefits. Empirical results on an online content moderation task demonstrate GPT-HyperAgent's superior exploration capability. This work presents the first benchmark for scalable and efficient foundation model decision-making.
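To make the randomized-exploration idea concrete, below is a minimal sketch of approximate Thompson sampling in a linear contextual bandit via ensemble-style sampling, which the abstract describes as a special case of HyperAgent. The class name, hyperparameters, and update rule are illustrative assumptions, not the authors' released GPT-HyperAgent implementation; in the GPT-HyperAgent setting the context features would presumably come from a foundation model's representation of the natural-language input.

import numpy as np

class EnsembleSamplingLinearBandit:
    # Illustrative sketch of randomized exploration (ensemble-style approximate TS),
    # assuming a linear-Gaussian reward model. Not the authors' implementation.
    def __init__(self, d, num_models=10, noise_std=0.1, prior_scale=1.0):
        self.d, self.M = d, num_models
        self.noise_std = noise_std
        # Shared precision matrix: the "update" component.
        self.Sigma_inv = np.eye(d) / prior_scale**2
        # One perturbed prior/target vector per model: the "perturbation" component.
        # Choosing these two components separately is the flexibility the abstract mentions.
        self.b = np.random.randn(d, self.M) / prior_scale

    def act(self, contexts):
        # contexts: array of shape (num_actions, d); sample one model index, act greedily.
        m = np.random.randint(self.M)
        theta = np.linalg.solve(self.Sigma_inv, self.b[:, m])
        return int(np.argmax(contexts @ theta))

    def update(self, x, r):
        # Incremental least-squares update with an independent Gaussian reward
        # perturbation per model, so per-step work does not grow with the horizon.
        self.Sigma_inv += np.outer(x, x) / self.noise_std**2
        xi = self.noise_std * np.random.randn(self.M)
        self.b += np.outer(x, r + xi) / self.noise_std**2

A typical interaction loop would call a = agent.act(contexts) each round and then agent.update(contexts[a], reward) with the observed feedback.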