EVOLVING ROLLOUTS: Harnessing Historical Experience for Web Agent Evolution in Reinforcement Learning
Abstract
Agentic reinforcement learning (RL) for web search is prohibitively expensive due to long context lengths and costly environment interactions, and this inefficiency is further exacerbated by GRPO-based optimization, which discards the learning signal from entire rollout groups whose rewards have zero variance. In this work, we propose EVOLVING ROLLOUTS, an RL framework for web-search agents that moves beyond episodic training and distills collected rollouts into in-context guidance for future policy behavior. By condensing reward-labeled trajectories into strategic experiences, our method augments standard parameter-space optimization with implicit context-space optimization guided by prior experience. This enables the agent to recover learning signals from zero-variance rollout groups, fostering co-evolution between the policy and the experience repository. EVOLVING ROLLOUTS improves sample efficiency and task performance across representative web-search benchmarks, enabling Qwen3-4B models to perform comparably to the substantially larger Qwen3-30B-A3B model on GAIA, Xbench, and HLE. We open-source our training framework to support reproducibility and future research.
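To make the zero-variance issue concrete, the sketch below illustrates why GRPO's group-relative advantages vanish when all rollouts in a group receive the same reward, and how such a group could instead feed an experience repository rather than be discarded. This is a minimal illustration, not the paper's implementation; the names `grpo_advantages`, `process_group`, and `experience_repo` are hypothetical.

```python
import numpy as np

def grpo_advantages(rewards, eps=1e-8):
    """Group-relative advantages: (r - mean) / std within a rollout group.
    If every reward in the group is identical (zero variance), all
    advantages are zero, so the group yields no gradient signal."""
    r = np.asarray(rewards, dtype=float)
    std = r.std()
    if std < eps:
        return np.zeros_like(r)  # zero-variance group: no learning signal
    return (r - r.mean()) / std

# Hypothetical experience repository: reward-labeled trajectories kept as
# in-context guidance for future rollouts (the paper's "strategic
# experiences"; the storage format here is an illustrative assumption).
experience_repo = []

def process_group(trajectories, rewards):
    """Sketch of handling one rollout group: use nonzero advantages for the
    standard parameter-space update; salvage zero-variance groups by
    storing their trajectories as experience instead of dropping them."""
    adv = grpo_advantages(rewards)
    if not adv.any():
        experience_repo.extend(
            {"trajectory": t, "reward": r}
            for t, r in zip(trajectories, rewards)
        )
        return None  # no parameter-space update from this group
    return adv  # feeds the usual policy-gradient step

# A group where every rollout scored 0.0 would normally be wasted:
process_group(["traj_a", "traj_b"], [0.0, 0.0])
print(len(experience_repo))  # 2 -- trajectories retained as guidance
```

Under this reading, the repository entries would later be surfaced in the agent's context at rollout time, which is what the abstract refers to as implicit context-space optimization.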