Scaling Long-Horizon Agents via Context Folding
Abstract
Large language model (LLM) agents are fundamentally constrained by context length on long-horizon tasks. Existing agent frameworks usually rely on manually defined context-engineering pipelines, such as multi-agent decomposition or post-hoc summarization. We introduce Context Folding, a framework that empowers agents to actively manage their working context. An agent can procedurally branch into a sub-trajectory to handle a subtask and then fold it upon completion, collapsing the intermediate steps while retaining a concise summary of the outcome. To make this behavior learnable, we propose FoldPO, an end-to-end reinforcement learning framework with process rewards designed to encourage effective task decomposition and context management. On complex long-horizon tasks, our agent matches the performance of baselines while using an active context up to 10x smaller, and it significantly outperforms models constrained to the same context size.
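To make the branch/fold mechanism concrete, here is a minimal sketch of the idea described above, not the authors' implementation: the working context is a list of steps, branching marks the start of a sub-trajectory, and folding replaces that sub-trajectory with a single summary entry. All names (`ContextFolder`, `branch`, `fold`, the `summarize` callback) are hypothetical illustrations.

```python
class ContextFolder:
    """Toy model of an agent's active context with branch/fold operations.

    Hypothetical illustration of the Context Folding idea: intermediate
    steps of a subtask are collapsed into one summary entry on fold.
    """

    def __init__(self):
        self.context = []          # active working context (visible to the agent)
        self._branch_start = None  # index where the current branch began

    def append(self, step):
        self.context.append(step)

    def branch(self, subtask):
        """Open a sub-trajectory to handle a subtask."""
        self._branch_start = len(self.context)
        self.context.append(f"[branch] {subtask}")

    def fold(self, summarize):
        """Collapse the sub-trajectory, keeping only a concise summary."""
        sub = self.context[self._branch_start:]
        self.context[self._branch_start:] = [f"[folded] {summarize(sub)}"]
        self._branch_start = None


agent = ContextFolder()
agent.append("user: fix the failing test")
agent.branch("locate the bug")
agent.append("tool: grep ...")       # intermediate steps that will
agent.append("tool: read file ...")  # be folded away on completion
agent.fold(lambda steps: f"bug located after {len(steps)} steps")

print(agent.context)
# The active context now holds 2 entries instead of 4: the user turn
# plus one folded summary in place of the whole sub-trajectory.
```

The payoff is the size of the active context: only the summary of each completed subtask remains visible, which is what lets the agent keep its context up to 10x smaller than an unfolded trajectory of the same task.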