Scaling Long-Horizon Agents via Context Folding
Abstract
Large language model (LLM) agents are fundamentally constrained by context length on long-horizon tasks. Existing agent frameworks usually rely on manually defined context-engineering pipelines, such as multi-agent decomposition or post-hoc summarization. We introduce Context Folding, a framework that empowers agents to actively manage their working context. An agent can procedurally branch into a sub-trajectory to handle a subtask and then fold it upon completion, collapsing the intermediate steps while retaining a concise summary of the outcome. To make this behavior learnable, we propose FoldPO, an end-to-end reinforcement learning framework with process rewards designed to encourage effective task decomposition and context management. On complex long-horizon tasks, our agent matches the performance of baselines while using an active context up to 10x smaller, and it significantly outperforms models constrained to the same context size.
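To make the branch/fold mechanism concrete, here is a minimal sketch of the idea described above, not the authors' implementation: the working context is a list of steps, branching marks the start of a sub-trajectory, and folding replaces that sub-trajectory with a single summary entry. All names (`ContextFolder`, `branch`, `fold`, the `summarize` callback) are hypothetical illustrations.

```python
class ContextFolder:
    """Toy model of an agent's active context with branch/fold operations.

    Hypothetical illustration of the Context Folding idea: intermediate
    steps of a subtask are collapsed into one summary entry on fold.
    """

    def __init__(self):
        self.context = []          # active working context (visible to the agent)
        self._branch_start = None  # index where the current branch began

    def append(self, step):
        self.context.append(step)

    def branch(self, subtask):
        """Open a sub-trajectory to handle a subtask."""
        self._branch_start = len(self.context)
        self.context.append(f"[branch] {subtask}")

    def fold(self, summarize):
        """Collapse the sub-trajectory, keeping only a concise summary."""
        sub = self.context[self._branch_start:]
        self.context[self._branch_start:] = [f"[folded] {summarize(sub)}"]
        self._branch_start = None


agent = ContextFolder()
agent.append("user: fix the failing test")
agent.branch("locate the bug")
agent.append("tool: grep ...")       # intermediate steps that will
agent.append("tool: read file ...")  # be folded away on completion
agent.fold(lambda steps: f"bug located after {len(steps)} steps")

print(agent.context)
# The active context now holds 2 entries instead of 4: the user turn
# plus one folded summary in place of the whole sub-trajectory.
```

The payoff is the size of the active context: only the summary of each completed subtask remains visible, which is what lets the agent keep its context up to 10x smaller than an unfolded trajectory of the same task.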