DECOR: Learning to Decompose and Collaborate in Deep Search via Multi-Agent Reinforcement Learning
Ruiqing Chen ⋅ Zekun Zhang ⋅ Gong-Duo Zhang ⋅ Lihong Gu ⋅ Lin Zhou
Abstract
Monolithic agents in deep search often suffer from "cognitive overload," while existing multi-agent approaches mostly rely on frozen models that cannot learn from collaboration failures. To bridge this gap, we propose $\textbf{DECOR}$ ($\textbf{DE}$compose and $\textbf{CO}$llaborate via $\textbf{R}$ole-specialized agents), a framework that formulates deep search as a Multi-Agent Reinforcement Learning (MARL) problem. DECOR functionally decomposes the task into three specialized roles: a $\textit{Planner}$ to navigate the search, a $\textit{Filter}$ to curate a noise-reduced memory, and an $\textit{Answerer}$ to synthesize the final response. Unlike training-free orchestration, we jointly optimize these agents using a hybrid reward strategy that harmonizes role-specific intrinsic feedback with team-level outcome signals. Experiments on seven benchmarks show that DECOR significantly outperforms strong monolithic baselines, demonstrating the necessity of learning-based functional decomposition for mitigating cognitive overload.
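The hybrid reward described above can be illustrated with a minimal sketch. The abstract does not specify the exact reward formulation, so the linear mixing coefficient `alpha`, the function name `hybrid_reward`, and the example reward values below are all illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch: blending a role-specific intrinsic reward with a
# shared team-level outcome signal. The mixing scheme (a simple convex
# combination with weight `alpha`) is an assumption for illustration.

def hybrid_reward(intrinsic: float, team_outcome: float, alpha: float = 0.5) -> float:
    """Credit an agent both for performing its own role well (planning,
    filtering, or answering) and for the team's final answer quality."""
    return alpha * intrinsic + (1.0 - alpha) * team_outcome

# Example: a Filter agent that curated memory well (intrinsic = 0.8)
# on an episode whose final answer was judged correct (team = 1.0).
r = hybrid_reward(0.8, 1.0)
```

With equal weighting, each agent's gradient signal reflects both its local contribution and the shared outcome, which is the general shape of reward harmonization the abstract describes.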