Failure-Driven Workflow Refinement
Jusheng Zhang ⋅ Jing Yang ⋅ Kaitong Cai ⋅ Ziliang Chen ⋅ Yongsen Zheng ⋅ Kwok Yan Lam ⋅ Liang Lin ⋅ Keze Wang
Abstract
Workflow optimization for tool-using LLM agents is often cast as a global search over candidate graphs scored by a scalar metric. This collapses rich, multi-step failure traces into binary outcomes, obscuring recurring failure structure and making refinement inefficient. We reframe optimization as \emph{distributional refinement}: each workflow induces a density over a \textbf{Failure Signature Space} $\mathcal{F}$, and the goal is to minimize its \textbf{Expected Failure Mass}. We propose \textbf{CE-Graph}, which maintains a counterexample pool, estimates dense failure modes, and applies operator-constrained graph edits via a \textbf{Propose-and-Verify} loop with a convergence-aware stopping rule. Across math, code, and QA benchmarks, CE-Graph improves robustness while reducing optimization cost compared to strong workflow-search baselines, suggesting that reliability emerges from learning and reshaping failure landscapes rather than from merely maximizing aggregate success rates.
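The abstract's refinement loop can be illustrated with a minimal sketch. Everything here is hypothetical and not the paper's implementation: a workflow is abstracted as the set of failure signatures it still triggers, the counterexample pool is a list of observed signatures, and a "Propose-and-Verify" step removes the densest failure mode only if doing so measurably reduces the expected failure mass.

```python
from collections import Counter

def expected_failure_mass(workflow_failures, pool):
    """Fraction of pooled counterexamples whose signature the workflow still fails.
    This is a toy stand-in for the paper's Expected Failure Mass over F."""
    if not pool:
        return 0.0
    return sum(sig in workflow_failures for sig in pool) / len(pool)

def refine(workflow_failures, pool, max_rounds=10, eps=1e-3):
    """Hypothetical Propose-and-Verify loop: target the densest failure mode,
    accept an edit only if it verifiably reduces failure mass, and stop when
    improvements fall below eps (a convergence-aware stopping rule)."""
    workflow_failures = set(workflow_failures)
    history = [expected_failure_mass(workflow_failures, pool)]
    for _ in range(max_rounds):
        # Estimate the densest remaining failure mode from the pool.
        dense = Counter(sig for sig in pool if sig in workflow_failures)
        if not dense:
            break
        target, _ = dense.most_common(1)[0]
        # Propose an edit that fixes `target`; verify on the pool.
        candidate = workflow_failures - {target}
        before, after = history[-1], expected_failure_mass(candidate, pool)
        if before - after > eps:      # accept only verified improvements
            workflow_failures = candidate
            history.append(after)
        else:
            break                     # no meaningful gain: stop
    return workflow_failures, history
```

A toy run: with pool `["timeout", "timeout", "parse", "timeout", "parse", "format"]` and a workflow failing on `{"timeout", "parse"}`, the loop first repairs the denser `timeout` mode, then `parse`, driving the failure mass from 5/6 to 0. The point of the sketch is structural: edits are local and counterexample-driven, rather than a global re-search over candidate graphs.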