Pessimistic Verification for Open-Ended Math Questions
Abstract
Automatic verification is a critical component in building math-solving agents and in reinforcement learning, yet existing verifiers often fall short in generalizability, performance, and cost-efficiency. Identifying error-detection capability as the primary bottleneck of verification, we propose pessimistic verification, an agentic-workflow paradigm that rejects a solution if any of multiple parallel verifiers identifies a flaw. We further introduce progressive pessimistic verification, which employs fine-grained proof decomposition to significantly improve verification accuracy and efficiency. Our approach surpasses extended long chain-of-thought (long CoT) and mainstream verification workflows in both performance and token efficiency. Crucially, our analysis reveals that existing benchmarks underestimate its effectiveness on stronger models due to inherent annotation errors. To further validate our method, we apply a verification-based solving workflow to the IMO 2025 and MathArena Apex 2025 datasets, where progressive pessimistic verification yields remarkable improvements in both efficiency and accuracy on highly challenging contest-level math problems with state-of-the-art models.
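The core decision rule of pessimistic verification can be sketched in a few lines. This is an illustrative sketch only, not the paper's implementation: the `verify` function below is a hypothetical placeholder standing in for an independent (e.g. LLM-based) verifier that returns a list of detected flaws, empty when it finds none.

```python
from concurrent.futures import ThreadPoolExecutor

def verify(solution: str, seed: int) -> list[str]:
    # Hypothetical placeholder verifier: flags solutions containing an
    # obvious error marker. A real verifier would independently critique
    # the solution (e.g. via an LLM call with varied sampling).
    return ["unjustified step"] if "hence trivially" in solution else []

def pessimistic_verify(solution: str, n_verifiers: int = 4) -> bool:
    # Run several independent verifiers in parallel and accept the
    # solution only if NONE of them reports a flaw.
    with ThreadPoolExecutor(max_workers=n_verifiers) as pool:
        reports = pool.map(lambda s: verify(solution, s), range(n_verifiers))
    return all(len(flaws) == 0 for flaws in reports)
```

The "pessimism" lies in the aggregation: a single dissenting verifier suffices to reject, which trades some recall of correct solutions for much higher precision in catching flawed ones.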