ParEVO: Synthesizing Code for Irregular Data: High-Performance Parallelism through Agentic Evolution
Liu Yang ⋅ Zeyu Nie ⋅ Andrew Liu ⋅ Ruomu Zou ⋅ Deniz Altınbüken ⋅ Amir Yazdanbakhsh ⋅ Quanquan Liu
Abstract
The transition from sequential to parallel computing is essential for modern high-performance applications but is hindered by the steep learning curve of concurrent programming. This challenge is magnified for \textbf{irregular data structures} (such as sparse graphs, unbalanced trees, and non-uniform meshes) where static scheduling fails and data dependencies are unpredictable. Current Large Language Models (LLMs) often fail catastrophically on these tasks, generating code plagued by subtle race conditions, deadlocks, and sub-optimal scaling. We bridge this gap with \textbf{\sys}, a framework designed to synthesize high-performance parallel algorithms for irregular data. Our contributions include: (1) \textbf{The Parlay-Instruct Corpus}, a curated dataset of 12,000 tasks synthesized via a "Critic-Refine" pipeline that explicitly filters for theoretically optimal algorithms under the Work-Span cost model; (2) specialized \textbf{DeepSeek}, \textbf{Qwen}, and \textbf{Gemini} models fine-tuned to align probabilistic generation with the rigorous semantics of the ParlayLib intermediate representation; and (3) an \textbf{Evolutionary Coding Agent (ECA)} that solves the ``last mile'' of correctness by iteratively repairing code using feedback from compilers, race detectors, and performance profilers. On the ParEval benchmark, \sys achieves a \textbf{$106\times$ speedup} on complex irregular graph problems, significantly outperforming state-of-the-art commercial models like GPT-5.2 and Gemini 3 Pro. Furthermore, our approach surpasses expert \emph{human-written} baselines in the standard PBBSBench by \textbf{$4\times$}, demonstrating that AI-driven agents can effectively navigate the complex landscape of high-performance computing.
Successful Page Load