LEGO: An LLM-Enabled Hierarchical Optimizer for Tensor Computation Graphs with Structure-Aware Search and Compositional Synthesis
Abstract
Automating end-to-end GPU kernel generation with Large Language Models (LLMs) faces a critical tension between global performance and exploration efficiency. We present LEGO, a hierarchical framework that resolves this trade-off via a parallel multi-agent search over a recursive AND-OR FusionTree. LEGO synergizes two complementary flows: Top-Down Construction decomposes complex graphs into valid, context-isolated sub-problems to guarantee correctness and enable parallel exploration, while Bottom-Up Mutation speculatively fuses verified sub-plans to recover global locality for peak performance. This bi-directional mechanism effectively prunes the search space, avoiding repetitive unguided sampling, while naturally parallelizing exploration and enabling the discovery of sophisticated fusion strategies. Evaluations demonstrate that LEGO achieves 2.18x–13.48x speedups over PyTorch Eager and reduces end-to-end exploration time by up to 2.47x (with a 7x token reduction) compared to monolithic baselines across diverse end-to-end models.
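To make the AND-OR FusionTree abstraction concrete, the following is a minimal, hypothetical sketch (the names `FusionNode`, `kind`, and `count_plans` are illustrative, not LEGO's actual API): AND nodes represent a decomposition whose sub-problems must all be solved, while OR nodes represent alternative fusion strategies for the same sub-graph, so the number of complete plans multiplies across AND children and sums across OR children.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class FusionNode:
    """Illustrative node in an AND-OR tree over fusion plans (not LEGO's API)."""
    op: str                      # sub-graph or operator this node covers
    kind: str = "OR"             # "AND": solve all children; "OR": choose one alternative
    children: List["FusionNode"] = field(default_factory=list)

def count_plans(node: FusionNode) -> int:
    """Count distinct complete plans under an AND-OR tree:
    AND multiplies child counts, OR sums them; a leaf is one plan."""
    if not node.children:
        return 1
    counts = [count_plans(c) for c in node.children]
    if node.kind == "AND":
        total = 1
        for c in counts:
            total *= c
        return total
    return sum(counts)

# Example: a root graph decomposed (AND) into two sub-problems,
# each with two alternative fusion strategies (OR).
root = FusionNode("graph", "AND", [
    FusionNode("attention", "OR", [FusionNode("fused"), FusionNode("split")]),
    FusionNode("mlp", "OR", [FusionNode("fused"), FusionNode("split")]),
])
print(count_plans(root))  # 2 * 2 = 4 complete plans
```

This structure shows why pruning helps: verifying sub-plans at OR nodes independently (top-down) avoids re-sampling the full product space, while recombining verified children of an AND node (bottom-up) recovers cross-sub-graph fusion opportunities.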