TritonGym: A Benchmark for Agentic LLM Workflows in Triton GPU Code Generation
Abstract
Large language models (LLMs) can already draft plausible Triton kernels, yet most existing evaluations focus on single-shot generation and underplay tool use and iterative feedback. We introduce TritonGym, a benchmark and orchestration framework for evaluating agentic workflows in GPU code generation. TritonGym standardizes access to a set of code generation tools via function calls, separating intrinsic model capability from workflow design and enabling apples-to-apples comparison across agents. The benchmark spans a maintained operator set, community samples, out-of-distribution tasks, and DSL extensions, ensuring both generality and extensibility. By providing a common orchestration and evaluation framework, TritonGym democratizes the development of GPU coding agents, supports the practical adoption of agent-generated kernels, and facilitates progress on advanced agentic systems.