Poster Wed, Jul 8, 2026 • 1:00 AM – 2:45 AM PDT HALL A #4014

SemRep: Generative Code Representation Learning with Code Transformations

Weichen Li ⋅ Jiamin Song ⋅ Bogdan Stoica ⋅ Arav Dhoot ⋅ Gabriel Ryan ⋅ Shengyu Fu ⋅ Kexin Pei

Abstract

Code transformation is a foundational capability in software development, and its effectiveness relies on constructing a high-quality code representation that characterizes the semantics of the input code and guides the transformation. Existing approaches treat code transformation as an end-to-end learning task, leaving the construction of the representation needed for semantic reasoning implicit in model weights or relying on rigid compiler-level abstractions. We present SemRep, a framework that improves code transformation through *generative code representation learning*. Our key insight is to employ semantics-preserving transformations as the intermediate representation, which serve as both a generative mid-training task and guidance for subsequent instruction-specific code transformations. Across general code editing and optimization tasks (e.g., GPU kernel optimization), SemRep outperforms extensively finetuned baselines under strictly the same training budget by 6.9\% in correctness, 1.1$\times$ in performance, 13.9\% in generalization, and 6.7\% in robustness. Because it improves the exploration of diverse code transformations, SemRep is particularly amenable to evolutionary search. Combined with an evolutionary coding agent, SemRep finds optimizations that larger 685B-parameter baselines fail to discover, while achieving the same performance with 25\% less inference compute.

Lay Summary

Code transformation (for example, editing code to fix bugs, add features, or improve performance) is a core task in software engineering. Current large language models attempt these edits end-to-end, jumping directly from input code to the desired output. This often leads to errors because the model never explicitly reasons about what the original code does before changing it. SemRep addresses this by disentangling semantic understanding from task-specific code editing. The model is first trained to generate semantically equivalent rewrites of the input code (programs that look different but behave identically), with equivalence verified through test execution. This serves as a form of generative representation learning, forcing the model to internalize code semantics as explicit, human-readable intermediate programs rather than as latent weight parameters. The model is then finetuned for the specific editing task. Crucially, both training phases share a fixed total budget, so the approach introduces no additional training cost over standard finetuning. During inference, SemRep allows the model to explore semantically equivalent variants of the input before applying the requested transformation. This naturally supports test-time scaling through evolutionary search, where diverse equivalent rewrites serve as stepping stones toward better solutions. Experiments on GPU kernel optimization (KernelBench) and real-world code editing (EditBench) show that SemRep enables a 32B model to match or exceed commercial systems and open-weight models up to 12× larger. It improves correctness by up to 43% over the previous state of the art, generalizes 13.9% better to unseen hardware, and is 6.7% more robust to surface-level code perturbations. When integrated with an evolutionary coding agent, SemRep discovers optimizations that 685B-parameter models fail to find while using 25% less inference compute.
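
The idea of accepting a rewrite only when test execution shows it behaves like the original can be illustrated with a small sketch. This is not the authors' pipeline: the function names (`passes_equivalence_tests`, `filter_equivalent`), the toy programs, and the test inputs are all hypothetical, and agreement on a finite set of tests only approximates true semantic equivalence.

```python
from typing import Any, Callable, List, Sequence, Tuple


def run(fn: Callable[..., Any], args: Tuple[Any, ...]) -> Tuple[str, Any]:
    """Execute fn and capture either its result or the type of exception raised."""
    try:
        return ("ok", fn(*args))
    except Exception as exc:  # an exception counts as observable behavior too
        return ("raised", type(exc).__name__)


def passes_equivalence_tests(
    original: Callable[..., Any],
    candidate: Callable[..., Any],
    test_inputs: Sequence[Tuple[Any, ...]],
) -> bool:
    """Hypothetical check: candidate must match original on every test input."""
    return all(run(original, a) == run(candidate, a) for a in test_inputs)


def filter_equivalent(
    original: Callable[..., Any],
    candidates: Sequence[Callable[..., Any]],
    test_inputs: Sequence[Tuple[Any, ...]],
) -> List[Callable[..., Any]]:
    """Keep only the candidate rewrites that agree with the original on all tests."""
    return [c for c in candidates
            if passes_equivalence_tests(original, c, test_inputs)]


# Toy "input program" (illustrative only).
def sum_of_squares_loop(xs):
    total = 0
    for x in xs:
        total += x * x
    return total


# A rewrite that looks different but behaves identically.
def sum_of_squares_builtin(xs):
    return sum(x * x for x in xs)


# A rewrite that silently changes behavior by dropping the first element.
def sum_of_squares_buggy(xs):
    return sum(x * x for x in xs[1:])


# Each entry is a tuple of positional arguments for one test call.
TEST_INPUTS = [([],), ([3],), ([1, 2, 3],), ([-2, 5],)]

kept = filter_equivalent(
    sum_of_squares_loop,
    [sum_of_squares_builtin, sum_of_squares_buggy],
    TEST_INPUTS,
)
print([fn.__name__ for fn in kept])
```

In this sketch the buggy rewrite agrees with the original on the empty list but diverges on `[3]`, so only the faithful rewrite survives the filter. A filter of this kind is only as strong as its test inputs, which is why the check should be read as evidence of equivalence rather than proof.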