

Spotlight Poster

Exploiting Code Symmetries for Learning Program Semantics

Kexin Pei · Weichen Li · Qirui Jin · Shuyang Liu · Scott Geng · Lorenzo Cavallaro · Junfeng Yang · Suman Jana

Hall C 4-9 #1000
[ Paper PDF ] [ Slides ] [ Poster ]
Wed 24 Jul 2:30 a.m. PDT — 4 a.m. PDT

Abstract:

This paper tackles the challenge of teaching code semantics to Large Language Models (LLMs) for program analysis by incorporating code symmetries into the model architecture. We introduce a group-theoretic framework that defines code symmetries as semantics-preserving transformations, where forming a code symmetry group enables precise and efficient reasoning about code semantics. Our solution, SymC, introduces a novel variant of self-attention that is provably equivariant to code symmetries from the permutation group defined over the program dependence graph. SymC obtains superior performance on five program analysis tasks, outperforming state-of-the-art code models, including GPT-4, without any pre-training. Our results suggest that code LLMs that encode the code structural prior via the code symmetry group generalize better and faster.
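To make the equivariance claim concrete, the sketch below illustrates the underlying group-theoretic property in plain NumPy: single-head self-attention without positional encodings is equivariant to permutations of its input tokens, i.e., attn(PX) = P · attn(X) for any permutation matrix P. This is only the generic permutation-equivariance fact that SymC builds on; the paper's actual contribution, a variant restricted to the symmetry subgroup induced by the program dependence graph, is not reproduced here, and all names in the sketch (self_attention, the weight matrices) are illustrative rather than taken from the authors' code.

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Standard single-head self-attention, no positional encodings."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    # Row-wise softmax over attention scores.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

rng = np.random.default_rng(0)
n, d = 6, 8                      # 6 tokens, hidden dimension 8
X = rng.normal(size=(n, d))      # stand-in for code token embeddings
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))

# A permutation matrix P acting on token positions, standing in for a
# semantics-preserving reordering (e.g., of independent statements).
P = np.eye(n)[rng.permutation(n)]

# Equivariance: attending to permuted inputs equals permuting the outputs.
lhs = self_attention(P @ X, Wq, Wk, Wv)
rhs = P @ self_attention(X, Wq, Wk, Wv)
assert np.allclose(lhs, rhs)
```

Because attention weights depend only on pairwise token interactions, permuting the input rows permutes both the rows and columns of the score matrix, and the row-wise softmax commutes with that column permutation; the output rows are therefore permuted in exactly the same way as the input. Restricting the allowed permutations to those that preserve program dependence graph edges, as the paper does, is what ties this architectural property to semantics-preserving code transformations.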
