Discovering Hidden Algebraic Structures via Transformers with Rank-Aware Beam GRPO
Jaeha Lee · Gio Huh · Ning Su · Tony Yu
Keywords:
reinforcement learning
transformer model
functional decomposition
polynomial decomposition
symbolic reasoning
beam search
Abstract
We study the capabilities of small-scale transformer models in symbolic reasoning, focusing on the NP-hard algebraic task of multivariate polynomial decomposition, which has widespread applications in science and engineering. Our approach combines a fine-grained synthetic data generation pipeline, supervised pretraining, beam search, evaluations of scaling behavior and generalizability, and a novel rank-aware reinforcement learning method, Beam Grouped Relative Policy Optimization (BGRPO), which improves accuracy while reducing inference compute by up to 75%. Additionally, our model demonstrates competitive performance in polynomial simplification, outperforming Mathematica in various cases.
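To make the task concrete, the sketch below illustrates polynomial decomposition as recovering a functional composition from its expanded form. It is a minimal example, not taken from the paper: the univariate-outer / multivariate-inner formulation, the symbols, and the concrete polynomials are illustrative assumptions.

```python
import sympy as sp

# Illustrative example only: the concrete polynomials and the
# univariate-outer / multivariate-inner formulation are assumptions,
# not the paper's exact setup.
x, y, t = sp.symbols("x y t")

h = x**2 + x*y + 1   # hidden inner multivariate polynomial
g = t**2 + 3*t       # hidden outer univariate polynomial

# The model is shown only the expanded composition p = g(h(x, y)) ...
p = sp.expand(g.subs(t, h))

# ... and must output a pair (g_hat, h_hat) whose composition reproduces p.
g_hat, h_hat = t**2 + 3*t, x**2 + x*y + 1
assert sp.expand(g_hat.subs(t, h_hat) - p) == 0
print(p)  # x**4 + 2*x**3*y + x**2*y**2 + 5*x**2 + 5*x*y + 4
```

In this setting, beam search yields a ranked list of candidate decompositions, each of which can be checked exactly as above; per the abstract, BGRPO trains the model so that correct candidates are found with less inference compute, presumably by allowing a narrower beam.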