Discovering Hidden Algebraic Structures via Transformers with Rank-Aware Beam GRPO
Jaeha Lee · Gio Huh · Ning Su · Tony Yu
Keywords:
reinforcement learning
transformer model
functional decomposition
polynomial decomposition
symbolic reasoning
beam search
Abstract
We study the capabilities of small-scale transformer models in symbolic reasoning, focusing on the NP-hard algebraic task of multivariate polynomial decomposition, which has widespread applications in science and engineering. Our approach combines a fine-grained synthetic data generation pipeline, supervised pretraining, beam search, evaluations of scaling behavior and generalizability, and a novel rank-aware reinforcement learning method, Beam Grouped Relative Policy Optimization (BGRPO), which improves accuracy while reducing inference compute by up to 75%. Additionally, our model demonstrates competitive performance in polynomial simplification, outperforming Mathematica in various cases.
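To make the task concrete, the sketch below illustrates polynomial decomposition as recovering a functional composition from its expanded form. It is a minimal example, not taken from the paper: the univariate-outer / multivariate-inner formulation, the symbols, and the concrete polynomials are illustrative assumptions.

```python
import sympy as sp

# Illustrative example only: the concrete polynomials and the
# univariate-outer / multivariate-inner formulation are assumptions,
# not the paper's exact setup.
x, y, t = sp.symbols("x y t")

h = x**2 + x*y + 1   # hidden inner multivariate polynomial
g = t**2 + 3*t       # hidden outer univariate polynomial

# The model is shown only the expanded composition p = g(h(x, y)) ...
p = sp.expand(g.subs(t, h))

# ... and must output a pair (g_hat, h_hat) whose composition reproduces p.
g_hat, h_hat = t**2 + 3*t, x**2 + x*y + 1
assert sp.expand(g_hat.subs(t, h_hat) - p) == 0
print(p)  # x**4 + 2*x**3*y + x**2*y**2 + 5*x**2 + 5*x*y + 4
```

In this setting, beam search yields a ranked list of candidate decompositions, each of which can be checked exactly as above; per the abstract, BGRPO trains the model so that correct candidates are found with less inference compute, presumably by allowing a narrower beam.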