Poster in AI for Math Workshop
Efficient Linear System Solver with Transformers
Max Vladymyrov · Johannes Von Oswald · Nolan Miller · Mark Sandler
This paper investigates the potential of linear Transformers as solvers for systems of linear equations. We propose a novel approach in which the Transformer encodes each equation as a separate token, allowing the model to process the system in a permutation-invariant manner. To enhance generalizability and reduce the parameter count, we introduce a block-wise re-parameterization technique for the attention weight matrices. This technique decouples the problem dimension from the model's parameter count, enabling the Transformer to handle systems of varying sizes effectively. Our experiments demonstrate that the Transformer performs competitively with established classical methods such as the Conjugate Gradient method, especially for smaller systems. We further explore the model's ability to extrapolate to larger systems, providing evidence for its potential as a versatile and efficient solver for linear equations.
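The tokenization and block-wise re-parameterization can be illustrated concretely. Below is a minimal NumPy sketch of one plausible reading of the construction: the token format (a_i, b_i), the identity-scaled block layout in `block_weights`, and the softmax-free attention update are assumptions made for illustration, not the paper's exact parameterization.

```python
import numpy as np

def encode_system(A, b):
    """Encode each equation a_i . x = b_i of Ax = b as one token (a_i, b_i).

    A: (n, d) coefficients, b: (n,) right-hand side.
    Returns (n, d + 1) tokens; their order carries no information,
    matching the permutation-invariant formulation.
    """
    return np.concatenate([A, b[:, None]], axis=1)

def block_weights(d, w_coef, w_target):
    """Build a (d+1) x (d+1) attention weight matrix from two scalars.

    The coefficient block is a scaled identity and the target entry is a
    single scalar, so the parameter count is independent of d (a
    hypothetical instance of the block-wise re-parameterization).
    """
    W = np.zeros((d + 1, d + 1))
    W[:d, :d] = w_coef * np.eye(d)
    W[d, d] = w_target
    return W

def linear_attention_layer(tokens, WQ, WK, WV):
    """One softmax-free (linear) self-attention update with a residual."""
    Q, K, V = tokens @ WQ, tokens @ WK, tokens @ WV
    return tokens + (Q @ K.T) @ V / len(tokens)

# Tiny demo: permuting the equations permutes the outputs identically.
rng = np.random.default_rng(0)
n, d = 8, 4
A = rng.normal(size=(n, d))
x_true = rng.normal(size=d)
tokens = encode_system(A, A @ x_true)

WQ, WK, WV = (block_weights(d, *rng.normal(size=2)) for _ in range(3))
out = linear_attention_layer(tokens, WQ, WK, WV)

perm = rng.permutation(n)
out_perm = linear_attention_layer(tokens[perm], WQ, WK, WV)
assert np.allclose(out[perm], out_perm)  # equivariant in the equation order
```

Because each weight matrix is assembled from a few scalars rather than learned entry-wise, the same parameters can in principle be instantiated for any dimension d, which is what would allow evaluation on system sizes beyond those seen in training.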
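For reference, the classical baseline named in the abstract can be run in a few lines with SciPy. This is a standard Conjugate Gradient solve on a randomly generated symmetric positive-definite system (the setting CG requires), not an experimental detail taken from the paper.

```python
import numpy as np
from scipy.sparse.linalg import cg

rng = np.random.default_rng(0)
d = 16
M = rng.normal(size=(d, d))
A = M @ M.T + d * np.eye(d)   # symmetric positive definite, as CG requires
x_true = rng.normal(size=d)
b = A @ x_true

x_cg, info = cg(A, b)          # info == 0 means CG converged
print(np.linalg.norm(x_cg - x_true))
```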