Poster in AI for Math Workshop
Efficient Linear System Solver with Transformers
Max Vladymyrov · Johannes Von Oswald · Nolan Miller · Mark Sandler
This paper investigates the potential of linear Transformers as solvers for systems of linear equations. We propose a novel approach in which the Transformer encodes each equation as a separate token, allowing the model to process the system in a permutation-invariant manner. To enhance generalizability and reduce the parameter count, we introduce a block-wise re-parameterization technique for the attention weight matrices. This technique decouples the problem dimension from the model's parameter count, enabling the Transformer to handle systems of varying sizes effectively. Our experiments demonstrate that the Transformer performs competitively with established classical methods such as the Conjugate Gradient method, especially for smaller systems. We further explore the model's ability to extrapolate to larger systems, providing evidence for its potential as a versatile and efficient solver for linear equations.
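The tokenization and block-wise re-parameterization can be illustrated concretely. Below is a minimal NumPy sketch of one plausible reading of the construction: the token format (a_i, b_i), the identity-scaled block layout in `block_weights`, and the softmax-free attention update are assumptions made for illustration, not the paper's exact parameterization.

```python
import numpy as np

def encode_system(A, b):
    """Encode each equation a_i . x = b_i of Ax = b as one token (a_i, b_i).

    A: (n, d) coefficients, b: (n,) right-hand side.
    Returns (n, d + 1) tokens; their order carries no information,
    matching the permutation-invariant formulation.
    """
    return np.concatenate([A, b[:, None]], axis=1)

def block_weights(d, w_coef, w_target):
    """Build a (d+1) x (d+1) attention weight matrix from two scalars.

    The coefficient block is a scaled identity and the target entry is a
    single scalar, so the parameter count is independent of d (a
    hypothetical instance of the block-wise re-parameterization).
    """
    W = np.zeros((d + 1, d + 1))
    W[:d, :d] = w_coef * np.eye(d)
    W[d, d] = w_target
    return W

def linear_attention_layer(tokens, WQ, WK, WV):
    """One softmax-free (linear) self-attention update with a residual."""
    Q, K, V = tokens @ WQ, tokens @ WK, tokens @ WV
    return tokens + (Q @ K.T) @ V / len(tokens)

# Tiny demo: permuting the equations permutes the outputs identically.
rng = np.random.default_rng(0)
n, d = 8, 4
A = rng.normal(size=(n, d))
x_true = rng.normal(size=d)
tokens = encode_system(A, A @ x_true)

WQ, WK, WV = (block_weights(d, *rng.normal(size=2)) for _ in range(3))
out = linear_attention_layer(tokens, WQ, WK, WV)

perm = rng.permutation(n)
out_perm = linear_attention_layer(tokens[perm], WQ, WK, WV)
assert np.allclose(out[perm], out_perm)  # equivariant in the equation order
```

Because each weight matrix is assembled from a few scalars rather than learned entry-wise, the same parameters can in principle be instantiated for any dimension d, which is what would allow evaluation on system sizes beyond those seen in training.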
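For reference, the classical baseline named in the abstract can be run in a few lines with SciPy. This is a standard Conjugate Gradient solve on a randomly generated symmetric positive-definite system (the setting CG requires), not an experimental detail taken from the paper.

```python
import numpy as np
from scipy.sparse.linalg import cg

rng = np.random.default_rng(0)
d = 16
M = rng.normal(size=(d, d))
A = M @ M.T + d * np.eye(d)   # symmetric positive definite, as CG requires
x_true = rng.normal(size=d)
b = A @ x_true

x_cg, info = cg(A, b)          # info == 0 means CG converged
print(np.linalg.norm(x_cg - x_true))
```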