On Expressive Power of Floating-Point Transformers
Abstract
Research on the expressive power of transformers shows that transformers are equivariant to permutations and can approximate all permutation-equivariant continuous functions on a compact domain. However, these results assume real parameters and exact operations, whereas real-world implementations on computers can only use a finite set of numbers and inexact machine operations with round-off errors. In this work, we investigate the representability of floating-point transformers that use floating-point parameters and floating-point operations. Unlike existing results under exact arithmetic, we first show that floating-point transformers can represent non-permutation-equivariant functions even without positional encoding. Furthermore, we prove that floating-point transformers can represent all permutation-equivariant functions when the sequence length is bounded, but they cannot when the sequence length is large. We also identify the minimal equivariance property in floating-point transformers, and show that all non-trivial additive positional encoding can harm the representability of floating-point transformers.