Workshop: ES-FoMo: Efficient Systems for Foundation Models

Looped Transformers are Better at Learning Learning Algorithms

Liu Yang · Kangwook Lee · Robert Nowak · Dimitris Papailiopoulos


Transformers can “learn” to solve data-fitting problems generated by a variety of (latent) models, including linear models, sparse linear models, decision trees, and neural networks, as demonstrated by Garg et al. (2022). These tasks, which are well-defined function-class learning problems, can be solved by iterative algorithms that repeatedly apply the same function to the input, potentially an infinite number of times. In this work, we aim to train a transformer to emulate this iterative behavior by using a looped transformer architecture (Giannou et al., 2023). Our experimental results show that the looped transformer performs as well as the unlooped transformer on these numerical tasks while using far fewer parameters.
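
As a rough illustration of what "repeatedly applying the same function to the input" can mean for one of these tasks, the sketch below fits a linear model by reusing a single update function in a loop. Gradient descent, the toy data, and the names `gd_step` and `looped_solve` are assumptions chosen for illustration; the abstract does not say which algorithm the trained looped transformer emulates.

```python
def gd_step(w, xs, ys, lr):
    """One gradient-descent update for least squares.

    The same function (no per-iteration parameters) is reused at every
    loop, loosely analogous to sharing one block's weights across
    iterations. This is an illustrative assumption, not the paper's model.
    """
    n = len(xs)
    grad = [0.0] * len(w)
    for x, y in zip(xs, ys):
        residual = sum(wi * xi for wi, xi in zip(w, x)) - y
        for j in range(len(w)):
            grad[j] += 2.0 * residual * x[j] / n
    return [wi - lr * gi for wi, gi in zip(w, grad)]


def looped_solve(xs, ys, num_loops=200, lr=0.1):
    """Start from zero and apply gd_step num_loops times."""
    w = [0.0] * len(xs[0])
    for _ in range(num_loops):
        w = gd_step(w, xs, ys, lr)
    return w


# Hypothetical toy data generated by a latent linear model.
true_w = [3.0, -2.0]
xs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, -1.0]]
ys = [sum(a * b for a, b in zip(true_w, x)) for x in xs]

w_hat = looped_solve(xs, ys)
print(w_hat)  # should be close to true_w
```

The number of loops acts as a compute budget: more iterations refine the estimate without adding any parameters, which is the property the looped architecture is meant to exploit.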
