Poster
in
Workshop: ES-FoMo: Efficient Systems for Foundation Models
Looped Transformers are Better at Learning Learning Algorithms
Liu Yang · Kangwook Lee · Robert Nowak · Dimitris Papailiopoulos
Transformers can “learn” to solve data-fitting problems generated by a variety of (latent) models, including linear models, sparse linear models, decision trees, and neural networks, as demonstrated by Garg et al. (2022). These tasks, which fall under well-defined function-class learning problems, can be solved by iterative algorithms that repeatedly apply the same function to the input, potentially an unbounded number of times. In this work, we aim to train a transformer to emulate this iterative behavior by utilizing a looped transformer architecture (Giannou et al., 2023). Our experimental results reveal that the looped transformer performs as well as the unlooped transformer on these numerical tasks, while also offering the advantage of having far fewer parameters.
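The core architectural idea can be illustrated with a minimal sketch: instead of stacking distinct transformer layers, a single block is applied repeatedly, so the loop count controls depth while the parameter count stays fixed at that of one block. The snippet below is an illustrative PyTorch sketch under assumed hyperparameters (`d_model`, `n_heads`, `n_loops` are hypothetical), not the authors' exact training configuration.

```python
import torch
import torch.nn as nn


class LoopedTransformer(nn.Module):
    """Applies one shared transformer block repeatedly (weights tied
    across loop iterations) rather than stacking separate layers."""

    def __init__(self, d_model=64, n_heads=4, n_loops=12):
        super().__init__()
        # A single block whose weights are reused at every iteration
        # (hypothetical sizes, chosen only for illustration).
        self.block = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, batch_first=True
        )
        self.n_loops = n_loops

    def forward(self, x):
        # x: (batch, seq_len, d_model) embedded in-context examples.
        h = x
        for _ in range(self.n_loops):
            # The same function is applied at every step, mimicking
            # one update of an iterative data-fitting algorithm.
            h = self.block(h)
        return h


# Usage: 12 loop iterations of one block stand in for a 12-layer
# unlooped stack, at roughly 1/12 of the layer parameters.
model = LoopedTransformer()
tokens = torch.randn(8, 32, 64)
out = model(tokens)  # shape (8, 32, 64)
```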