Timezone: »
Transformers can “learn” to solve data-fitting problems generated by a variety of (latent) models, including linear models, sparse linear models, decision trees, and neural networks, as demonstrated by Garg et al. (2022). These tasks, which fall under well-defined function class learning problems, can be solved using iterative algorithms that involve repeatedly applying the same function to the input potentially an infinite number of times. In this work, we aim to train a transformer to emulate this iterative behavior by utilizing a looped transformer architecture (Giannou et al., 2023). Our experimental results reveal that the looped transformer performs equally well as the unlooped transformer in solving these numerical tasks, while also offering the advantage of having much fewer parameters
Author Information
Liu Yang (University of Wisconsin - Madison)
Kangwook Lee (KAIST)
Robert Nowak (University of Wisconsion-Madison)

Robert Nowak holds the Nosbusch Professorship in Engineering at the University of Wisconsin-Madison, where his research focuses on signal processing, machine learning, optimization, and statistics.
Dimitris Papailiopoulos (University of Wisconsin-Madison)
More from the Same Authors
-
2021 : On the Sparsity of Deep Neural Networks in the Overparameterized Regime: An Empirical Study »
Rahul Parhi · Jack Wolf · Robert Nowak -
2023 : Algorithm Selection for Deep Active Learning with Imbalanced Datasets »
Jifan Zhang · Shuai Shao · Saurabh Verma · Robert Nowak -
2023 : LabelBench: A Comprehensive Framework for Benchmarking Label-Efficient Learning »
Jifan Zhang · Yifang Chen · Gregory Canal · Stephen Mussmann · Yinglun Zhu · Simon Du · Kevin Jamieson · Robert Nowak -
2023 : Teaching Arithmetic to Small Transformers »
Nayoung Lee · Kartik Sreenivasan · Jason Lee · Kangwook Lee · Dimitris Papailiopoulos -
2023 : Predictive Pipelined Decoding: A Compute-Latency Trade-off for Exact LLM Decoding »
Seongjun Yang · Gibbeum Lee · Jaewoong Cho · Dimitris Papailiopoulos · Kangwook Lee -
2023 : SPEED: Experimental Design for Policy Evaluation in Linear Heteroscedastic Bandits »
Subhojyoti Mukherjee · Qiaomin Xie · Josiah Hanna · Robert Nowak -
2023 : A Representer Theorem for Vector-Valued Neural Networks: Insights on Weight Decay Training and Widths of Deep Neural Networks »
Joseph Shenouda · Rahul Parhi · Kangwook Lee · Robert Nowak -
2023 Oral: A Fully First-Order Method for Stochastic Bilevel Optimization »
Jeongyeol Kwon · Dohyun Kwon · Stephen Wright · Robert Nowak -
2023 Poster: A Fully First-Order Method for Stochastic Bilevel Optimization »
Jeongyeol Kwon · Dohyun Kwon · Stephen Wright · Robert Nowak -
2023 Poster: Looped Transformers as Programmable Computers »
Angeliki Giannou · Shashank Rajput · Jy-yong Sohn · Kangwook Lee · Jason Lee · Dimitris Papailiopoulos -
2023 Poster: Transformers as Algorithms: Generalization and Stability in In-context Learning »
Yingcong Li · Muhammed Ildiz · Dimitris Papailiopoulos · Samet Oymak -
2023 Poster: Feed Two Birds with One Scone: Exploiting Wild Data for Both Out-of-Distribution Generalization and Detection »
Haoyue Bai · Gregory Canal · Xuefeng Du · Jeongyeol Kwon · Robert Nowak · Sharon Li -
2023 Poster: Improving Fair Training under Correlation Shifts »
Yuji Roh · Kangwook Lee · Steven Whang · Changho Suh -
2023 Poster: Optimizing DDPM Sampling with Shortcut Fine-Tuning »
Ying Fan · Kangwook Lee -
2022 Poster: GALAXY: Graph-based Active Learning at the Extreme »
Jifan Zhang · Julian Katz-Samuels · Robert Nowak -
2022 Spotlight: GALAXY: Graph-based Active Learning at the Extreme »
Jifan Zhang · Julian Katz-Samuels · Robert Nowak -
2022 Poster: GenLabel: Mixup Relabeling using Generative Models »
Jy yong Sohn · Liang Shang · Hongxu Chen · Jaekyun Moon · Dimitris Papailiopoulos · Kangwook Lee -
2022 Poster: Training OOD Detectors in their Natural Habitats »
Julian Katz-Samuels · Julia Nakhleh · Robert Nowak · Sharon Li -
2022 Spotlight: Training OOD Detectors in their Natural Habitats »
Julian Katz-Samuels · Julia Nakhleh · Robert Nowak · Sharon Li -
2022 Spotlight: GenLabel: Mixup Relabeling using Generative Models »
Jy yong Sohn · Liang Shang · Hongxu Chen · Jaekyun Moon · Dimitris Papailiopoulos · Kangwook Lee -
2021 : Dreaming of Federated Robustness: Inherent Barriers and Unavoidable Tradeoffs »
Dimitris Papailiopoulos -
2020 Poster: Robust Outlier Arm Identification »
Yinglun Zhu · Sumeet Katariya · Robert Nowak -
2020 Poster: Closing the convergence gap of SGD without replacement »
Shashank Rajput · Anant Gupta · Dimitris Papailiopoulos -
2019 Workshop: Coding Theory For Large-scale Machine Learning »
Viveck Cadambe · Pulkit Grover · Dimitris Papailiopoulos · Gauri Joshi -
2019 Poster: Does Data Augmentation Lead to Positive Margin? »
Shashank Rajput · Zhili Feng · Zachary Charles · Po-Ling Loh · Dimitris Papailiopoulos -
2019 Poster: Bilinear Bandits with Low-rank Structure »
Kwang-Sung Jun · Rebecca Willett · Stephen Wright · Robert Nowak -
2019 Oral: Bilinear Bandits with Low-rank Structure »
Kwang-Sung Jun · Rebecca Willett · Stephen Wright · Robert Nowak -
2019 Oral: Does Data Augmentation Lead to Positive Margin? »
Shashank Rajput · Zhili Feng · Zachary Charles · Po-Ling Loh · Dimitris Papailiopoulos -
2019 Tutorial: Active Learning: From Theory to Practice »
Robert Nowak · Steve Hanneke -
2018 Poster: DRACO: Byzantine-resilient Distributed Training via Redundant Gradients »
Lingjiao Chen · Hongyi Wang · Zachary Charles · Dimitris Papailiopoulos -
2018 Oral: DRACO: Byzantine-resilient Distributed Training via Redundant Gradients »
Lingjiao Chen · Hongyi Wang · Zachary Charles · Dimitris Papailiopoulos -
2018 Poster: Stability and Generalization of Learning Algorithms that Converge to Global Optima »
Zachary Charles · Dimitris Papailiopoulos -
2018 Oral: Stability and Generalization of Learning Algorithms that Converge to Global Optima »
Zachary Charles · Dimitris Papailiopoulos -
2017 Poster: Algebraic Variety Models for High-Rank Matrix Completion »
Greg Ongie · Laura Balzano · Rebecca Willett · Robert Nowak -
2017 Talk: Algebraic Variety Models for High-Rank Matrix Completion »
Greg Ongie · Laura Balzano · Rebecca Willett · Robert Nowak