We present a framework for using transformer networks as universal computers by programming them with specific weights and placing them in a loop. Our input sequence acts as a punchcard, consisting of instructions and memory for data reads and writes. We demonstrate that a constant number of encoder layers can emulate basic computing blocks, including lexicographic operations, non-linear functions, function calls, program counters, and conditional branches. Using this framework, we emulate a computer with a simple instruction-set architecture, which allows us to map iterative algorithms to programs that can be executed by a constant-depth looped transformer network. We show how a single frozen transformer, instructed by its input, can emulate a basic calculator, a basic linear algebra library, and even a full backpropagation-based in-context learning algorithm. Our findings reveal the potential of transformer networks as programmable compute units and offer insight into the mechanics of attention.
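The execution pattern described above, one fixed network applied repeatedly to a sequence that carries both program and data, can be sketched in a few lines. The following is a minimal toy illustration under our own assumptions, not the paper's actual construction: the weights here are random stand-ins for the hand-programmed weights, and the names (`FrozenTransformerBlock`, `run_looped`, `punchcard`) are hypothetical.

```python
# Toy sketch of the looped-transformer execution pattern: a single frozen
# attention + MLP block is applied to the same sequence over and over.
# Random weights stand in for the paper's hand-programmed ones.
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

class FrozenTransformerBlock:
    """One self-attention + feed-forward layer with fixed weights."""
    def __init__(self, dim, rng):
        self.Wq = rng.normal(scale=dim ** -0.5, size=(dim, dim))
        self.Wk = rng.normal(scale=dim ** -0.5, size=(dim, dim))
        self.Wv = rng.normal(scale=dim ** -0.5, size=(dim, dim))
        self.W1 = rng.normal(scale=dim ** -0.5, size=(dim, 4 * dim))
        self.W2 = rng.normal(scale=dim ** -0.5, size=(4 * dim, dim))

    def __call__(self, X):
        # Attention lets every row of the punchcard read any other row,
        # which is how instruction fetches and memory reads can be wired up.
        scores = (X @ self.Wq) @ (X @ self.Wk).T / np.sqrt(X.shape[1])
        X = X + softmax(scores) @ (X @ self.Wv)          # residual "read/write"
        X = X + np.maximum(X @ self.W1, 0.0) @ self.W2   # position-wise MLP
        return X

def run_looped(block, punchcard, n_loops):
    """Feed the block's output back in n_loops times: a constant-depth
    network, but an arbitrarily long computation."""
    X = punchcard
    for _ in range(n_loops):
        X = block(X)
    return X

rng = np.random.default_rng(0)
seq_len, dim = 8, 16
# Hypothetical punchcard: rows would encode instructions, a program
# counter, and memory cells for data reads and writes.
punchcard = rng.normal(size=(seq_len, dim))
out = run_looped(FrozenTransformerBlock(dim, rng), punchcard, n_loops=12)
print(out.shape)  # (8, 16): same sequence layout, updated contents
```

In the paper's construction the weights are set by hand rather than learned, so that each pass through the loop advances a program counter and executes the instruction it points to; the toy above only demonstrates the outer loop.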
Author Information
Angeliki Giannou (University of Wisconsin-Madison)
Shashank Rajput (University of Wisconsin-Madison)
Jy-yong Sohn (Yonsei University)
Kangwook Lee (University of Wisconsin-Madison)
Jason Lee (Princeton University)
Dimitris Papailiopoulos (University of Wisconsin-Madison)
More from the Same Authors
- 2023 : Teaching Arithmetic to Small Transformers
  Nayoung Lee · Kartik Sreenivasan · Jason Lee · Kangwook Lee · Dimitris Papailiopoulos
- 2023 : Predictive Pipelined Decoding: A Compute-Latency Trade-off for Exact LLM Decoding
  Seongjun Yang · Gibbeum Lee · Jaewoong Cho · Dimitris Papailiopoulos · Kangwook Lee
- 2023 : Looped Transformers are Better at Learning Learning Algorithms
  Liu Yang · Kangwook Lee · Robert Nowak · Dimitris Papailiopoulos
- 2023 : Scaling In-Context Demonstrations with Structured Attention
  Tianle Cai · Kaixuan Huang · Jason Lee · Mengdi Wang · Danqi Chen
- 2023 : Fine-Tuning Language Models with Just Forward Passes
  Sadhika Malladi · Tianyu Gao · Eshaan Nichani · Jason Lee · Danqi Chen · Sanjeev Arora
- 2023 : Reward Collapse in Aligning Large Language Models: A Prompt-Aware Approach to Preference Rankings
  Ziang Song · Tianle Cai · Jason Lee · Weijie Su
- 2023 : Provable Offline Reinforcement Learning with Human Feedback
  Wenhao Zhan · Masatoshi Uehara · Nathan Kallus · Jason Lee · Wen Sun
- 2023 : A Representer Theorem for Vector-Valued Neural Networks: Insights on Weight Decay Training and Widths of Deep Neural Networks
  Joseph Shenouda · Rahul Parhi · Kangwook Lee · Robert Nowak
- 2023 : How to Query Human Feedback Efficiently in RL?
  Wenhao Zhan · Masatoshi Uehara · Wen Sun · Jason Lee
- 2023 : 🎤 Fine-Tuning Language Models with Just Forward Passes
  Sadhika Malladi · Tianyu Gao · Eshaan Nichani · Alex Damian · Jason Lee · Danqi Chen · Sanjeev Arora
- 2023 Poster: Efficient displacement convex optimization with particle gradient descent
  Hadi Daneshmand · Jason Lee · Chi Jin
- 2023 Poster: Local Optimization Achieves Global Optimality in Multi-Agent Reinforcement Learning
  Yulai Zhao · Zhuoran Yang · Zhaoran Wang · Jason Lee
- 2023 Poster: Computationally Efficient PAC RL in POMDPs with Latent Determinism and Conditional Embeddings
  Masatoshi Uehara · Ayush Sekhari · Jason Lee · Nathan Kallus · Wen Sun
- 2023 Poster: Transformers as Algorithms: Generalization and Stability in In-context Learning
  Yingcong Li · Muhammed Ildiz · Dimitris Papailiopoulos · Samet Oymak
- 2023 Poster: Understanding Incremental Learning of Gradient Descent: A Fine-grained Analysis of Matrix Sensing
  Jikai Jin · Zhiyuan Li · Kaifeng Lyu · Simon Du · Jason Lee
- 2023 Poster: Improving Fair Training under Correlation Shifts
  Yuji Roh · Kangwook Lee · Steven Whang · Changho Suh
- 2023 Poster: Optimizing DDPM Sampling with Shortcut Fine-Tuning
  Ying Fan · Kangwook Lee
- 2022 Poster: GenLabel: Mixup Relabeling using Generative Models
  Jy-yong Sohn · Liang Shang · Hongxu Chen · Jaekyun Moon · Dimitris Papailiopoulos · Kangwook Lee
- 2022 Spotlight: GenLabel: Mixup Relabeling using Generative Models
  Jy-yong Sohn · Liang Shang · Hongxu Chen · Jaekyun Moon · Dimitris Papailiopoulos · Kangwook Lee
- 2021 : Dreaming of Federated Robustness: Inherent Barriers and Unavoidable Tradeoffs
  Dimitris Papailiopoulos
- 2020 Poster: Closing the convergence gap of SGD without replacement
  Shashank Rajput · Anant Gupta · Dimitris Papailiopoulos
- 2019 Workshop: Coding Theory For Large-scale Machine Learning
  Viveck Cadambe · Pulkit Grover · Dimitris Papailiopoulos · Gauri Joshi
- 2019 Poster: Does Data Augmentation Lead to Positive Margin?
  Shashank Rajput · Zhili Feng · Zachary Charles · Po-Ling Loh · Dimitris Papailiopoulos
- 2019 Oral: Does Data Augmentation Lead to Positive Margin?
  Shashank Rajput · Zhili Feng · Zachary Charles · Po-Ling Loh · Dimitris Papailiopoulos
- 2018 Poster: DRACO: Byzantine-resilient Distributed Training via Redundant Gradients
  Lingjiao Chen · Hongyi Wang · Zachary Charles · Dimitris Papailiopoulos
- 2018 Oral: DRACO: Byzantine-resilient Distributed Training via Redundant Gradients
  Lingjiao Chen · Hongyi Wang · Zachary Charles · Dimitris Papailiopoulos
- 2018 Poster: Stability and Generalization of Learning Algorithms that Converge to Global Optima
  Zachary Charles · Dimitris Papailiopoulos
- 2018 Oral: Stability and Generalization of Learning Algorithms that Converge to Global Optima
  Zachary Charles · Dimitris Papailiopoulos