Exposing Attention Glitches with Flip-Flop Language Modeling

Abstract
Why do large language models sometimes output factual inaccuracies and exhibit erroneous reasoning? The brittleness of these models, particularly when executing long chains of reasoning, currently seems to be an inevitable price to pay for their advanced capabilities of coherently synthesizing knowledge, pragmatics, and abstract thought. Towards making sense of this fundamentally unsolved problem, this work identifies and analyzes the phenomenon of "attention glitches", in which the Transformer architecture's inductive biases intermittently fail to capture robust reasoning. To isolate the issue, we introduce "flip-flop language modeling" (FFLM), a parametric family of synthetic benchmarks designed to probe the extrapolative behavior of neural language models. This simple generative task requires a model to copy binary symbols over long-range dependencies, ignoring the tokens in between. We find that Transformer FFLMs suffer from a long tail of sporadic reasoning errors, some of which we can eliminate using various regularization techniques. Our preliminary mechanistic analyses show why the remaining errors may be very difficult to diagnose and resolve. We hypothesize that attention glitches account for (some of) the closed-domain hallucinations in natural LLMs.
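As a concrete illustration of the task described above, here is a minimal sketch of a flip-flop string generator, assuming a vocabulary of write ("w"), ignore ("i"), and read ("r") instructions, each paired with a binary symbol. The instruction names, the sampling probabilities, and the function sample_flip_flop are illustrative assumptions rather than the paper's exact specification.

```python
import random

def sample_flip_flop(num_instructions: int, p_ignore: float = 0.8) -> list[str]:
    """Sample one flip-flop string as a token list (illustrative sketch).
    'w b' writes bit b to memory, 'i b' is a distractor to be ignored,
    and 'r' must be followed by the most recently written bit."""
    seq = ["w", random.choice("01")]  # begin with a write so memory is defined
    memory = seq[1]
    p_rw = (1.0 - p_ignore) / 2  # split remaining mass between reads and writes
    for _ in range(num_instructions - 1):
        op = random.choices(["w", "i", "r"], weights=[p_rw, p_ignore, p_rw])[0]
        if op == "w":
            memory = random.choice("01")
            seq += ["w", memory]
        elif op == "i":
            seq += ["i", random.choice("01")]  # intervening token to be ignored
        else:
            seq += ["r", memory]  # the long-range copy the model must get right
    return seq

# A language model trained on such strings is evaluated on whether it
# predicts the correct bit after every 'r' token.
print(" ".join(sample_flip_flop(8)))
```

Under this reading of the task, an FFLM benchmark measures how often a trained model's prediction after each "r" deviates from the last written bit, particularly on sequences longer or sparser than those seen during training.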
Author Information
Bingbin Liu (Carnegie Mellon University)
Jordan Ash (Microsoft Research NYC)
Surbhi Goel (Microsoft Research)
Akshay Krishnamurthy (Microsoft)
Cyril Zhang (Microsoft Research)
More from the Same Authors
- 2021 : Sparsity in the Partially Controllable LQR »
  Yonathan Efroni · Sham Kakade · Akshay Krishnamurthy · Cyril Zhang
- 2023 : Characterizing and Improving Transformer Solutions for Dyck Grammars »
  Kaiyue Wen · Yuchen Li · Bingbin Liu · Andrej Risteski
- 2023 : (Un)interpretability of Transformers: a case study with Dyck grammars »
  Kaiyue Wen · Yuchen Li · Bingbin Liu · Andrej Risteski
- 2023 : Exposing Attention Glitches with Flip-Flop Language Modeling »
  Bingbin Liu · Jordan Ash · Surbhi Goel · Akshay Krishnamurthy · Cyril Zhang
- 2022 Social: Mental Health in ML Academia »
  Paula Gradu · Cyril Zhang
- 2022 Poster: Sparsity in Partially Controllable Linear Systems »
  Yonathan Efroni · Sham Kakade · Akshay Krishnamurthy · Cyril Zhang
- 2022 Poster: Understanding Contrastive Learning Requires Incorporating Inductive Biases »
  Nikunj Umesh Saunshi · Jordan Ash · Surbhi Goel · Dipendra Kumar Misra · Cyril Zhang · Sanjeev Arora · Sham Kakade · Akshay Krishnamurthy
- 2022 Spotlight: Sparsity in Partially Controllable Linear Systems »
  Yonathan Efroni · Sham Kakade · Akshay Krishnamurthy · Cyril Zhang
- 2022 Spotlight: Understanding Contrastive Learning Requires Incorporating Inductive Biases »
  Nikunj Umesh Saunshi · Jordan Ash · Surbhi Goel · Dipendra Kumar Misra · Cyril Zhang · Sanjeev Arora · Sham Kakade · Akshay Krishnamurthy
- 2022 Poster: Inductive Biases and Variable Creation in Self-Attention Mechanisms »
  Benjamin Edelman · Surbhi Goel · Sham Kakade · Cyril Zhang
- 2022 Spotlight: Inductive Biases and Variable Creation in Self-Attention Mechanisms »
  Benjamin Edelman · Surbhi Goel · Sham Kakade · Cyril Zhang
- 2021 Poster: Statistical Estimation from Dependent Data »
  Vardis Kandiros · Yuval Dagan · Nishanth Dikkala · Surbhi Goel · Constantinos Daskalakis
- 2021 Spotlight: Statistical Estimation from Dependent Data »
  Vardis Kandiros · Yuval Dagan · Nishanth Dikkala · Surbhi Goel · Constantinos Daskalakis
- 2021 Poster: Acceleration via Fractal Learning Rate Schedules »
  Naman Agarwal · Surbhi Goel · Cyril Zhang
- 2021 Spotlight: Acceleration via Fractal Learning Rate Schedules »
  Naman Agarwal · Surbhi Goel · Cyril Zhang
- 2020 Poster: Learning Mixtures of Graphs from Epidemic Cascades »
  Jessica Hoffmann · Soumya Basu · Surbhi Goel · Constantine Caramanis
- 2020 Poster: Superpolynomial Lower Bounds for Learning One-Layer Neural Networks using Gradient Descent »
  Surbhi Goel · Aravind Gollakota · Zhihan Jin · Sushrut Karmalkar · Adam Klivans
- 2020 Poster: Efficiently Learning Adversarially Robust Halfspaces with Noise »
  Omar Montasser · Surbhi Goel · Ilias Diakonikolas · Nati Srebro
- 2018 Poster: Learning One Convolutional Layer with Overlapping Patches »
  Surbhi Goel · Adam Klivans · Raghu Meka
- 2018 Oral: Learning One Convolutional Layer with Overlapping Patches »
  Surbhi Goel · Adam Klivans · Raghu Meka