ICML Exposing Attention Glitches with Flip-Flop Language Modeling

Poster in Workshop: Challenges in Deployable Generative AI

Exposing Attention Glitches with Flip-Flop Language Modeling

Bingbin Liu · Jordan Ash · Surbhi Goel · Akshay Krishnamurthy · Cyril Zhang

Keywords: [ hallucinations ] [ transformers ] [ Language Models ]


Abstract:

Why do large language models hallucinate? This work identifies and analyzes the phenomenon of attention glitches, in which the Transformer architecture's inductive biases intermittently fail to capture robust reasoning. To isolate the issue, we introduce flip-flop language modeling (FFLM), a parametric family of synthetic benchmarks designed to probe the extrapolation of language models. This simple generative task requires a model to copy binary symbols over long-range dependencies, ignoring the tokens in between. We find that Transformer FFLMs suffer from a long tail of sporadic reasoning errors, some of which we can eliminate using various regularization techniques. Our preliminary mechanistic analyses show why the remaining errors may be very difficult to diagnose and resolve. We hypothesize that attention glitches account for (some of) the closed-domain hallucinations in natural LLMs.
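
The copying task described in the abstract can be illustrated with a small generator. This is only a sketch under stated assumptions: the abstract does not specify a token format, so the instruction names ("w" for write, "r" for read, "i" for ignore), the `p_ignore` parameter, and both function names below are hypothetical choices made for illustration, not the authors' benchmark code. A read must repeat the most recently written bit, while the ignore tokens in between act as distractors.

```python
import random


def flip_flop_sequence(length, p_ignore=0.8, seed=None):
    """Return `length` (instruction, bit) pairs in a hypothetical flip-flop format.

    Assumed instructions (not taken from the abstract):
      "w" writes a bit to memory, "r" reads it back, "i" is a distractor.
    """
    rng = random.Random(seed)
    memory = rng.randint(0, 1)
    seq = [("w", memory)]  # begin with a write so every read is well defined
    for _ in range(length - 1):
        u = rng.random()
        if u < p_ignore:
            # Distractor: its bit is random and must not affect later reads.
            seq.append(("i", rng.randint(0, 1)))
        elif u < p_ignore + (1 - p_ignore) / 2:
            memory = rng.randint(0, 1)
            seq.append(("w", memory))
        else:
            # A read copies the most recently written bit.
            seq.append(("r", memory))
    return seq


def is_consistent(seq):
    """Check that every read repeats the bit of the most recent write."""
    memory = None
    for op, bit in seq:
        if op == "w":
            memory = bit
        elif op == "r" and bit != memory:
            return False
    return True
```

Raising `p_ignore` lengthens the expected gap between a write and the read that depends on it, which is one way such a family of tasks could be parameterized. A model's predictions at the read positions could be scored with a check like `is_consistent`.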
