Reasoning by Superposition: A Theoretical Perspective on Chain of Continuous Thought
Hanlin Zhu · Shibo Hao · Zhiting Hu · Jiantao Jiao · Stuart Russell · Yuandong Tian
Abstract
In this paper, we prove that a two-layer transformer with $D$ steps of continuous chain-of-thought (CoT) can solve the directed graph reachability problem, where $D$ is the diameter of the graph. By contrast, the best known result for constant-depth transformers with discrete CoTs requires $O(n^2)$ decoding steps, where $n$ is the number of vertices ($D < n$).
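As a purely classical illustration, and not the paper's transformer construction, the sketch below shows why $D$ rounds are enough for reachability. Each round expands the entire frontier of reached vertices by one hop at once, so any vertex reachable from the source is found within at most $D$ rounds. The helper names `build_adj`, `reachable_within`, and `diameter` are hypothetical, and the diameter is taken as the largest finite shortest-path distance between any ordered pair of vertices.

```python
from collections import deque
from collections.abc import Iterable


def build_adj(edges: Iterable[tuple[int, int]]) -> dict[int, set[int]]:
    # Hypothetical helper: adjacency sets for a directed graph.
    adj: dict[int, set[int]] = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
    return adj


def reachable_within(edges, source: int, target: int, max_steps: int) -> tuple[bool, int]:
    """Return (reached, rounds_used); each round expands the whole frontier by one hop."""
    adj = build_adj(edges)
    reached = {source}
    frontier = {source}
    steps = 0
    while frontier and target not in reached and steps < max_steps:
        # Expand every frontier vertex in the same round (one BFS layer).
        next_frontier: set[int] = set()
        for u in frontier:
            next_frontier |= adj.get(u, set())
        frontier = next_frontier - reached
        reached |= frontier
        steps += 1
    return target in reached, steps


def diameter(edges, vertices: Iterable[int]) -> int:
    """Largest finite shortest-path distance over all ordered pairs (BFS from each vertex)."""
    adj = build_adj(edges)
    best = 0
    for s in vertices:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj.get(u, ()):
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        best = max(best, max(dist.values()))
    return best


edges = [(0, 1), (1, 2), (2, 3), (0, 4)]
D = diameter(edges, range(5))
print(D)                                    # 3
print(reachable_within(edges, 0, 3, D))     # (True, 3)
print(reachable_within(edges, 3, 0, D))     # (False, 1)
```

On this 5-vertex path-like graph the diameter is 3, and the round-by-round expansion reaches vertex 3 from vertex 0 in exactly 3 rounds, while an unreachable target is rejected as soon as the frontier empties.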