Poster
in
Workshop: Knowledge and Logical Reasoning in the Era of Data-driven Learning
Chain of Thought Empowers Transformers to Solve Inherently Serial Problems
Zhiyuan Li · Hong Liu · Denny Zhou · Tengyu Ma
Abstract:
Generating a sequence of intermediate steps, \emph{a.k.a.}, a chain of thought (CoT), is a highly effective method to improve the accuracy of large language models (LLMs) on arithmetics and symbolic reasoning tasks. However, the mechanism behind CoT remains unclear. This work provides a theoretical understanding of the power of CoT for decoder-only transformers through the lens of expressiveness. Conceptually, CoT empowers the model with the ability to perform inherently serial computation, which is otherwise lacking in transformers, especially when depth is low. Formally, given input length $n$, we show that constant-depth transformers can solve $\mathsf{NC}^1$-complete problems like wording problem of $S_5$ provided with $O(n)$ steps of CoT and $\mathsf{poly}(n)$ embedding size. We further show constant-depth transformers can solve any problem in $\mathsf{P/poly}$ provided with $O(\mathsf{poly}(n))$ steps of CoT and $O(\log(n))$ embedding size. In contrast, it is shown (Liu et al., 2022) that constant-depth transformers without CoT can only solve problems in $\mathsf{TC}^0$. Under the unproven but widely believed assumption that $\mathsf{TC}^0\subsetneq \mathsf{NC}^1 \subsetneq \mathsf{Ppoly}$, allowing a longer chain of thought fundamentally increases the expressiveness of transformers. Empirically, enabling CoT dramatically improves the accuracy for tasks that are hard for parallel computation, including the composition of permutation groups, iterated squaring, and circuit value problems, especially for low-depth transformers.
Chat is not available.