Chain of Thought Empowers Transformers to Solve Inherently Serial Problems
ICLR 2024(2024)
摘要
Instructing the model to generate a sequence of intermediate steps, a.k.a., a
chain of thought (CoT), is a highly effective method to improve the accuracy of
large language models (LLMs) on arithmetics and symbolic reasoning tasks.
However, the mechanism behind CoT remains unclear. This work provides a
theoretical understanding of the power of CoT for decoder-only transformers
through the lens of expressiveness. Conceptually, CoT empowers the model with
the ability to perform inherently serial computation, which is otherwise
lacking in transformers, especially when depth is low. Given input length n,
previous works have shown that constant-depth transformers with finite
precision 𝗉𝗈𝗅𝗒(n) embedding size can only solve problems in
𝖳𝖢^0 without CoT. We first show an even tighter expressiveness upper
bound for constant-depth transformers with constant-bit precision, which can
only solve problems in 𝖠𝖢^0, a proper subset of 𝖳𝖢^0.
However, with T steps of CoT, constant-depth transformers using constant-bit
precision and O(log n) embedding size can solve any problem solvable by
boolean circuits of size T. Empirically, enabling CoT dramatically improves
the accuracy for tasks that are hard for parallel computation, including the
composition of permutation groups, iterated squaring, and circuit value
problems, especially for low-depth transformers.
更多查看译文
关键词
Chain of thought,language modeling,circuit complexity,deep learning theory
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要