How to think step-by-step: A mechanistic understanding of chain-of-thought reasoning
CoRR(2024)
摘要
Despite superior reasoning prowess demonstrated by Large Language Models
(LLMs) with Chain-of-Thought (CoT) prompting, a lack of understanding prevails
around the internal mechanisms of the models that facilitate CoT generation.
This work investigates the neural sub-structures within LLMs that manifest CoT
reasoning from a mechanistic point of view. From an analysis of LLaMA-2 7B
applied to multistep reasoning over fictional ontologies, we demonstrate that
LLMs deploy multiple parallel pathways of answer generation for step-by-step
reasoning. These parallel pathways provide sequential answers from the input
question context as well as the generated CoT. We observe a striking functional
rift in the middle layers of the LLM. Token representations in the initial half
remain strongly biased towards the pretraining prior, with the in-context
taking over abruptly in the later half. This internal phase shift manifests in
different functional components: attention heads that write the answer token
predominantly appear in the later half, attention heads that move information
along ontological relationships appear exclusively in the initial half, and so
on. To the best of our knowledge, this is the first attempt towards mechanistic
investigation of CoT reasoning in LLMs.
更多查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要