Deep Language Networks: Joint Prompt Training of Stacked LLMs using Variational Inference

CoRR (2023)

Abstract
We view large language models (LLMs) as stochastic \emph{language layers} in a network, where the learnable parameters are the natural language \emph{prompts} at each layer. We stack two such layers, feeding the output of one layer to the next. We call the stacked architecture a \emph{Deep Language Network} (DLN). We first show how to effectively perform prompt optimization for a 1-layer language network (DLN-1). We then show how to train 2-layer DLNs (DLN-2), where two prompts must be learned. We consider the output of the first layer as a latent variable to marginalize over, and devise a variational inference algorithm for joint prompt training. A DLN-2 reaches higher performance than a single layer, sometimes comparable to few-shot GPT-4, even when each LLM in the network is smaller and less powerful. The DLN code is open source: https://github.com/microsoft/deep-language-networks.
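As a companion to the abstract, the Python sketch below illustrates the stacked 2-layer forward pass: the sampled output of the first language layer is treated as a latent intermediate text h that is fed to the second layer. This is a minimal, hypothetical illustration only; call_llm, dln2_forward, and the prompt templates are assumptions made here for clarity and are not the API of the linked repository.

from typing import Callable

# A "language layer" is just a call that maps a full prompt string to sampled text.
LLMCall = Callable[[str], str]

def dln2_forward(x: str, prompt1: str, prompt2: str, call_llm: LLMCall) -> str:
    """Two stacked stochastic language layers: h ~ p(h | x, prompt1), y ~ p(y | x, h, prompt2)."""
    # Layer 1: the learnable parameter is the natural-language prompt1;
    # its sampled output h plays the role of a latent "hidden state".
    h = call_llm(f"{prompt1}\n\nInput: {x}\nOutput:")
    # Layer 2: conditions on the input and the latent h via prompt2 to produce the answer y.
    y = call_llm(f"{prompt2}\n\nInput: {x}\nIntermediate: {h}\nAnswer:")
    return y

if __name__ == "__main__":
    # Stand-in LLM so the example runs offline; replace with a real model/sampling call.
    fake_llm: LLMCall = lambda p: f"<sampled text for a {len(p)}-char prompt>"
    print(dln2_forward("Is 17 prime?",
                       "Think step by step about the question.",
                       "Using the reasoning, give only the final answer.",
                       fake_llm))

In practice call_llm would wrap an actual LLM sampling call, and during training the intermediate h is the latent variable that the paper's variational inference procedure marginalizes over when jointly learning prompt1 and prompt2.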
Keywords
deep language networks, stacked LLMs, joint prompt training