On the Convergence of Zeroth-Order Federated Tuning for Large Language Models
CoRR (2024)
Abstract
The confluence of Federated Learning (FL) and Large Language Models (LLMs) is
ushering in a new era in privacy-preserving natural language processing.
However, the intensive memory requirements for fine-tuning LLMs pose
significant challenges, especially when deploying on clients with limited
computational resources. To circumvent this, we explore the novel integration
of Memory-efficient Zeroth-Order Optimization within a federated setting, a
synergy we term FedMeZO. Our study is the first to examine the theoretical
underpinnings of FedMeZO in the context of LLMs, tackling key questions
regarding the influence of large parameter spaces on optimization behavior, the
establishment of convergence properties, and the identification of critical
parameters for convergence to inform personalized federated strategies. Our
extensive empirical evidence supports the theory, showing that FedMeZO not only
converges faster than traditional first-order methods such as FedAvg but also
significantly reduces GPU memory usage during training to levels comparable to
those during inference. Moreover, the proposed personalized FL strategy, which
builds on these theoretical insights to customize client-wise learning rates,
can effectively accelerate loss reduction. We hope our work can help to bridge
theoretical and practical aspects of federated fine-tuning for LLMs, thereby
stimulating further advancements and research in this area.
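To make the recipe concrete, the following minimal Python sketch illustrates the two ingredients the abstract combines: MeZO's memory-efficient two-point zeroth-order gradient estimator run locally on each client, and FedAvg-style averaging at the server, with per-client learning rates standing in for the personalized strategy. This is an illustrative assumption of the setup, not the paper's implementation; all names (`mezo_step`, `fedmezo_round`, `client_losses`) are hypothetical.

```python
# Hypothetical sketch of FedMeZO: MeZO-style zeroth-order local steps
# inside a FedAvg loop. Toy loss functions stand in for LLM fine-tuning.
import numpy as np

def mezo_step(theta, loss_fn, lr, eps, rng):
    """One two-point (SPSA-style) zeroth-order step. In real MeZO the
    perturbation z is regenerated from a stored seed rather than kept in
    memory, so peak usage stays at inference level; here we keep z for
    clarity since the parameter vector is tiny."""
    seed = int(rng.integers(2**31))
    z = np.random.default_rng(seed).standard_normal(theta.shape)
    loss_plus = loss_fn(theta + eps * z)    # forward pass 1, no backprop
    loss_minus = loss_fn(theta - eps * z)   # forward pass 2, no backprop
    grad_scale = (loss_plus - loss_minus) / (2 * eps)  # projected gradient
    return theta - lr * grad_scale * z

def fedmezo_round(global_theta, client_losses, client_lrs, eps, local_steps, rng):
    """One communication round: each client runs MeZO locally with its own
    (possibly personalized) learning rate; the server averages the results
    as in FedAvg."""
    updated = []
    for loss_fn, lr in zip(client_losses, client_lrs):
        theta = global_theta.copy()
        for _ in range(local_steps):
            theta = mezo_step(theta, loss_fn, lr, eps, rng)
        updated.append(theta)
    return np.mean(updated, axis=0)  # server-side aggregation

# Toy usage: three clients fitting quadratics with slightly different optima.
rng = np.random.default_rng(0)
targets = [rng.standard_normal(10) for _ in range(3)]
losses = [lambda th, t=t: float(np.sum((th - t) ** 2)) for t in targets]
theta = np.zeros(10)
for _ in range(200):
    theta = fedmezo_round(theta, losses, client_lrs=[0.05] * 3,
                          eps=1e-3, local_steps=5, rng=rng)
```

Since each local step needs only two forward passes and a scalar, the client never materializes activations or optimizer state, which is the source of the memory savings the abstract reports.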