Selecting Large Language Model to Fine-tune via Rectified Scaling Law
CoRR (2024)
Abstract
The ever-growing ecosystem of LLMs has posed a challenge in selecting the
most appropriate pre-trained model to fine-tune amidst a sea of options. Given
constrained resources, fine-tuning all models and making selections afterward
is unrealistic. In this work, we formulate this resource-constrained selection
task into predicting fine-tuning performance and illustrate its natural
connection with scaling laws. Unlike pre-training, we find that the fine-tuning
scaling curve includes not just the well-known "power phase" but also the
previously unobserved "pre-power phase". We also explain why existing scaling
laws fail to capture this phase transition phenomenon both theoretically and
empirically. To address this, we introduce the concept of "pre-learned data
size" into our rectified scaling law, which overcomes theoretical limitations
and fits experimental results much better. By leveraging our law, we propose a
novel LLM selection algorithm that selects the near-optimal model with hundreds
of times less resource consumption, while other methods may provide negatively
correlated selection.
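The abstract's "pre-learned data size" idea can be illustrated with a small sketch. The functional form below is an assumption for illustration (the paper's exact parameterization may differ): predicted fine-tuning loss L(D) = B + A / (D_l + D)^alpha, where D is the fine-tuning data size and D_l is the pre-learned data size contributed by pre-training. All constants here are made up.

```python
import math

# Assumed rectified-scaling-law form (illustrative, not the paper's exact law):
#   L(D) = B + A / (D_l + D) ** alpha
# A, B, alpha, D_l are placeholder constants chosen for demonstration.
A, B, alpha, D_l = 100.0, 1.5, 0.5, 1000.0

def rectified_loss(D):
    """Predicted fine-tuning test loss at fine-tuning data size D."""
    return B + A / (D_l + D) ** alpha

def log_log_slope(D, eps=1e-4):
    """Local slope of log(L - B) versus log(D).

    In this form the slope is roughly 0 when D << D_l (the flat
    "pre-power phase") and approaches -alpha when D >> D_l (the
    familiar "power phase").
    """
    d1, d2 = D, D * (1 + eps)
    y1 = math.log(rectified_loss(d1) - B)
    y2 = math.log(rectified_loss(d2) - B)
    return (y2 - y1) / (math.log(d2) - math.log(d1))

# Pre-power phase: nearly flat on a log-log plot.
print(log_log_slope(1.0))    # near 0
# Power phase: slope approaches -alpha.
print(log_log_slope(1e8))    # near -0.5
```

The phase transition the abstract describes shows up here as the slope moving from ~0 to ~-alpha as D crosses D_l; a classic power law (D_l = 0) has no flat phase, which is one way to see why it cannot fit the early part of the fine-tuning curve.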