FLawN-T5: An Empirical Examination of Effective Instruction-Tuning Data Mixtures for Legal Reasoning
arXiv (2024)
Abstract
Instruction tuning is an important step in making language models useful for
direct user interaction. However, many legal tasks remain out of reach for most
open LLMs, and no large-scale instruction datasets yet exist for
the domain. This critically limits research in this application area. In this
work, we curate LawInstruct, a large legal instruction dataset, covering 17
jurisdictions, 24 languages and a total of 12M examples. We present evidence
that domain-specific pretraining and instruction tuning improve performance on
LegalBench, including improving Flan-T5 XL by 8 points or 16% over the
baseline. However, the effect does not generalize across all tasks, training
regimes, model sizes, and other factors. LawInstruct is a resource for
accelerating the development of models with stronger information-processing and
decision-making capabilities in the legal domain.