Empowering Large Language Models for Textual Data Augmentation
CoRR (2024)
Abstract
With their capability to understand and execute natural language
instructions, large language models (LLMs) can potentially act as a powerful
tool for textual data augmentation. However, the quality of augmented data
depends heavily on the augmentation instructions provided, and the
effectiveness can fluctuate across different downstream tasks. While manually
crafting and selecting instructions can offer some improvement, this approach
faces scalability and consistency issues in practice due to the diversity of
downstream tasks. In this work, we address these limitations by proposing a new
solution, which can automatically generate a large pool of augmentation
instructions and select the most suitable task-informed instructions, thereby
empowering LLMs to create high-quality augmented data for different downstream
tasks. Empirically, the proposed approach consistently generates augmented data
with better quality compared to non-LLM and LLM-based data augmentation
methods, leading to the best performance on 26 few-shot learning tasks sourced
from a wide range of application domains.