Impact of Preference Noise on the Alignment Performance of Generative Language Models
arXiv (2024)
Abstract
A key requirement in developing Generative Language Models (GLMs) is to have
their values aligned with human values. Preference-based alignment is a widely
used paradigm for this purpose, in which preferences over generation pairs are
first elicited from human annotators or AI systems, and then fed into an
alignment technique, e.g., Direct Preference Optimization. However, a
substantial percentage (20-40%) of the preference pairs used in GLM alignment
are noisy, and it remains unclear how the noise affects the alignment
performance and how to mitigate its negative impact. In this paper, we propose
a framework to inject desired amounts and types of noise into the preferences,
and systematically study the impact of preference noise on the alignment
performance in two tasks (summarization and dialogue generation). We find that
the alignment performance can be highly sensitive to the noise rates in the
preference data: e.g., a 10 percentage point (pp) increase in the noise rate
can lead to a 30 pp drop in the alignment performance (in win rate). To mitigate
the impact of noise, confidence-based data filtering shows significant benefit
when certain types of noise are present. We hope our work can help the
community better understand and mitigate the impact of preference noise in GLM
alignment.
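
As a rough illustration of the two ideas named in the abstract (controlled
noise injection and confidence-based filtering), the following is a minimal
sketch only, not the paper's framework. It assumes the simplest noise type,
random label flipping, and assumes that a per-pair confidence score is already
available from some external scorer. The names PreferencePair,
inject_flip_noise, and filter_by_confidence are hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class PreferencePair:
    """One preference annotation: `chosen` was preferred over `rejected`."""
    prompt: str
    chosen: str
    rejected: str


def inject_flip_noise(pairs, noise_rate, seed=0):
    """Return a copy of `pairs` in which a fixed fraction has its label flipped.

    Assumption: noise is modeled as swapping chosen/rejected in exactly
    round(noise_rate * len(pairs)) pairs, selected uniformly at random.
    """
    if not 0.0 <= noise_rate <= 1.0:
        raise ValueError("noise_rate must be in [0, 1]")
    rng = random.Random(seed)
    n_flip = round(noise_rate * len(pairs))
    flip_idx = set(rng.sample(range(len(pairs)), n_flip))
    return [
        PreferencePair(p.prompt, p.rejected, p.chosen) if i in flip_idx else p
        for i, p in enumerate(pairs)
    ]


def filter_by_confidence(pairs, confidences, threshold):
    """Keep only pairs whose confidence score is at least `threshold`.

    Assumption: `confidences[i]` is a score in [0, 1] for `pairs[i]`,
    produced elsewhere (for example by a reward model); how it is computed
    is outside this sketch.
    """
    if len(pairs) != len(confidences):
        raise ValueError("pairs and confidences must have the same length")
    return [p for p, c in zip(pairs, confidences) if c >= threshold]
```

Flipping an exact count of pairs, rather than flipping each pair independently
with probability noise_rate, keeps the realized noise rate fixed across seeds,
which makes comparisons between noise levels easier to read.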