Reinforcement Learning with Token-level Feedback for Controllable Text Generation
CoRR (2024)
Abstract
To meet the requirements of real-world applications, it is essential to
control the generations of large language models (LLMs). Prior research has tried
to introduce reinforcement learning (RL) into controllable text generation,
while most existing methods suffer from overfitting issues (finetuning-based
methods) or semantic collapse (post-processing methods). However, current RL
methods are generally guided by coarse-grained (sentence/paragraph-level)
feedback, which may lead to suboptimal performance owing to semantic twists or
progressions within sentences. To address this, we propose a novel reinforcement
learning algorithm named TOLE which formulates TOken-LEvel rewards for
controllable text generation, and employs a "first-quantize-then-noise"
paradigm to enhance the robustness of the RL algorithm. Furthermore, TOLE can be
flexibly extended to multiple constraints with little computational expense.
Experimental results show that our algorithm can achieve superior performance
on both single-attribute and multi-attribute control tasks. We have released
our code at https://github.com/WindyLee0822/CTG.
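
The abstract does not spell out how the "first-quantize-then-noise" step is computed, so the following is only an illustrative sketch under our own assumptions, not the released implementation. It assumes that raw per-token scores are first mapped to a small number of quantile levels and that a bounded random perturbation is then added to each level. The function name `quantize_then_noise` and the parameters `num_bins` and `noise_scale` are hypothetical.

```python
import random
from bisect import bisect_right


def quantize_then_noise(rewards, num_bins=4, noise_scale=0.1, rng=None):
    """Illustrative sketch (assumed, not the paper's exact procedure).

    1. Quantize: replace each raw token-level score by the index of the
       quantile bin it falls into, rescaled to the range [0, 1].
    2. Noise: add a small uniform perturbation to each quantized value.
    """
    rng = rng or random.Random(0)
    ordered = sorted(rewards)
    n = len(ordered)
    # Interior quantile boundaries estimated from the scores themselves.
    edges = [ordered[min(n - 1, (k * n) // num_bins)] for k in range(1, num_bins)]
    step = 1.0 / (num_bins - 1)  # spacing between adjacent quantized levels
    shaped = []
    for r in rewards:
        level = bisect_right(edges, r)  # integer bin index in 0..num_bins-1
        noise = rng.uniform(-noise_scale, noise_scale) * step
        shaped.append(level * step + noise)
    return shaped


# Hypothetical raw scores for the eight tokens of one generated sentence.
raw_scores = [0.1, 0.9, 0.5, 0.3, 0.7, 0.2, 0.8, 0.4]
shaped_rewards = quantize_then_noise(raw_scores, num_bins=4, noise_scale=0.1)
```

With `noise_scale` below 0.5, the perturbation never exceeds half the spacing between adjacent levels, so tokens in a higher bin still receive a higher shaped reward than tokens in a lower bin. Whether the paper imposes the same bound is not stated in the abstract.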