AXOLOTL: Fairness through Assisted Self-Debiasing of Large Language Model Outputs
CoRR(2024)
摘要
Pre-trained Large Language Models (LLMs) have significantly advanced natural
language processing capabilities but are susceptible to biases present in their
training data, leading to unfair outcomes in various applications. While
numerous strategies have been proposed to mitigate bias, they often require
extensive computational resources and may compromise model performance. In this
work, we introduce AXOLOTL, a novel post-processing framework, which operates
agnostically across tasks and models, leveraging public APIs to interact with
LLMs without direct access to internal parameters. Through a three-step process
resembling zero-shot learning, AXOLOTL identifies biases, proposes resolutions,
and guides the model to self-debias its outputs. This approach minimizes
computational costs and preserves model performance, making AXOLOTL a promising
tool for debiasing LLM outputs with broad applicability and ease of use.
更多查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要