Faithfulness vs. Plausibility: On the (Un)Reliability of Explanations from Large Language Models
CoRR (2024)
Abstract
Large Language Models (LLMs) are deployed as powerful tools for several
natural language processing (NLP) applications. Recent works show that modern
LLMs can generate self-explanations (SEs) that lay out their intermediate
reasoning steps to explain their behavior. Self-explanations have seen
widespread adoption owing to their conversational and plausible nature.
However, there is little to no understanding of their faithfulness. In this
work, we discuss the dichotomy between faithfulness and plausibility in SEs
generated by LLMs. We argue that while LLMs are adept at generating plausible
explanations – seemingly logical and coherent to human users – these
explanations do not necessarily align with the reasoning processes of the LLMs,
raising concerns about their faithfulness. We highlight that the current trend
towards increasing the plausibility of explanations, primarily driven by the
demand for user-friendly interfaces, may come at the cost of diminishing their
faithfulness. We assert that the faithfulness of explanations is critical in
LLMs employed for high-stakes decision-making. Moreover, we urge the community
to identify the faithfulness requirements of real-world applications and ensure
explanations meet those needs. Finally, we propose some directions for future
work, emphasizing the need for novel methodologies and frameworks that can
enhance the faithfulness of self-explanations without compromising their
plausibility, essential for the transparent deployment of LLMs in diverse
high-stakes domains.