Return of EM: Entity-driven Answer Set Expansion for QA Evaluation
arXiv (2024)
Abstract
Recently, directly using large language models (LLMs) has been shown to be
the most reliable method to evaluate QA models. However, it suffers from
limited interpretability, high cost, and environmental harm. To address these issues,
we propose to use soft EM with entity-driven answer set expansion. Our approach
expands the gold answer set to include diverse surface forms, based on the
observation that the surface forms often follow particular patterns depending
on the entity type. The experimental results show that our method outperforms
traditional evaluation methods by a large margin. Moreover, the reliability of
our evaluation method is comparable to that of LLM-based ones, while offering
the benefits of high interpretability and reduced environmental harm.
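To make the idea concrete, below is a minimal Python sketch of soft EM over an entity-type-driven expanded gold answer set. The expansion rules in expand_gold_answers are illustrative assumptions only; the abstract does not specify the paper's actual surface-form patterns, and the normalization follows common SQuAD-style conventions rather than anything stated here.

```python
import re
import string

def normalize(text: str) -> str:
    """Lowercase, drop punctuation and articles, collapse whitespace (SQuAD-style)."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def expand_gold_answers(answer: str, entity_type: str) -> set[str]:
    """Hypothetical expansion rules keyed by entity type.

    Stand-ins for the paper's patterns, which are not given in the abstract.
    """
    variants = {answer}
    if entity_type == "PERSON":
        parts = answer.split()
        if len(parts) > 1:
            variants.add(parts[-1])   # surname-only variant, e.g. "Obama"
    elif entity_type == "DATE":
        m = re.search(r"\b\d{4}\b", answer)
        if m:
            variants.add(m.group())   # year-only variant, e.g. "1961"
    return variants

def soft_em(prediction: str, gold_answers: set[str]) -> bool:
    """Soft EM: correct if the prediction contains any gold surface form."""
    pred = normalize(prediction)
    return any(normalize(gold) in pred for gold in gold_answers)

# The expanded set lets a year-only mention count as correct:
gold = expand_gold_answers("August 4, 1961", "DATE")
print(soft_em("He was born in 1961.", gold))  # True
```

Under this setup, a prediction that uses a different but valid surface form of the gold entity still scores as a match, which is the behavior strict EM misses.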