HSS: A Hierarchical Semantic Similarity Hard Negative Sampling Method for Dense Retrievers.

MMM (2)(2023)

引用 0|浏览37
暂无评分
摘要
Dense Retriever (DR) for Open-domain textual question answering (OpenQA), which aims to retrieve passages from large data sources like Wikipedia or Google, has gained wide attention in recent years. Although DR models continuously refresh state-of-the-art performances, their improvement relies on negative sampling during the training process. Existing sampling strategies mainly focus on developing a complex algorithm based on computer science, and ignore the abundant semantic features of datasets. We discover that there exists obvious changes in semantic similarity and present a three-level hierarchy of semantic similarity: same topic, same class, other class, whose rationality is further demonstrated by ablation study. Based on this, we propose a hard negative sampling strategy named Hierarchical Semantic Similarity (HSS). Our HSS model performs negative sampling at semantic levels of topic and class, and experimental results on four datasets show that it achieves comparable or better retrieval performance compared with existing competitive baselines. The code is available in https://github.com/redirecttttt/HSS.
更多
查看译文
关键词
Negative sampling, Dense Retriever, Semantic Similarity, Open-domain question answering
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要