Dual-Encoders for Extreme Multi-Label Classification
arXiv (2023)
Abstract
Dual-encoder (DE) models are widely used in retrieval tasks, most commonly
studied on open QA benchmarks that are often characterized by multi-class and
limited training data. In contrast, their performance in multi-label,
data-rich retrieval settings such as extreme multi-label classification (XMC)
remains under-explored. Current empirical evidence indicates that DE models
fall significantly short on XMC benchmarks, where SOTA methods linearly scale
the number of learnable parameters with the total number of classes (documents
in the corpus) by employing per-class classification heads. To this end, we
first study and highlight that existing multi-label contrastive training losses
are not appropriate for training DE models on XMC tasks. We propose decoupled
softmax loss - a simple modification to the InfoNCE loss - that overcomes the
limitations of existing contrastive losses. We further extend our loss design
to a soft top-k operator-based loss which is tailored to optimize top-k
prediction performance. When trained with our proposed loss functions, standard
DE models alone can match or outperform SOTA methods by up to 2%
even on the largest XMC datasets while being 20x smaller in terms of the number
of trainable parameters. This leads to more parameter-efficient and universally
applicable solutions for retrieval tasks. Our code and models are publicly
available at https://github.com/nilesh2797/dexml.
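To make the core idea concrete: in a standard (InfoNCE-style) softmax loss, all labels share one normalizer, so multiple positive labels for the same query compete with each other, which is the mismatch the abstract points to in multi-label settings. A "decoupled" variant can let each positive compete only against the negatives. The sketch below is a rough illustration of that idea on raw scores, not the paper's implementation; the function names and the unit-batch setup are invented for the example.

```python
import math

def softmax_ce(scores, positive_idx):
    """Plain softmax cross-entropy averaged over positives.
    All labels, including the other positives, share the denominator."""
    z = sum(math.exp(s) for s in scores)
    return -sum(math.log(math.exp(scores[p]) / z)
                for p in positive_idx) / len(positive_idx)

def decoupled_softmax_loss(scores, positive_idx):
    """Decoupled variant: each positive is normalized against the
    negative scores only, so positives no longer suppress each other."""
    neg_exp = sum(math.exp(s) for i, s in enumerate(scores)
                  if i not in positive_idx)
    loss = 0.0
    for p in positive_idx:
        sp = math.exp(scores[p])
        loss -= math.log(sp / (sp + neg_exp))
    return loss / len(positive_idx)
```

With a single positive the two losses coincide (the denominator is the same full sum); with several positives the decoupled loss is strictly smaller, since each positive's denominator drops the other positives' mass instead of penalizing them.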