UMass at TREC 2004: Novelty and HARD

Nasreen Abdul Jaleel, James Allan, W. Bruce Croft, Fernando Diaz, Leah S. Larkey, Xiaoyan Li, Mark D. Smucker, Courtney Wade

TREC (2004)

Abstract

For the TREC 2004 Novelty track, UMass participated in all four tasks. Although finding relevant sentences was harder this year than last, we continue to show marked improvements over the baseline of calling all sentences relevant, with a variant of tfidf being the most successful approach. We achieve 5-9% improvements over the baseline in locating novel sentences, primarily by looking at the similarity of a sentence to earlier sentences and focusing on named entities. For the High Accuracy Retrieval from Documents (HARD) track, we investigated the use of clarification forms, fixed- and variable-length passage retrieval, and the use of metadata. Clarification form results indicate that passage level feedback can provide improvements comparable to user supplied related-text for document evaluation and outperforms related-text for passage evaluation. Document retrieval methods without a query expansion component show the most gains from related-text. We also found that displaying the top passages for feedback outperformed displaying centroid passages. Named entity feedback resulted in mixed performance. Our primary findings for passage retrieval are that document retrieval methods performed better than passage retrieval methods on the passage evaluation metric of binary preference at 12,000 characters, and that clarification forms improved passage retrieval for every retrieval method explored. We found no benefit to using variable-length passages over fixed-length passages for this corpus. Our use of geography and genre metadata resulted in no significant changes in retrieval performance.
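
The abstract describes novelty detection as comparing each sentence with the sentences that came before it. The sketch below illustrates that general idea only; it is not the UMass system. It assumes cosine similarity over smoothed tf-idf vectors, whitespace tokenization, and a hypothetical threshold of 0.6, none of which are stated in the abstract, and it does not model the named-entity component. The function names are invented for illustration.

```python
import math
from collections import Counter


def tfidf_vectors(sentences):
    """Return one sparse tf-idf vector (a dict) per sentence.

    Assumption: whitespace tokenization and smoothed idf computed over
    the given sentence list; the paper's actual weighting may differ.
    """
    tokenized = [s.lower().split() for s in sentences]
    n = len(tokenized)
    # Document frequency: number of sentences containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vectors.append(
            {t: c * (math.log((1 + n) / (1 + df[t])) + 1.0) for t, c in tf.items()}
        )
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def novel_sentences(sentences, threshold=0.6):
    """Return indices of sentences whose maximum similarity to every
    earlier sentence is below the (hypothetical) threshold."""
    vectors = tfidf_vectors(sentences)
    novel = []
    for i, vec in enumerate(vectors):
        max_sim = max((cosine(vec, prev) for prev in vectors[:i]), default=0.0)
        if max_sim < threshold:
            novel.append(i)
    return novel


example = [
    "the cat sat on the mat",
    "the cat sat on the mat",
    "stocks fell sharply today",
]
novel_indices = novel_sentences(example)
```

Here `novel_indices` is `[0, 2]`: the second sentence repeats the first and is filtered out, while the third shares no terms with the earlier ones. In practice the threshold would need tuning on held-out judgments.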

Keywords

metadata, accuracy, statistical analysis, feedback, document retrieval, information retrieval
