Extracting and exploiting word relationships for information retrieval (2009)

Abstract
With the exponential growth of information on the Web, information retrieval systems have become an indispensable tool for locating the information that interests users. Traditional information retrieval systems adopt an independence assumption in order to simplify model construction. The independence is assumed in three respects: among query terms, among document terms, and between a query term and a document term. This assumption does not hold in practice, which leads to ambiguity in query and document representations, as well as a reliance on "exact match" for retrieving relevant documents. In this thesis, we try to relax the independence assumption by exploiting the relationships between words. Since we adopt the language modeling framework for document ranking, we have to estimate a probabilistic model with a multinomial distribution for the document and for the query, respectively. Our basic approach is therefore to improve the estimation of these two models by making use of word relationships. In the thesis, we tried the following approaches: (1) Document expansion, which aims to avoid the "exact match" requirement for relevant documents. We consider word relationships when smoothing the document model, so that related terms are assigned higher probabilities even if they do not occur in the document. (2) Query expansion, to resolve the ambiguity in query representation. In particular, we use a Markov chain model to exploit non-immediate word relationships. This framework is also extended to deal with cross-lingual information retrieval problems. (3) A supervised learning framework to select good query expansion terms and query alterations. The selection is based on the relationship between the selected term and the other query terms. More specifically, the selection considers the impact of individual expansion terms. All the proposed methods are evaluated on TREC or NTCIR benchmarks, and the experimental results show that the methods achieve substantial improvements over some competitive baselines.
Keywords: Information Retrieval, Language Modeling, Word Relationship, Document Expansion, Query Expansion
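
The document-expansion idea in the abstract (smoothing the document model so that related terms receive probability mass even when they are absent from the document) can be illustrated with a small sketch. This is not the thesis's estimator, which the abstract does not specify. It is a generic translation-style interpolation, P(w|D) = lam * P_ml(w|D) + (1 - lam) * sum_t P(w|t) * P_ml(t|D), and the function name, the `relations` table, the toy vocabulary, and the weight `lam` are all hypothetical.

```python
from collections import Counter


def expanded_doc_model(doc_tokens, relations, vocab, lam=0.7):
    """Interpolate the maximum-likelihood document model with a
    relationship-based component (illustrative assumption, not the
    thesis's actual formula).

    relations[t][w] is an assumed probability P(w | t) that term t is
    related to word w; each relations[t] should sum to 1 over vocab.
    """
    counts = Counter(doc_tokens)
    total = sum(counts.values())
    # Maximum-likelihood estimate P_ml(t | D) from raw term counts.
    p_ml = {t: c / total for t, c in counts.items()}

    model = {}
    for w in vocab:
        # Probability mass that flows to w from the terms in the document.
        related = sum(relations.get(t, {}).get(w, 0.0) * p
                      for t, p in p_ml.items())
        model[w] = lam * p_ml.get(w, 0.0) + (1 - lam) * related
    return model


# Toy data, invented for illustration only.
doc = ["car", "engine", "car"]
relations = {
    "car": {"automobile": 0.6, "car": 0.4},
    "engine": {"motor": 0.5, "engine": 0.5},
}
vocab = {"car", "engine", "automobile", "motor"}

model = expanded_doc_model(doc, relations, vocab)
```

In this toy setting "automobile" never occurs in the document, yet it receives non-zero probability through its assumed relationship with "car", so a query containing "automobile" could still match the document without an exact term overlap.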
Keywords
document expansion, information retrieval, query alteration, good query expansion term, document term, query term, word relationship, query representation, independence assumption, query expansion, language modeling