Sparse Lexical Representation For Semantic Entity Resolution

2013 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP)(2013)

引用 2|浏览35
暂无评分
摘要
This paper addresses the problem of semantic entity resolution (SER), which aims to determine whether some or none of the entities in a knowledge base is mentioned in a given web document. The lexical features, e.g., words and phrases, which are critical to the resolution of the semantic entities are typically of a small amount compared to all lexical features in the web document, and therefore can be modeled as sparse signals. Two techniques leveraging the principles of sparse signal recovery are proposed to identify the sparse, salient lexical features: one technique, based on the Lasso algorithm with the 2-norm distance metric, attempts to recover all the salient lexical features at once; the other technique, namely Posterior Probability Pursuit (PPP), sequentially identifies salient features one after one using the negative log posterior probability as the distance metric. Using a knowledge base consisting of about 100 million entities, we show that the proposed techniques exploiting the sparsity nature underlying SER deliver substantial performance improvement over baseline methods without sparsity consideration, demonstrating the potentials of sparse signal techniques in entity-centric web information processing.
更多
查看译文
关键词
Terms Sparse signal recovery, semantic entity resolution, posterior probability pursuit, Lasso
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要