Unsupervised Dependency Parsing without Gold Part-of-Speech Tags.
EMNLP '11 Proceedings of the Conference on Empirical Methods in Natural Language Processing (2011)
Abstract
We show that categories induced by unsupervised word clustering can surpass the performance of gold part-of-speech tags in dependency grammar induction. Unlike classic clustering algorithms, our method allows a word to have different tags in different contexts. In an ablative analysis, we first demonstrate that this context-dependence is crucial to the superior performance of gold tags: requiring a word to always have the same part-of-speech significantly degrades the performance of manual tags in grammar induction, eliminating the advantage that human annotation has over unsupervised tags. We then introduce a sequence modeling technique that combines the output of a word clustering algorithm with context-colored noise, to allow words to be tagged differently in different contexts. With these new induced tags as input, our state-of-the-art dependency grammar inducer achieves 59.1% directed accuracy on Section 23 (all sentences) of the Wall Street Journal (WSJ) corpus, 0.7% higher than using gold tags.
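The one-tag-per-word constraint used in the ablative analysis can be illustrated with a minimal sketch. It assumes the constraint is imposed by relabeling every token with its word's most frequent tag; the abstract does not say how the single tag is chosen, so that rule, the function names, and the toy corpus are all illustrative assumptions rather than the authors' procedure.

```python
from collections import Counter, defaultdict


def most_frequent_tag_map(tagged_sentences):
    """Map each word to the tag it carries most often in the corpus.

    Assumption: "most frequent tag" is the rule for picking a word's
    single tag; the abstract only states that one tag per word is required.
    """
    counts = defaultdict(Counter)
    for sentence in tagged_sentences:
        for word, tag in sentence:
            counts[word][tag] += 1
    return {word: tag_counts.most_common(1)[0][0]
            for word, tag_counts in counts.items()}


def collapse_to_one_tag_per_word(tagged_sentences):
    """Relabel every token so that a word always has the same tag."""
    mapping = most_frequent_tag_map(tagged_sentences)
    return [[(word, mapping[word]) for word, _ in sentence]
            for sentence in tagged_sentences]


# Hypothetical toy corpus: "can" is a modal twice and a noun once.
toy_corpus = [
    [("they", "PRP"), ("can", "MD"), ("fish", "VB")],
    [("we", "PRP"), ("can", "MD"), ("swim", "VB")],
    [("open", "VB"), ("the", "DT"), ("can", "NN")],
]

collapsed = collapse_to_one_tag_per_word(toy_corpus)
```

In the toy corpus, the noun reading of "can" in the third sentence is overwritten by the modal tag, which is the kind of context-dependent information the ablation removes from gold tags.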
Keywords
different context, gold tag, unsupervised word clustering, classic clustering algorithm, dependency grammar induction, different tag, gold part-of-speech tag, grammar induction, state-of-the-art dependency grammar inducer, superior performance, Unsupervised dependency