Hybrid Word/Part-Of-Arabic-Word Language Models For Arabic Text Document Recognition

2015 13th International Conference on Document Analysis and Recognition (ICDAR), 2015

Abstract
This paper describes a simple approach to generating an efficient hybrid word/Part-of-Arabic-Word (PAW) language model (LM). More precisely, the less frequent words in a full-word vocabulary are decomposed into PAWs. The resulting PAWs are combined with the most frequent words to form a hybrid word-PAW vocabulary, which is used to estimate a hybrid flat n-gram statistical language model. For comparison, language models with full PAW decomposition of the word vocabulary are also generated. To assess the quality of the three types of LM (full word, hybrid word/PAW, and full PAW), evaluation experiments are conducted on three different tasks using two benchmark databases, Maurdor and Khatt. Results in terms of word error rate show that systems using the full PAW LMs and the proposed hybrid LMs perform equally well, and that both consistently outperform systems using word LMs. However, systems using hybrid LMs require less memory than those using full PAW LMs.
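The vocabulary construction described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the decomposition rule or the frequency cut-off, so the set of non-connecting letters, the `top_k` parameter, and the function names `split_into_paws` and `build_hybrid_vocab` are all assumptions made for illustration. The sketch assumes a PAW ends after any letter that does not join to the following letter, and it ignores diacritics and the standalone hamza.

```python
from collections import Counter

# Assumption: letters that do not connect to the next letter, so a PAW
# ends right after them (alef forms, dal, dhal, ra, zay, waw, waw-hamza).
NON_CONNECTING = set(
    "\u0627\u0623\u0625\u0622"  # alef, alef-hamza above/below, alef-madda
    "\u062f\u0630\u0631\u0632"  # dal, dhal, ra, zay
    "\u0648\u0624"              # waw, waw with hamza
)


def split_into_paws(word):
    """Split a word into PAWs by cutting after each non-connecting letter."""
    paws, current = [], ""
    for ch in word:
        current += ch
        if ch in NON_CONNECTING:
            paws.append(current)
            current = ""
    if current:  # trailing letters form the last PAW
        paws.append(current)
    return paws


def build_hybrid_vocab(tokens, top_k):
    """Keep the top_k most frequent words whole; decompose the rest into PAWs.

    top_k is a hypothetical cut-off; the abstract does not state one.
    """
    counts = Counter(tokens)
    frequent = {word for word, _ in counts.most_common(top_k)}
    vocab = set(frequent)
    for word in counts:
        if word not in frequent:
            vocab.update(split_into_paws(word))
    return vocab
```

An n-gram model would then be estimated over text re-tokenized with this hybrid vocabulary; that step depends on the language-modeling toolkit and is not shown here.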
Keywords
Arabic text document recognition,hybrid part-of-Arabic-word language model,hybrid PAW language model,hybrid PAW LM,word vocabulary,hybrid word-PAW vocabulary,n-gram statistical language model,benchmarking database