CSEC: A Chinese Semantic Error Correction Dataset for Written Correction

NEURAL INFORMATION PROCESSING, ICONIP 2023, PT V(2024)

Abstract
Existing research primarily focuses on spelling and grammatical errors in English, such as missing or wrongly added characters. This kind of shallow error has been well studied. However, many deep-level errors remain unsolved in real applications, especially in Chinese, and semantic errors are among them. Semantic errors are mainly caused by an inaccurate understanding of the meanings and usage of words, and few studies have investigated them. We therefore focus on semantic error correction and propose a new dataset, called CSEC, which includes 17,116 sentences and six types of errors. Because semantic errors are often identified through the dependency relations of a sentence, we also propose a novel method called Desket (Dependency Syntax Knowledge Enhanced Transformer). Desket addresses the CSEC task by (1) capturing the syntax of the sentence, including dependency relations and part-of-speech tagging, and (2) using dependency to guide the generation of the correct output. Experiments on the CSEC dataset demonstrate the superior performance of our model against existing methods.
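As a rough illustration of the syntactic information the abstract mentions (dependency relations and part-of-speech tags), the sketch below stores a hand-annotated toy parse and extracts head-dependent pairs, which is where a mismatched word choice would typically surface. It is not the authors' Desket implementation: the `Token` class, the `dependency_pairs` helper, the tag names, and the example sentence are all assumptions made for illustration, and a real system would obtain the parse from a dependency parser.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Token:
    """One word with hypothetical syntactic annotations."""
    text: str
    pos: str      # part-of-speech tag
    head: int     # index of the syntactic head, -1 for the root
    deprel: str   # dependency relation to the head


def dependency_pairs(tokens: List[Token]) -> List[Tuple[str, str, str]]:
    """Return (head word, relation, dependent word) triples, skipping the root."""
    return [
        (tokens[t.head].text, t.deprel, t.text)
        for t in tokens
        if t.head >= 0
    ]


# Hand-annotated toy parse (illustrative only, not taken from CSEC):
# "提高 水平" ("improve" + "level"), a verb with its object.
sentence = [
    Token("提高", "VERB", -1, "root"),
    Token("水平", "NOUN", 0, "obj"),
]

pairs = dependency_pairs(sentence)
for head, rel, dep in pairs:
    print(f"{head} --{rel}--> {dep}")
```

Pairs like these make the verb-object or modifier-noun combinations explicit, so a correction model that conditions on them can attend to the words whose meanings must agree rather than only to adjacent characters.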
Keywords
Writing Correction, Semantic Errors, Error Correction