Predictable and Consistent Information Extraction.

DocEng(2019)

Abstract
Information extraction programs (extractors) can be applied to documents to isolate structured versions of some content, that is, to create tabular records corresponding to facts found in the documents. If the data in an extracted table needs to be updated for any reason (for example, as a result of data cleaning), the source document will no longer be synchronized with the data. But documents are the principal medium for sharing information among humans. We therefore wish to ensure that changes to extracted tables are reflected correctly in their source documents. In this work, we characterize extractors for which we are able to predict the effects that updates to source documents will have on extracted records. We introduce three general properties for extractors that, if satisfied, can guarantee that consistency will be maintained if the lineage of extracted records is respected when changing the documents. We propose a property verification process that uses static analysis for a substantial subset of JAPE, a well-established rule-based extraction language, and illustrate it through an example based on a freely-available extractor library.
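As a rough illustration of the idea of respecting lineage, the following minimal sketch shows a toy extractor that records where each extracted value came from, writes a corrected value back to that location, and re-extracts to check that the document and the table agree. It is not the paper's method and does not use JAPE: the regular expression, the `Record` type, and the `extract` and `apply_update` helpers are all hypothetical names introduced here, and the sketch does not encode the three properties the abstract refers to.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """One extracted row, plus lineage for the 'age' field (hypothetical type)."""
    name: str
    age: str
    age_span: tuple  # (start, end) character offsets of the age in the document


# Assumed toy pattern, standing in for a real rule-based extractor.
PATTERN = re.compile(r"(?P<name>[A-Z][a-z]+) is (?P<age>\d+) years old")


def extract(document):
    """Return one record per match, keeping the span of the age as lineage."""
    return [
        Record(m.group("name"), m.group("age"), m.span("age"))
        for m in PATTERN.finditer(document)
    ]


def apply_update(document, record, new_age):
    """Write a corrected age back into the document at the recorded span."""
    start, end = record.age_span
    return document[:start] + new_age + document[end:]


doc = "Alice is 30 years old. Bob is 25 years old."
records = extract(doc)

# Suppose data cleaning changes Bob's age in the extracted table to 26.
updated_doc = apply_update(doc, records[1], "26")

# Re-running the extractor on the updated document should reproduce the
# updated table; if it does, document and table are back in sync.
re_extracted = extract(updated_doc)
```

In this toy case the write-back is predictable because the edit stays inside the span the value was extracted from and the replacement still matches the pattern. An update that broke the pattern (for example, replacing the age with a non-numeric string) would make the record disappear on re-extraction, which is the kind of effect one would want to rule out before editing the document.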