Query Evaluation Over SLP-Represented Document Databases With Complex Document Editing

PROCEEDINGS OF THE 41ST ACM SIGMOD-SIGACT-SIGAI SYMPOSIUM ON PRINCIPLES OF DATABASE SYSTEMS (PODS '22)（2022）

引用 2|浏览11

暂无评分

摘要

It is known that the query result of a regular spanner over a single document D can be enumerated after Omicron (|D|) preprocessing and with constant delay in data complexity (Florenzano et al., ACM TODS 2020, Amarilli et al., ACM TODS 2021). It has been shown (Schmid and Schweikardt, PODS'21) that if the document is represented by a straight-line program (SLP) S, then enumeration is possible with a delay of O( log |D|), but with preprocessing that is linear in |S| (which, in the best case, is logarithmic in |D|). Hence, this compressed setting allows for spanner evaluation in sub-linear time, i. e., with logarithmic upper bounds for preprocessing and delay, if the document is highly-compressible. In this work, we extend these results to the dynamic setting. We consider a document database DDB = {D-1, D-2,..., D-m} that is represented by an SLP S-DDB, and that supports regular spanners M-1, M-2, ..., M-k (meaning that we have data structures at our disposal that allow O(log | D-i |)-delay enumeration of the result of spanner M-j on document D-i). Then we can perform an update by manipulating the existing documents of DDB by a sequence of text-editing operations commonly found in text-editors (like copy and paste, deleting, or copying factors, concatenating documents etc.), and add the thus constructed document to the database. Such an operation is called complex document editing and is given by an expression phi in a suitable algebra. Moreover, after this operation, the document database still supports all the regular spanners M-1, ... , M-k. The total time required for such an update is O(k center dot |phi| center dot log d), where d is the maximum length of any intermediate document constructed in the complex document editing described by phi. We stress the fact that the size |S-DDB| of the SLP (which upper bounds the preprocessing in the static case) is potentially logarithmic in the data, but generally depends on the compressibility of the documents (in the worst case, it is even linear in the data). In contrast to that, we can guarantee that the dependency on the data of our updates is logarithmic regardless of the actual compression achieved by the SLP. In particular, any such update performed by complex document editing adds documents whose length may be exponentially larger than the time needed for performing such an update.

查看译文

关键词

Algorithmics on Compressed Data, Document Spanners, Dynamic Setting, Information Extraction, Straight-Line Programs

AI 理解论文

溯源树

样例

生成溯源树，研究论文发展脉络

Chat Paper

正在生成论文摘要