GGDMiner – Discovery of Graph Generating Dependencies for Graph Data Profiling
CoRR (2024)
Abstract
As graph-structured data becomes more widely used, interest in graph data
dependencies and their applications is also growing, e.g.,
in graph data profiling. Graph Generating Dependencies (GGDs) are a class of
dependencies for property graphs that can express the relation between
different graph patterns and constraints based on their attribute similarities.
The rich syntax and semantics of GGDs make them a good candidate for graph data
profiling. Nonetheless, GGDs are difficult to define manually, especially when
there are no data experts available. In this paper, we propose GGDMiner, a
framework for discovering approximate GGDs from graph data automatically, with
the intention of profiling graph data through GGDs for the user. GGDMiner has
three main steps: (1) pre-processing, (2) candidate generation, and (3) GGD
extraction. To optimize memory consumption and execution time, GGDMiner uses a
factorized representation of each discovered graph pattern, called an Answer
Graph. Our results show that the discovered set of GGDs can give an overview
of the input graph, covering both schema-level information and correlations
between graph patterns and attributes.