EulerFD: An Efficient Double-Cycle Approximation of Functional Dependencies

ICDE(2023)

引用 0|浏览75
暂无评分
摘要
Functional dependencies (FDs) have been extensively employed in discovering inferential relationships in databases, which provide feasible approaches for many data mining tasks, such as data obfuscation, query optimization, and schema normalization. Since the explosive growth of data leads to a rapid increase of FDs on large datasets, existing algorithms that pay more attention to the exact FD discovery cannot extract FDs efficiently. To bridge this gap, we propose an Efficient double-cycle approximation of Functional Dependency (EulerFD) discovery algorithm, which ensures both efficiency and accuracy of FD discovery. EulerFD induces FDs from invalid ones as invalidating an FD only requires comparing and verifying some pairs of tuples (that violate the dependency) while validating an FD requires examining and verifying all tuples. Considering the abundant tuple pairs in large datasets, a novel sampling strategy is employed in EulerFD to quickly extract invalid FDs by revising the sampling range according to previous sampling results. Furthermore, EulerFD evaluates the stopping criteria in a double-cycle structure as feedback for further sampling. The sampling strategy and the double-cycle structure complement each other to achieve a more efficient sampling effect. Experimental results on real-world and synthetic datasets, especially the massive datasets from DMS of Alibaba Cloud, justify the design and verify the efficiency and effectiveness of the proposed EulerFD.
更多
查看译文
关键词
data mining,databases,double-cycle structure,efficient double-cycle approximation of functional dependencies,EulerFD,FD discovery algorithm,functional dependency discovery algorithm,inferential relationships,sampling strategy,tuples
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要