GEqO: ML-Accelerated Semantic Equivalence Detection
Proceedings of the ACM on Management of Data (2024)
Abstract
Large-scale analytics engines have become a core dependency for modern
data-driven enterprises to derive business insights and drive actions. These
engines support a large number of analytic jobs processing huge volumes of data
on a daily basis, and workloads are often inundated with overlapping
computations across multiple jobs. Reusing common computation is crucial for
efficient cluster resource utilization and reducing job execution time.
Detecting common computation is the first and key step for reducing this
computational redundancy. However, detecting equivalence on large-scale
analytics engines requires efficient and scalable solutions that are fully
automated. In addition, to maximize computation reuse, equivalence needs to be
detected at the semantic level instead of just the syntactic level (i.e., the
ability to detect semantic equivalence of seemingly different-looking queries).
Unfortunately, existing solutions fall short of satisfying these requirements.
In this paper, we take a major step towards filling this gap by proposing
GEqO, a portable and lightweight machine-learning-based framework for
efficiently identifying semantically equivalent computations at scale. GEqO
introduces two machine-learning-based filters that quickly prune out
nonequivalent subexpressions and employs a semi-supervised learning feedback
loop to iteratively improve its model with an intelligent sampling mechanism.
Further, with its novel database-agnostic featurization method, GEqO can
transfer the learning from one workload and database to another. Our extensive
empirical evaluation shows that, on TPC-DS-like queries, GEqO yields
significant performance gains (up to 200x faster than automated verifiers) and
finds up to 2x more equivalences than optimizer and signature-based equivalence
detection approaches.
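
The filter-then-verify idea described above can be illustrated with a minimal sketch. This is not GEqO's implementation: the abstract says its filters are machine-learning-based, whereas the two filters here are hand-written rules used as stand-ins, and the verifier is a toy that only knows that "a < b" and "b > a" mean the same thing. Every name below (Expr, detect_equivalences, same_tables, same_predicate_count, toy_verify) is hypothetical and chosen for illustration only.

```python
from itertools import combinations
from typing import Callable, List, Sequence, Tuple

# Toy stand-in for a subexpression: (set of tables, set of predicate strings).
Expr = Tuple[frozenset, frozenset]
PairCheck = Callable[[Expr, Expr], bool]


def detect_equivalences(
    exprs: Sequence[Expr],
    filters: Sequence[PairCheck],
    verify: PairCheck,
) -> List[Tuple[Expr, Expr]]:
    """Run cheap filters over all pairs, then verify only the survivors."""
    candidates = list(combinations(exprs, 2))
    for may_be_equivalent in filters:
        candidates = [(a, b) for a, b in candidates if may_be_equivalent(a, b)]
    return [(a, b) for a, b in candidates if verify(a, b)]


# Assumed rule-based filters; GEqO's real filters are learned models.
def same_tables(a: Expr, b: Expr) -> bool:
    return a[0] == b[0]


def same_predicate_count(a: Expr, b: Expr) -> bool:
    return len(a[1]) == len(b[1])


def normalize(pred: str) -> Tuple[str, str, str]:
    """Rewrite 'x < y' as 'y > x' so both spellings compare equal."""
    lhs, op, rhs = pred.split()
    return (rhs, ">", lhs) if op == "<" else (lhs, op, rhs)


verifier_calls = 0  # counts how often the "expensive" step runs


def toy_verify(a: Expr, b: Expr) -> bool:
    """Toy verifier: same tables and same predicates after normalization."""
    global verifier_calls
    verifier_calls += 1
    return a[0] == b[0] and {normalize(p) for p in a[1]} == {
        normalize(p) for p in b[1]
    }


q1 = (frozenset({"sales"}), frozenset({"price > 5"}))
q2 = (frozenset({"sales"}), frozenset({"5 < price"}))  # same meaning as q1
q3 = (frozenset({"items"}), frozenset({"price > 5"}))
q4 = (frozenset({"sales"}), frozenset({"price > 5", "qty > 1"}))

result = detect_equivalences(
    [q1, q2, q3, q4], [same_tables, same_predicate_count], toy_verify
)
```

In this toy run, the four subexpressions form six candidate pairs, the filters discard five of them, and the verifier is invoked once, on the pair (q1, q2), which differs syntactically but matches after normalization.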
Keywords
machine learning, semantic query optimization