Design Trade-offs for a Robust Dynamic Hybrid Hash Join.

Proceedings of the VLDB Endowment(2022)

引用 2|浏览36
暂无评分
摘要
Hybrid Hash Join (HHJ) has proven to be one of the most efficient and widely-used join algorithms. While HHJ's performance depends largely on accurate statistics and information about the input relations, it may not always be practical or possible for a system to have such information available. HHJ's design depends on many details to perform well. This paper is an experimental and analytical study of the trade-offs in designing a robust and dynamic HHJ operator. We revisit the design and optimization techniques suggested by previous studies through extensive experiments, comparing them with other algorithms designed by us or used in related studies. We explore the impact of the number of partitions on HHJ's performance and propose a new lower bound for the number of partitions. We design and evaluate different partition insertion techniques to maximize memory utilization with the least CPU cost. Additionally, we consider a comprehensive set of algorithms for dynamically selecting a partition to spill and compare the results against previously published studies. We then present and evaluate two alternative growth policies for spilled partitions. These algorithms have been implemented in the context of Apache AsterixDB and evaluated under different scenarios such as variable record sizes, different distributions of join attributes, and different storage types, including HDD, SSD, and Amazon Elastic Block Store (Amazon EBS).
更多
查看译文
关键词
trade-offs
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要