Bubble execution: resource-aware reliable analytics at cloud scale

Hosted Content(2018)

引用 12|浏览102
暂无评分
摘要
AbstractEnabling interactive data exploration at cloud scale requires minimizing end-to-end query execution latency, while guaranteeing fault tolerance, and query execution under resource-constraints. Typically, such a query execution involves orchestrating the execution of hundreds or thousands of related tasks on cloud scale clusters. Without any resource constraints, all query tasks can be scheduled to execute simultaneously (gang scheduling) while connected tasks stream data between them. When the data size referenced by a query increases, gang scheduling may be resource-wasteful or un-satisfiable with a limited, per-query resource budget. This paper introduces Bubble Execution, a new query processing framework for interactive workloads at cloud scale, that balances cost-based query optimization, fault tolerance, optimal resource management, and execution orchestration. Bubble execution involves dividing a query execution graph into a collection of query sub-graphs (bubbles), and scheduling them within a per-query resource budget. The query operators (tasks) inside a bubble stream data between them while fault tolerance is handled by persisting temporary results at bubble boundaries. Our implementation enhances our JetScope service, for interactive workloads, deployed in production clusters at Microsoft. Experiments with TPC-H queries show that bubble execution can reduce resource usage significantly in the presence of failures while maintaining performance competitive with gang execution.
更多
查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要