Fangorn: adaptive execution framework for heterogeneous workloads on shared clusters

Yingda Chen, Jiamang Wang, Yifeng Lu, Ying Han, Zhiqiang Lv, Xuebin Min, Hua Cai, Wei Zhang, Haochuan Fan, Chao Li, Tao Guan, Wei Lin, Yangqing Jia, Jingren Zhou

Hosted Content (2021)

Abstract

Pervasive demand for data exploration at all scales has populated modern distributed platforms with workloads of widely differing characteristics. This growing complexity and diversity makes it distinctly challenging to execute such workloads on shared clusters in corporate or public clouds. This paper presents Fangorn, an adaptive execution framework built on an enriched graph model. As the underlying infrastructure for core computation platforms at Alibaba, Fangorn supports various execution modes and caters to heterogeneous workloads. Because it can orchestrate graph executions with both long-running and on-demand resources at the same time, Fangorn allows exploring tradeoffs between latency and resource efficiency for jobs of all scales. By modeling distributed job executions as mutable graphs with pluggable components, Fangorn offers a systematic framework for adjusting job executions adaptively, according to data statistics collected at run time. Fangorn supports an array of computation engines, ranging from relational to deep learning, and is fully deployed on production clusters across Alibaba. It manages tens of millions of distributed jobs daily, with job sizes ranging from one to half a million.
