PoDD: Power-Capping Dependent Distributed Applications

Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (2019)

Abstract
Power budgeting (or capping) has become essential for large-scale computing installations. Meanwhile, as these systems scale out, they can concurrently execute dependent applications that were previously processed serially. Such application coupling reduces IO traffic and overall time to completion, as the applications now communicate at runtime instead of through disk. Coupled applications are predicted to be a major workload for future exascale supercomputers; e.g., scientific simulations will execute concurrently with in situ analysis. One critical challenge for power budgeting systems is implementing power capping for coupled applications while still achieving high performance. Existing approaches to power capping coupled workloads, however, have major limitations, including: (1) poor practicality, due to dependence on offline application profiling; and (2) limited optimization opportunity, as they consider power reallocation on a strictly global level (from node to node), without considering node-level optimization opportunities. To overcome these limitations, we propose PoDD, a hierarchical, distributed power management system for coupled applications. PoDD uses classifiers and online model building to determine optimal power and performance tradeoffs without offline profiling or application instrumentation. We implement PoDD on a 49-node cluster and compare it to SLURM, a state-of-the-art job scheduler that considers power, but not coupling, and PowerShift, a power capping system for coupled applications without node-level optimization. PoDD improves mean performance over SLURM by 14–22% and over PowerShift by 11–13%. Finally, PoDD is resilient to tail behavior and system noise, improving performance in noisy environments by 44% on average compared to even power distribution.
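To illustrate why reallocating power between coupled applications can beat an even split, the following is a minimal sketch, not PoDD's algorithm. It assumes a hypothetical linear power-to-performance model (`eff_a`, `eff_b` in work units per watt) and treats the coupled pair's throughput as that of its slower member; the function names, the step size, and the minimum power are all illustrative assumptions. PoDD itself, per the abstract, learns its models online rather than being given them.

```python
def coupled_rate(power_a, power_b, eff_a, eff_b):
    """Throughput of a coupled pair, limited by the slower application.

    Assumption: performance scales linearly with power (eff * watts).
    """
    return min(eff_a * power_a, eff_b * power_b)


def shift_power(budget, eff_a, eff_b, step=1.0, min_power=10.0):
    """Greedy power shifting under a fixed total budget (illustrative only).

    Starts from an even split and moves `step` watts toward whichever
    application is currently slower, stopping when the coupled rate no
    longer improves or an application would drop below `min_power`.
    """
    power_a = power_b = budget / 2
    best = coupled_rate(power_a, power_b, eff_a, eff_b)
    while True:
        # Give power to the application that is currently the bottleneck.
        if eff_a * power_a < eff_b * power_b:
            cand_a, cand_b = power_a + step, power_b - step
        else:
            cand_a, cand_b = power_a - step, power_b + step
        if min(cand_a, cand_b) < min_power:
            break
        rate = coupled_rate(cand_a, cand_b, eff_a, eff_b)
        if rate <= best:
            break
        power_a, power_b, best = cand_a, cand_b, rate
    return power_a, power_b, best


if __name__ == "__main__":
    # Hypothetical pair: application B is three times as power-efficient as A.
    print(shift_power(budget=200.0, eff_a=1.0, eff_b=3.0))
```

Under these assumed efficiencies, the even split leaves the less efficient application as the bottleneck, and shifting power toward it raises the pair's throughput while the total stays within the budget.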
Keywords
adaptive systems, machine learning, power management