Comparing GPU Power and Frequency Capping: A Case Study with the MuMMI Workflow

2019 IEEE/ACM Workflows in Support of Large-Scale Science (WORKS) (2019)

Abstract
Accomplishing the goal of exascale computing under a potential power limit requires HPC clusters to maximize both parallel efficiency and power efficiency. As modern HPC systems trend toward extreme heterogeneity, with multiple GPUs per node, power management becomes even more challenging, especially when catering to scientific workflows with co-scheduled components. The impact of managing GPU power on workflow performance and run-to-run reproducibility has not been adequately studied. In this paper, we present a first-of-its-kind study of the impact of the two power management knobs available on NVIDIA Volta GPUs: frequency capping and power capping. We analyzed performance and power metrics of GPUs on a top-10 supercomputer by tuning these knobs for more than 5,300 runs in a scientific workflow. Our data show that GPU power capping in a scientific workflow is an effective way of improving power efficiency while preserving performance, whereas GPU frequency capping is a demonstrably unpredictable way of reducing power consumption. Additionally, we found that frequency capping results in higher variation and anomalous behavior on GPUs, which runs counter to what has been observed in research on CPUs.
Keywords
Workflows, Cancer MuMMI, GPU power capping, GPU frequency capping, Performance, Variation
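
For readers unfamiliar with the two knobs named in the abstract, the following is a minimal sketch of how each might be expressed as an `nvidia-smi` invocation. It is an illustration only: the abstract does not say which tool or interface the authors used, the helper names `power_cap_cmd` and `frequency_cap_cmd` are hypothetical, and the wattage and clock values are placeholders rather than settings from the paper. The sketch only builds and prints the commands; actually applying them typically requires administrative privileges on the node.

```python
import shlex


def power_cap_cmd(gpu_index, watts):
    """Build an nvidia-smi command that sets a board power limit (power capping)."""
    # -i selects the GPU, -pl sets the power limit in watts.
    return ["nvidia-smi", "-i", str(gpu_index), "-pl", str(watts)]


def frequency_cap_cmd(gpu_index, min_mhz, max_mhz):
    """Build an nvidia-smi command that locks GPU clocks to a range (frequency capping)."""
    # -lgc locks the graphics clock between min and max (MHz); supported on Volta and newer.
    return ["nvidia-smi", "-i", str(gpu_index), "-lgc", f"{min_mhz},{max_mhz}"]


if __name__ == "__main__":
    # Placeholder values chosen for illustration, not taken from the paper.
    for cmd in (power_cap_cmd(0, 200), frequency_cap_cmd(0, 135, 1005)):
        print(shlex.join(cmd))
```

The two commands constrain different quantities: a power cap leaves the GPU free to choose its clocks as long as it stays under the limit, while a frequency cap fixes the clock ceiling and lets power fall where it may.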