Revisiting Performance Evaluation in the Age of Uncertainty.

International Conference on High Performance Computing, Data, and Analytics (2023)

Abstract
Given a cloud-native application, how do we accurately estimate its performance, such as run time or memory consumption? Accurate estimation is necessary to ensure that the application meets its performance goals without resorting to overprovisioning of resources. Additionally, in practice, performance estimation needs to be meaningful and reproducible. Unfortunately, modern HPC systems come with numerous factors that affect performance estimation, such as heterogeneous accelerators, multilevel networks, millions of cores, layered software abstractions, and specialized middleware. Each of these factors adds a degree of variability to empirical performance results. The approaches currently being taught and practiced limit performance evaluation in three ways: (1) they rely on incomplete performance descriptions or metrics, such as point summaries (e.g., mean, median, or 99th percentile), which hide the rich behavioral patterns that arise in different scenarios; (2) they measure too few performance samples, leading to an inaccurate performance description; or (3) they measure too many performance samples, wasting precious computing resources. To overcome these limitations, we propose a new approach to evaluate and reason about application performance in modern HPC in a meaningful way. Our contribution is threefold: (a) we show the difficulty of estimating performance in realistic scenarios, where one performance measurement is not enough; (b) we propose to use distributions as the true measure of performance; and (c) we propose several practices and concepts to be taught to HPC students and practitioners, so that they may produce rich and accurate performance evaluations. We see our work having an impact on both educators and practitioners.
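To illustrate the abstract's point that point summaries can hide behavioral patterns, the following is a minimal sketch, not the authors' method. The two run-time samples are synthetic and hypothetical (they are not taken from the paper), and the helper name `histogram` and its bin settings are assumptions chosen only for illustration.

```python
import statistics

# Hypothetical run times in seconds. These values are synthetic and were
# constructed so that both samples share the same mean and median.
unimodal = [9.0, 9.5, 10.0, 10.0, 10.0, 10.0, 10.5, 11.0]
bimodal = [5.0, 5.0, 5.0, 10.0, 10.0, 15.0, 15.0, 15.0]


def histogram(sample, lo, hi, bins):
    """Count how many values fall into each of `bins` equal-width bins
    spanning [lo, hi]. Values equal to `hi` go into the last bin."""
    width = (hi - lo) / bins
    counts = [0] * bins
    for x in sample:
        idx = min(int((x - lo) / width), bins - 1)
        counts[idx] += 1
    return counts


for name, sample in (("unimodal", unimodal), ("bimodal", bimodal)):
    print(
        name,
        "mean =", statistics.mean(sample),
        "median =", statistics.median(sample),
        "histogram =", histogram(sample, lo=5.0, hi=15.0, bins=5),
    )
```

Both samples report a mean and median of 10.0 seconds, yet the histograms differ: the first sample clusters tightly around 10 seconds, while the second splits into fast and slow runs. A report that gave only the mean or median could not distinguish the two cases.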
Keywords
high-performance computing (HPC), performance evaluation, computer-science education, benchmarking, curriculum design