Evaluating the Performance of One-sided Communication on CPUs and GPUs

SC-W '23: Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis (2023)

Abstract
Effective programming models offer programmers the ability to harness the capabilities of the underlying platform. For decades, two-sided Message Passing Interface (MPI) communication has been the de facto standard for communication among processes running on distributed-memory systems. As high-performance GPU computing becomes the norm, GPU-initiated one-sided communication emerges as a viable solution for multi-GPU scaling, and it also renews interest in one-sided communication on CPUs. However, the lack of a deep understanding of one-sided communication performance and its impact on application performance remains a hurdle. In this paper, we address this hurdle by proposing a Message Roofline model, which characterizes an application's sustained messaging performance (GB/s) as a function of its message size, number of messages per synchronization, peak network bandwidth, and network latency. We use three benchmarks to demonstrate the potential of one-sided communication on CPUs and GPUs: Stencils, representing applications that follow the bulk-synchronous programming model; Sparse Triangular Solve, representing directed acyclic graph computations that perform asynchronous point-to-point communication; and Distributed HashTable, which performs atomic compare-and-swap operations. Our evaluation provides insights for practically understanding two-sided and one-sided communication in MPI applications, and it can also guide hardware vendors toward design principles that keep the potential performance of one-sided communication from being under-utilized.
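The abstract names the inputs of the Message Roofline model but not its exact formulation. As an illustrative sketch only, under the assumption of a standard latency-bandwidth (roofline-style) cost model, the sustained messaging rate for n messages of size s issued per synchronization might take a form such as:

\[
B_{\text{sustained}}(s, n) \;\approx\; \min\!\left( B_{\text{peak}},\; \frac{n \, s}{T_{\text{lat}} + \dfrac{n \, s}{B_{\text{peak}}}} \right)
\]

Here B_peak denotes the peak network bandwidth and T_lat the per-synchronization network latency; these symbols and the specific functional form are assumptions for illustration, not the paper's published model. The sketch captures the qualitative behavior the abstract describes: small or infrequent messages are latency-bound, while large, frequent messages approach the bandwidth roofline.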