Accelerating MPI Collectives with Process-in-Process-based Multi-object Techniques

Jiajun Huang, Kaiming Ouyang, Yujia Zhai, Jinyang Liu, Min Si, Ken Raffenetti, Hui Zhou, Atsushi Hori, Zizhong Chen, Yanfei Guo, Rajeev Thakur

Proceedings of the 32nd International Symposium on High-Performance Parallel and Distributed Computing, HPDC 2023 (2023)


Abstract

In the exascale computing era, optimizing MPI collective performance in high-performance computing (HPC) applications is critical. Current algorithms suffer performance degradation from system call overhead, page faults, or data-copy latency, which limits the efficiency and scalability of HPC applications. To address these issues, we propose PiP-MColl, a Process-in-Process-based Multi-object Interprocess MPI Collective design that maximizes small-message MPI collective performance at scale. PiP-MColl features efficient multiple-sender and multiple-receiver collective algorithms and leverages Process-in-Process shared-memory techniques to eliminate unnecessary system calls, page-fault overhead, and extra data copies, improving intra- and inter-node message rate and throughput. Our design also boosts performance for larger messages, yielding improvement across a wide range of message sizes. Experimental results show that PiP-MColl outperforms popular MPI libraries, including OpenMPI, MVAPICH2, and Intel MPI, by up to 4.6X for MPI collectives such as MPI_Scatter and MPI_Allgather.
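For readers unfamiliar with the two collectives named in the abstract, the following is a minimal sketch of their data-movement semantics only. It is plain Python with no MPI library, and it is not the PiP-MColl design or any library's implementation; the function names `scatter` and `allgather` and the equal-chunk assumption are illustrative choices, not taken from the paper.

```python
from typing import List, Sequence


def scatter(root_buffer: Sequence[int], num_ranks: int) -> List[List[int]]:
    """Model MPI_Scatter: the root's buffer is split into equal chunks,
    and rank r receives chunk r. Returns one receive buffer per rank."""
    if num_ranks <= 0 or len(root_buffer) % num_ranks != 0:
        raise ValueError("buffer length must be divisible by the number of ranks")
    chunk = len(root_buffer) // num_ranks
    return [list(root_buffer[r * chunk:(r + 1) * chunk]) for r in range(num_ranks)]


def allgather(contributions: Sequence[Sequence[int]]) -> List[List[int]]:
    """Model MPI_Allgather: every rank ends up with all ranks'
    contributions concatenated in rank order."""
    gathered = [x for part in contributions for x in part]
    return [list(gathered) for _ in contributions]


if __name__ == "__main__":
    parts = scatter([10, 11, 20, 21, 30, 31], num_ranks=3)
    print(parts)             # one chunk per rank
    print(allgather(parts))  # every rank holds the full buffer
```

The sketch shows why these operations are sensitive to per-message overhead for small payloads: each rank exchanges only a few elements, so fixed costs such as system calls and extra copies dominate the transfer time.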
