A Novel Framework for Efficient Offloading of Communication Operations to Bluefield SmartNICs

IPDPS (2023)

Abstract
Smart Network Interface Cards (SmartNICs) such as NVIDIA's BlueField Data Processing Units (DPUs) provide advanced networking capabilities and processor cores, enabling complex operations to be offloaded from the host. In the context of MPI, prior work has explored using DPUs to offload non-blocking collective operations. Current state-of-the-art approaches have two limitations: they work only for a predefined set of algorithms and communication patterns, and they suffer from degraded communication latency because data must be staged between the DPU and the host. In this paper, we propose a framework that supports offloading any communication pattern to the DPU while achieving low communication latency with perfect overlap. To achieve this, we first study the limitations of higher-level programming models such as MPI in expressing the offload of complex communication patterns to the DPU. We present a new set of APIs that addresses these shortcomings and supports any generic communication pattern. We then analyze the bottlenecks involved in offloading communication operations to the DPU and propose efficient designs for several candidate communication patterns. To the best of our knowledge, this is the first framework to provide communication offload to the DPU that is both efficient and generic. Our proposed framework outperforms state-of-the-art staging-based offload solutions by 47% in Alltoall micro-benchmarks, and at the application level we observe improvements of up to 60% in P3DFFT and 15% in HPL on 512 processes.
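The abstract's point that a generic offload API must accept arbitrary communication patterns, rather than a fixed menu of collective algorithms, can be illustrated with a small sketch. The names `CommStep` and `alltoall_pairwise_schedule` are hypothetical and are not the paper's API. The sketch only shows, under that assumption, how one Alltoall algorithm (the shifted pairwise exchange) reduces to a per-process list of send/receive steps with buffer offsets, which is the kind of description a host could hand to an offload engine such as a DPU.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CommStep:
    """One step of a hypothetical offloadable communication schedule."""
    send_peer: int  # rank this process sends to in this step
    recv_peer: int  # rank this process receives from in this step
    send_off: int   # byte offset of the outgoing block in the send buffer
    recv_off: int   # byte offset of the incoming block in the receive buffer


def alltoall_pairwise_schedule(rank: int, nprocs: int, blk: int) -> list[CommStep]:
    """Shifted pairwise-exchange Alltoall schedule for `rank`.

    Step 0 is the local self-copy; in step s the process sends the block
    destined for rank (rank + s) and receives from rank (rank - s), mod nprocs.
    Each step moves `blk` bytes in each direction.
    """
    steps = []
    for s in range(nprocs):
        to = (rank + s) % nprocs
        frm = (rank - s) % nprocs  # Python's % keeps this non-negative
        steps.append(CommStep(to, frm, to * blk, frm * blk))
    return steps


if __name__ == "__main__":
    for i, st in enumerate(alltoall_pairwise_schedule(rank=1, nprocs=4, blk=1024)):
        print(f"step {i}: send->{st.send_peer} (off {st.send_off}), "
              f"recv<-{st.recv_peer} (off {st.recv_off})")
```

Because the schedule is plain data, a different pattern (for example a Bruck-style or hierarchical Alltoall) would only change how the list is built, not the engine that executes it; this separation is what the sketch is meant to convey.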
Keywords
HPC, Infiniband, MPI, SmartNIC, DPU, Offload, GVMI, Cross-GVMI