Host Software Stack Optimizations to Maximize Aggregate Fabric Throughput

Vignesh T. Ravi, James Erwin, Pradeep Sivakumar, C. Q. Tang, Jianxin Xiong, Ravindra Babu Ganapathi, Mark Debbage

2017 IEEE 25th Annual Symposium on High-Performance Interconnects (HOTI), 2017


Abstract

Scientific HPC applications, along with the emerging class of Big Data and Machine Learning workloads, are rapidly driving fabric scale both on premises and in the cloud. Achieving high aggregate fabric throughput is paramount to overall application performance. However, achieving high fabric throughput at scale can be challenging: the application communication pattern needs to map well onto the target fabric architecture, and the multi-layered host software stack in between needs to orchestrate that mapping optimally to unleash the full performance.

In this paper, we investigate low-level optimizations to the host software stack with the goal of improving aggregate fabric throughput and, hence, application performance. We develop and present a number of optimization and tuning techniques that are key drivers of fabric performance at scale, such as fine-grained interleaving, improved pipelining, and careful resource utilization and management. We believe that these low-level optimizations can be leveraged by several programming models and their runtime implementations, making them broadly applicable. Using a set of well-known MPI-based scientific applications, we demonstrate that these optimizations can significantly improve overall fabric throughput and application performance. Interestingly, we also observe that some of these optimizations are inter-related and can contribute additively to overall performance.


Keywords

host software stack optimizations, scientific HPC applications, fabric scale, high aggregate fabric throughput, application communication pattern, target fabric architecture, low-level optimizations, application performance, tuning techniques, fabric performance, Fine-grained interleaving pipelining, improved pipelining, scientific applications, Big Data workload, Machine Learning workload, resource utilization-and-management, programming models, MPI-based scientific applications
