Accelerating Distributed Inference of Sparse Deep Neural Networks via Mitigating the Straggler Effect

2020 IEEE High Performance Extreme Computing Conference (HPEC)

Abstract
Once a Deep Neural Network (DNN) is trained, an inference algorithm retains the learning and applies it to batches of data. The trained DNN can be sparse, either because of pruning or because it follows a preset sparse connectivity pattern. Inference in such sparse networks requires less space and time than in dense ones. Like dense DNNs, sparse DNNs can be parallelized using model or data parallelism: the former partitions the network and the latter partitions the input among multiple threads. Model parallelism uses the Last Level Cache (LLC) efficiently but incurs a heavy synchronization cost because of compulsory per-layer reductions. In contrast, data parallelism allows independent execution of partitions but suffers from a straggler effect due to load imbalance among partitions. We combine data and model parallelism through a new type of parallelism that we call data-then-model. In data-then-model, each thread starts with data parallelism, thus mitigating the per-layer synchronization cost of model parallelism. Once a thread finishes its partition, it switches to model parallelism to assist a slower active thread, thereby alleviating the straggler effect of data parallelism. We compare data-then-model parallelism with data, model, and task-based parallelisms using the IEEE HPEC sparse DNN challenge dataset. On average, we achieve speedups of 10% to 65% over these parallelisms.
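To make the control flow of data-then-model parallelism concrete, below is a minimal C++20 sketch based only on the abstract's description, not the authors' code. Several simplifications are assumptions: a dense layer kernel (`slice`) stands in for the paper's sparse matrix-matrix multiply, thread 0 is statically designated as the straggler (it receives a 4x larger partition), helpers join only once all other threads are idle, and the sizes `T`, `L`, `N` are illustrative. A real implementation would identify the straggler and admit helpers dynamically.

```cpp
#include <algorithm>
#include <atomic>
#include <barrier>
#include <cstdio>
#include <thread>
#include <vector>

constexpr int T = 4;    // threads
constexpr int L = 8;    // layers
constexpr int N = 512;  // neurons per layer

std::vector<float> W((size_t)L * N * N, 0.01f); // weights for all layers
std::atomic<int>   helpers_done{0};             // threads done with their data
std::atomic<bool>  go{false};                   // straggler opened model phase
std::barrier<>     layer_bar(T);                // per-layer sync, model phase
int straggler_remaining = 0;                    // published before `go`
std::vector<float> gbuf[2];                     // shared activations, model phase

// y[j0..j1) = relu(x . W_l): the neuron slice one thread owns under model
// parallelism; the range [0, N) recovers a whole layer for a single thread.
void slice(const float* x, const float* w, float* y, int j0, int j1) {
  for (int j = j0; j < j1; ++j) {
    float a = 0.f;
    for (int k = 0; k < N; ++k) a += x[k] * w[(size_t)k * N + j];
    y[j] = a > 0.f ? a : 0.f;
  }
}

// Model-parallel phase: the whole group pushes each leftover sample through
// all layers, every thread computing a fixed neuron slice, with one barrier
// per layer. This per-layer reduction is the cost pure model parallelism
// pays on every sample; here it is paid only on the straggler's leftovers.
void run_group(int rank) {
  const int j0 = rank * (N / T), j1 = j0 + N / T;
  for (int s = 0; s < straggler_remaining; ++s) {
    if (rank == 0) std::fill(gbuf[0].begin(), gbuf[0].end(), 1.f); // fake sample
    layer_bar.arrive_and_wait();                  // sample loaded
    for (int l = 0; l < L; ++l) {
      slice(gbuf[l % 2].data(), &W[(size_t)l * N * N],
            gbuf[(l + 1) % 2].data(), j0, j1);
      layer_bar.arrive_and_wait();                // all slices done, next layer
    }                                             // (output discarded in demo)
  }
}

void worker(int rank, int my_samples) {
  std::vector<float> x(N, 1.f), y(N);
  for (int s = 0; s < my_samples; ++s) {          // data-parallel phase
    // Straggler path: once every other thread is idle, hand the leftover
    // samples to the group instead of finishing them alone.
    if (rank == 0 && helpers_done.load(std::memory_order_acquire) == T - 1) {
      straggler_remaining = my_samples - s;
      go.store(true, std::memory_order_release);
      run_group(0);
      return;
    }
    for (int l = 0; l < L; ++l) {                 // whole layers, no sync
      slice(x.data(), &W[(size_t)l * N * N], y.data(), 0, N);
      std::swap(x, y);
    }
  }
  if (rank == 0) {                                // no straggling after all
    straggler_remaining = 0;
    go.store(true, std::memory_order_release);    // release helpers to exit
  } else {                                        // finished early: offer help
    helpers_done.fetch_add(1, std::memory_order_release);
    while (!go.load(std::memory_order_acquire)) std::this_thread::yield();
    run_group(rank);
  }
}

int main() {
  gbuf[0].assign(N, 1.f);
  gbuf[1].assign(N, 0.f);
  std::vector<std::thread> pool;
  for (int t = 0; t < T; ++t)                     // thread 0 gets a 4x partition
    pool.emplace_back(worker, t, t == 0 ? 64 : 16);
  for (auto& th : pool) th.join();
  std::printf("straggler offloaded %d samples to the group\n",
              straggler_remaining);
}
```

The sketch shows the trade-off the abstract describes: the data-parallel loop pays no synchronization at all, and the per-layer barrier of model parallelism is incurred only for the samples the straggler could not finish before its peers went idle.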
Keywords
Sparse DNN, sparse matrix-matrix multiplication, DNN parallelism, data parallelism, model parallelism