Inference Time Reduction of Deep Neural Networks on Embedded Devices: A Case Study

25th Euromicro Conference on Digital System Design (DSD), 2022

Abstract
From object detection to semantic segmentation, deep learning has achieved many groundbreaking results in recent years. However, the increasing complexity of neural networks greatly hinders their execution on embedded platforms. This has motivated the development of several network minimisation techniques, among which pruning has received particular attention. In this work, we perform a case study on a series of methods with the goal of finding a small model that runs fast on embedded devices. First, we propose a simple but effective ranking criterion for filter pruning called Mean Weight. We then combine this criterion with a threshold-aware, layer-sensitive filter pruning method, called T-sensitive pruning, to maintain high accuracy. The pruning algorithm follows a structured filter pruning approach that removes all selected filters and their dependencies from the DNN model, leading to fewer computations and thus lower inference time on lower-end CPUs. To validate the effectiveness of the proposed method, we perform experiments on three datasets (with 3, 101, and 1000 classes) and two deep neural networks (SICK-Net and MobileNet V1). We obtain speedups of up to 13x on lower-end CPUs (Armv8) with less than 1% drop in accuracy, satisfying the goal of transferring deep neural networks to embedded hardware while attaining a good trade-off between inference time and accuracy.
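The abstract describes the Mean Weight criterion and T-sensitive pruning only at a high level. Below is a minimal sketch of how such a filter-ranking step could look in PyTorch; the function names, the per-layer threshold value, and the reading of "Mean Weight" as the mean absolute weight per filter are assumptions made for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn


def mean_weight_scores(conv: nn.Conv2d) -> torch.Tensor:
    """Score each output filter by the mean absolute value of its weights.

    Assumed interpretation of the paper's "Mean Weight" criterion; the exact
    definition in the paper may differ.
    """
    # conv.weight has shape (out_channels, in_channels, kH, kW);
    # average over everything except the output-filter dimension.
    return conv.weight.detach().abs().mean(dim=(1, 2, 3))


def select_filters_to_prune(conv: nn.Conv2d, threshold: float) -> list:
    """Return indices of filters whose score falls below a per-layer threshold.

    The threshold-aware, layer-sensitive aspect (T-sensitive pruning) is only
    hinted at here: in practice the threshold would be tuned per layer so that
    accuracy-sensitive layers keep more of their filters.
    """
    scores = mean_weight_scores(conv)
    return torch.nonzero(scores < threshold, as_tuple=False).flatten().tolist()


if __name__ == "__main__":
    layer = nn.Conv2d(in_channels=16, out_channels=32, kernel_size=3)
    prune_idx = select_filters_to_prune(layer, threshold=0.05)  # illustrative threshold
    print(f"{len(prune_idx)} of {layer.out_channels} filters selected for removal")
```

In a structured pruning pipeline, the selected filters and their dependencies (the corresponding input channels of the following layer, batch-norm parameters, etc.) would then be physically removed from the model, which is what yields the reported reduction in computation and inference time.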
Keywords
Deep neural network optimisation, embedded deep learning, edge AI, pruning