Fusing In-storage and Near-storage Acceleration of Convolutional Neural Networks

ACM Journal on Emerging Technologies in Computing Systems (2024)

Abstract
Video analytics has a wide range of applications and has attracted considerable interest over the years. Because it is both computationally and energy-intensive, video analytics can benefit greatly from in/near-memory compute. Moving compute closer to memory has consistently improved performance and energy consumption and is seeing increasing adoption. Recent solid-state drives (SSDs) incorporate near-memory Field Programmable Gate Arrays (FPGAs) with shared access to the drive's storage cells. These near-memory FPGAs can run the operations required by video analytics pipelines, such as object detection and template matching, which are typically implemented with Convolutional Neural Networks (CNNs). A CNN is composed of multiple individually processed layers that perform various image-processing tasks. When accelerator resources are insufficient for a full layer, the layer may be partitioned into more manageable sub-layers. These sub-layers are usually processed sequentially; however, some of them are independent and can be processed simultaneously. Moreover, the storage cells within FPGA-equipped SSDs can be augmented with in-storage compute to accelerate CNN workloads and exploit this intra-layer parallelism. To this end, we present a heterogeneous in/near-storage acceleration solution for video analytics. We designed a NAND flash accelerator and an FPGA accelerator, then mapped and evaluated several CNN benchmarks on them. We show how to utilize FPGAs, local DRAM, and in-storage SSD compute to accelerate CNN workloads, and we demonstrate how to remove unnecessary memory transfers to save latency and energy.
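To make the sub-layer partitioning concrete, the sketch below (a minimal illustration, not the paper's implementation) splits a convolutional layer along its input channels into two sub-layers. Because convolution is linear in its inputs, each sub-layer yields a partial sum that an independent compute unit, such as an in-storage engine or the near-storage FPGA, could evaluate concurrently; accumulating the partial sums reproduces the full layer output. All shapes and the naive conv2d helper are illustrative assumptions.

```python
import numpy as np

def conv2d(x, w):
    """Naive valid-mode 2D convolution: x is (C_in, H, W), w is (C_out, C_in, K, K)."""
    c_out, c_in, k, _ = w.shape
    h_out, w_out = x.shape[1] - k + 1, x.shape[2] - k + 1
    y = np.zeros((c_out, h_out, w_out))
    for o in range(c_out):
        for i in range(h_out):
            for j in range(w_out):
                y[o, i, j] = np.sum(x[:, i:i+k, j:j+k] * w[o])
    return y

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 16, 16))   # 8 input channels (illustrative sizes)
w = rng.standard_normal((4, 8, 3, 3))  # 4 output channels, 3x3 kernels

# Full layer, processed as a single unit.
y_full = conv2d(x, w)

# Partition into two sub-layers along the input channels. Each call is
# independent and could be dispatched to a different compute unit.
y_sub0 = conv2d(x[:4], w[:, :4])   # sub-layer over input channels 0..3
y_sub1 = conv2d(x[4:], w[:, 4:])   # sub-layer over input channels 4..7

# Accumulating the partial sums matches the unpartitioned result.
assert np.allclose(y_full, y_sub0 + y_sub1)
```

The same idea applies to other partitioning axes (output channels or spatial tiles); input-channel splitting is shown here because it makes the parallel partial-sum structure explicit.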
Keywords
Object detection, Neural Networks, PIM, FPGA, near-memory compute