HotTiles: Accelerating SpMM with Heterogeneous Accelerator Architectures.

International Symposium on High-Performance Computer Architecture(2024)

引用 0|浏览2
暂无评分
摘要
Sparse Matrix Dense Matrix Multiplication (SpMM) is an important kernel with application across a wide range of domains, including machine learning and linear algebra solvers. In many sparse matrices, the pattern of nonzeros is nonuniform: nonzeros form dense and sparse regions, rather than being uniformly distributed across the whole matrix. We refer to this property as Intra-Matrix Heterogeneity (IMH). Currently, SpMM accelerator designs do not leverage this heterogeneity. They employ the same processing elements (PEs) for all the regions of a sparse matrix, resulting in suboptimal acceleration. To address this limitation, we utilize heterogeneous SpMM accelerator architectures, which include different types of PEs to exploit IMH. We develop an analytical modeling framework to predict the performance of different types of accelerator PEs taking into account IMH. Furthermore, we present a heuristic for partitioning sparse matrices among heterogeneous PEs. We call our matrix modeling and partitioning method HotTiles. To evaluate HotTiles, we simulate three different heterogeneous architectures. Each one consists of two types of workers (i.e., PEs): one suited for compute-bound denser regions (Hot Worker) and one for memory-bound sparser regions (Cold Worker). Our results show that exploiting IMH with HotTiles is very effective. Depending on the architecture, heterogeneous execution with HotTiles outperforms homogeneous execution using only hot or only cold workers by 9.2-16.8x and 1.4-3.7x, respectively. In addition, HotTiles outperforms the best worker type used on a per-matrix basis by 1.3-2.5 x. Finally, HotTiles outperforms an IMH-unaware heterogeneous execution strategy by 1.4-2.2x.
更多
查看译文
关键词
hardware accelerators,heterogeneous computing,sparse computations,SpMM
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要