ULSeq-TA: Ultra-Long Sequence Attention Fusion Transformer Accelerator Supporting Grouped Sparse Softmax and Dual-Path Sparse LayerNorm

IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS (2024)

Abstract
Transformer networks have been increasingly successful in various fields. As algorithm and task complexity grows, input sequence lengths have become much larger, which is challenging due to the high computational and storage cost. Softmax and LayerNorm are the bottleneck nonlinear operators in ultra-long-sequence Transformer networks. To improve the efficiency of Softmax, assumption-based and quantization-based approaches have been introduced; however, the potential of sparsity to accelerate Softmax itself has not been fully explored. To improve the efficiency of LayerNorm, some works reduce the input size and others exploit pipelining; however, the sparsity potential is likewise unexplored. To address these challenges, this article presents the ULSeq-TA software-hardware co-design framework. The software includes 1) a grouped sparse Softmax method that leverages the data-magnifying characteristic of Softmax for sparse processing both within and after the Softmax computation and 2) a dual-path sparse LayerNorm method that exploits dimensional significance for sparse calculation. The hardware includes 1) an attention fusion architecture that reduces on-chip memory through fused operators; 2) a grouped sparse Softmax core; and 3) a dual-path sparse LayerNorm core. Experiments show that the software achieves 4.45x and 7.59x computation reduction with little output difference for Softmax and LayerNorm, respectively. The hardware architecture supports sequence lengths of up to 32768 with only 186 kB of on-chip memory and achieves 1.75x-1.98x and 3.22x-4.32x speedups for the sparse Softmax core and the sparse LayerNorm core, respectively, with little accuracy loss.
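The abstract only outlines the two sparse methods, so the sketch below is a rough NumPy illustration of the underlying ideas rather than the paper's actual grouped Softmax algorithm or dual-path LayerNorm design: Softmax's exponentiation magnifies gaps between logits, so entries far below the row maximum can be treated as zero, and LayerNorm statistics can be estimated from only the most significant dimensions. The threshold, keep ratio, and selection rules here are illustrative assumptions.

```python
import numpy as np

def sparse_softmax(scores, threshold=8.0):
    """Thresholded softmax sketch: logits more than `threshold` below the
    row maximum are dropped, since exp() pushes them toward zero anyway.
    The threshold value is an assumption, not the paper's grouping rule."""
    row_max = scores.max(axis=-1, keepdims=True)
    shifted = scores - row_max
    mask = shifted > -threshold                   # keep only significant logits
    exp = np.where(mask, np.exp(shifted), 0.0)
    return exp / exp.sum(axis=-1, keepdims=True)

def sparse_layernorm(x, keep_ratio=0.5, eps=1e-5):
    """Magnitude-pruned LayerNorm sketch: mean and variance are estimated
    from the largest-magnitude fraction of dimensions only. The keep_ratio
    and selection rule are assumptions for illustration."""
    k = max(1, int(x.shape[-1] * keep_ratio))
    idx = np.argsort(np.abs(x), axis=-1)[..., -k:]   # most significant dims
    selected = np.take_along_axis(x, idx, axis=-1)
    mean = selected.mean(axis=-1, keepdims=True)
    var = selected.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

# Toy usage on one attention row and one token embedding.
probs = sparse_softmax(np.random.randn(1, 16) * 4.0)
normed = sparse_layernorm(np.random.randn(1, 64))
```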
Keywords
Transformers, Task analysis, System-on-chip, Decoding, Sparse matrices, Hardware, Transformer cores, Long sequence, software-hardware co-design, sparse LayerNorm, sparse Softmax, transformer accelerator