Error Checking for Sparse Systolic Tensor Arrays
CoRR (2024)
Abstract
Structured sparsity is an efficient way to prune the complexity of modern
Machine Learning (ML) applications and to simplify the handling of sparse data
in hardware. In such cases, the acceleration of structured-sparse ML models is
handled by sparse systolic tensor arrays. The increasing prevalence of ML in
safety-critical systems requires enhancing the sparse tensor arrays with online
error detection for managing random hardware failures. Algorithm-based fault
tolerance has been proposed as a low-cost mechanism for checking the results of
computations online against random hardware failures. In this work, we address a
key architectural challenge with structured-sparse tensor arrays: how to
provide online error checking for a range of structured sparsity levels while
maintaining high utilization of the hardware. Experimental results highlight
the minimal hardware overhead incurred by the proposed checking logic and its
error detection properties after injecting random hardware faults into sparse
tensor arrays that execute layers of the ResNet50 CNN.
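
As a rough illustration of the algorithm-based fault tolerance idea mentioned in the abstract, the sketch below applies the classic checksum identity for matrix multiplication: the sum of all entries of C = A x B equals the dot product of the column sums of A and the row sums of B. This is a generic software sketch, not the checking logic proposed in the paper. The function names (`matmul`, `abft_check`), the 2:4 sparsity pattern of the example weights, and the single bit-flip fault model are all assumptions made for illustration.

```python
def matmul(a, b):
    """Plain integer matrix product, standing in for the tensor array."""
    rows, inner, cols = len(a), len(b), len(b[0])
    return [[sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for i in range(rows)]


def abft_check(a, b, c):
    """Return True if c is consistent with a x b under the checksum identity.

    The predicted checksum is computed from the operands only, so it is
    independent of the (possibly faulty) product c.
    """
    inner = len(b)
    col_sums_a = [sum(row[k] for row in a) for k in range(inner)]
    row_sums_b = [sum(b[k]) for k in range(inner)]
    predicted = sum(x * y for x, y in zip(col_sums_a, row_sums_b))
    actual = sum(sum(row) for row in c)
    return predicted == actual


# Hypothetical 2:4 structured-sparse weights: at most 2 non-zeros per 4 entries.
weights = [[0, 3, 0, -1],
           [2, 0, 0, 5]]
inputs = [[1, 2], [3, 4], [5, 6], [7, 8]]

result = matmul(weights, inputs)
ok_clean = abft_check(weights, inputs, result)

# Assumed fault model: flip one bit of one output element.
faulty = [row[:] for row in result]
faulty[1][0] ^= 1 << 3
ok_faulty = abft_check(weights, inputs, faulty)
```

Here `ok_clean` is true and `ok_faulty` is false, since a single corrupted output changes the actual checksum but not the predicted one. Multiple faults whose effects cancel in the sum would escape this simple check, which is one reason fault-injection experiments are used to characterize detection properties.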