A 28nm 1.07TFLOPS/mm2 Dynamic-Precision Training Processor with Online Dynamic Execution and Multi-Level-Aligned Block-FP Processing

2023 IEEE Custom Integrated Circuits Conference (CICC), 2023

Abstract
Training deep learning (DL) models consumes a huge amount of time and energy in cloud servers and edge devices, requiring energy-efficient processors [1–5] to meet the rapidly growing demand for AI. Training processors either use a high-precision floating-point (FP) format to provide robust training results, or a low-precision format to increase efficiency at the cost of accuracy. Mixed-precision training (MPT) is a promising way to achieve both high accuracy and high efficiency. Manual mixed precision [5] is usually a coarse-grained (per-layer) mapping, which limits training accuracy. Automatic precision search [6] provides an accurate and fine-grained precision mapping, but its high search latency slows down the overall training process.
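To make the block-FP idea from the title concrete: in block floating point, a group of values shares a single exponent while each value keeps only a short integer mantissa, trading per-element dynamic range for much cheaper arithmetic. The following is a minimal NumPy sketch of this quantization scheme, not the paper's hardware implementation; the function name and the 8-bit mantissa width are illustrative assumptions.

```python
import numpy as np

def to_block_fp(x, mantissa_bits=8):
    # Illustrative sketch of block floating point (not the paper's design):
    # one exponent is shared by the whole block, chosen so the largest
    # element fits in the signed mantissa range.
    shared_exp = np.ceil(np.log2(np.max(np.abs(x)) + 1e-30))
    scale = 2.0 ** (shared_exp - (mantissa_bits - 1))
    # Round each element to an integer mantissa and clip to the signed range.
    mantissas = np.clip(np.round(x / scale),
                        -(2 ** (mantissa_bits - 1)),
                        2 ** (mantissa_bits - 1) - 1)
    return mantissas * scale  # dequantized approximation

x = np.array([0.11, -0.52, 0.30, 0.95])
xq = to_block_fp(x, mantissa_bits=8)
```

With an 8-bit mantissa and a block maximum near 1.0, the quantization step is 2^-7, so each element is reproduced to within about 0.004; shrinking `mantissa_bits` widens that error, which is why fine-grained precision mapping matters for training accuracy.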