14.1 A Software-Assisted Peak Current Regulation Scheme to Improve Power-Limited Inference Performance in a 5nm AI SoC.

IEEE International Solid-State Circuits Conference (2024)

Abstract
The rapid emergence of AI models, specifically large language models (LLMs) requiring large amounts of compute, drives the need for dedicated AI inference hardware. During deployment, compute utilization (and thus power consumption) can vary significantly across layers of an AI model, number of tokens, precision, and batch size [1]. Such wide variation, which may occur at fast time scales, poses unique challenges in optimizing performance within the system-level specifications for discrete accelerator cards, including not just average power consumption but also peak instantaneous current draw, which may require consideration of time constants down to the μs scale [2]. Prior current-limiting systems [2], [3], which use reactive schemes and often target general-purpose processors, may not be sufficient for AI workloads. This work leverages a unique characteristic of AI workloads, their amenability to predictive compile-time software optimization, and proposes a new power management architecture to minimize worst-case margins and realize the potential of AI accelerators. In addition, because power consumption varies widely across card components in AI workloads, sensing the card-level (vs. chip-level) current provides more opportunity for optimization. A new software-assisted feed-forward current-limiting scheme is thus proposed in conjunction with PCIe-card-level closed-loop control to maximize performance under sub-ms peak current constraints.
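The abstract's core idea, combining a compile-time feed-forward current schedule with a reactive card-level closed-loop correction, can be sketched at a very high level. The sketch below is purely illustrative and is not the paper's implementation; the layer names, current estimates, limits, and gain are all hypothetical assumptions.

```python
# Illustrative sketch (NOT the paper's implementation): a feed-forward
# per-layer throttle schedule computed at compile time, plus a reactive
# closed-loop correction on the measured card-level current.
# All names, thresholds, and the gain value are hypothetical.

from dataclasses import dataclass


@dataclass
class Layer:
    name: str
    predicted_current_a: float  # compile-time current estimate for this layer


def plan_throttle(layers, peak_limit_a):
    """Feed-forward pass: pre-compute a per-layer throttle factor so the
    predicted current never exceeds the card-level peak limit."""
    return {
        layer.name: min(1.0, peak_limit_a / layer.predicted_current_a)
        for layer in layers
    }


def closed_loop_adjust(throttle, measured_current_a, peak_limit_a, gain=0.1):
    """Reactive correction: nudge the throttle down when the measured
    card-level current exceeds the limit, and back toward 1.0 otherwise."""
    error = (peak_limit_a - measured_current_a) / peak_limit_a
    return max(0.0, min(1.0, throttle + gain * error))
```

In this sketch, the feed-forward schedule absorbs the predictable layer-to-layer variation the abstract describes, while the closed-loop term covers residual error between the compile-time prediction and the measured card current.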
Keywords
Peak Current, Batch Size, Window Size, Power Consumption, State Control, Voltage Regulation, Closed-loop Control, Power Management, Fast Timescale, Optimal Opportunity, Average Power Consumption, P2 Region, Transformer Block, AI Models, High Duty Cycle, Feedforward Strategy