14.1 A Software-Assisted Peak Current Regulation Scheme to Improve Power-Limited Inference Performance in a 5nm AI SoC.

IEEE International Solid-State Circuits Conference (2024)

Abstract
The rapid emergence of AI models, specifically large language models (LLMs) requiring large amounts of compute, drives the need for dedicated AI inference hardware. During deployment, compute utilization (and thus power consumption) can vary significantly across layers of an AI model, number of tokens, precision, and batch size [1]. Such wide variation, which may occur at fast time scales, poses unique challenges in optimizing performance within the system-level specifications for discrete accelerator cards, including not just average power consumption but also peak instantaneous current draw, which may require consideration of time constants down to the μs scale [2]. Prior current-limiting systems [2], [3], which use reactive schemes and often target general-purpose processors, may not be sufficient for AI workloads. This work leverages a unique characteristic of AI workloads, their amenability to predictive compile-time software optimization, and proposes a new power management architecture to minimize worst-case margins and realize the potential of AI accelerators. In addition, because power consumption varies widely across card components in AI workloads, sensing the card-level (vs. chip-level) current provides more opportunity for optimization. A new software-assisted feed-forward current-limiting scheme is thus proposed in conjunction with PCIe-card-level closed-loop control to maximize performance under sub-ms peak current constraints.
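The abstract's core idea, combining a compile-time feed-forward current schedule with a reactive card-level closed-loop correction, can be sketched at a very high level. The sketch below is purely illustrative and is not the paper's implementation; the layer names, current estimates, limits, and gain are all hypothetical assumptions.

```python
# Illustrative sketch (NOT the paper's implementation): a feed-forward
# per-layer throttle schedule computed at compile time, plus a reactive
# closed-loop correction on the measured card-level current.
# All names, thresholds, and the gain value are hypothetical.

from dataclasses import dataclass


@dataclass
class Layer:
    name: str
    predicted_current_a: float  # compile-time current estimate for this layer


def plan_throttle(layers, peak_limit_a):
    """Feed-forward pass: pre-compute a per-layer throttle factor so the
    predicted current never exceeds the card-level peak limit."""
    return {
        layer.name: min(1.0, peak_limit_a / layer.predicted_current_a)
        for layer in layers
    }


def closed_loop_adjust(throttle, measured_current_a, peak_limit_a, gain=0.1):
    """Reactive correction: nudge the throttle down when the measured
    card-level current exceeds the limit, and back toward 1.0 otherwise."""
    error = (peak_limit_a - measured_current_a) / peak_limit_a
    return max(0.0, min(1.0, throttle + gain * error))
```

In this sketch, the feed-forward schedule absorbs the predictable layer-to-layer variation the abstract describes, while the closed-loop term covers residual error between the compile-time prediction and the measured card current.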
Keywords
Peak Current, Batch Size, Window Size, Power Consumption, State Control, Voltage Regulation, Closed-loop Control, Power Management, Fast Timescale, Optimal Opportunity, Average Power Consumption, P2 Region, Transformer Block, AI Models, High Duty Cycle, Feedforward Strategy