Understanding Capacity-Driven Scale-Out Neural Recommendation Inference

2021 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS)

Cited by 28 | 56 views
Abstract
Deep learning recommendation models have grown to the terabyte scale. Traditional serving schemes, which load an entire model onto a single server, cannot support models of this size. One approach is distributed serving, or distributed inference, which divides the memory requirements of a single large model across multiple servers. This work is a first step for the systems community...
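The abstract describes dividing a large model's memory footprint across multiple servers, but does not specify the placement scheme. As an illustrative assumption only (not the paper's method), capacity-driven placement can be sketched as a greedy bin-packing of embedding tables onto servers, largest table first:

```python
# Hypothetical sketch of capacity-driven model sharding. The paper's actual
# placement scheme is not given in this abstract; greedy bin-packing is an
# assumption used purely for illustration.

def place_tables(table_sizes_gb, server_capacity_gb):
    """Assign embedding tables (largest first) to the server with the most
    free memory that still fits them; open a new server when none fits."""
    servers = []  # each entry: [used_gb, list of table indices]
    order = sorted(range(len(table_sizes_gb)),
                   key=lambda i: table_sizes_gb[i], reverse=True)
    for i in order:
        size = table_sizes_gb[i]
        best = None
        for s in servers:
            if s[0] + size <= server_capacity_gb and (best is None or s[0] < best[0]):
                best = s
        if best is None:
            best = [0.0, []]       # no server fits: provision a new one
            servers.append(best)
        best[0] += size
        best[1].append(i)
    return servers

# A 360 GB model split across servers with 256 GB of memory each:
placement = place_tables([120, 90, 60, 45, 30, 15], server_capacity_gb=256)
print(len(placement))  # → 2
```

The key trade-off such a scheme exposes, and that the paper studies, is that scaling out for capacity turns one local model lookup into multiple network round-trips per inference request.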
Keywords
Deep learning, Training, Data centers, Computational modeling, Memory management, Software, Performance analysis