Understanding Capacity-Driven Scale-Out Neural Recommendation Inference

2021 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS)

Cited by 28 | 56 views
Abstract
Deep learning recommendation models have grown to the terabyte scale. Traditional serving schemes, which load an entire model onto a single server, cannot support models of this size. One approach is distributed serving, or distributed inference, which divides the memory requirements of a single large model across multiple servers. This work is a first step for the systems community...
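The abstract describes dividing a large model's memory footprint across multiple servers, but does not specify the placement scheme. As an illustrative assumption only (not the paper's method), capacity-driven placement can be sketched as a greedy bin-packing of embedding tables onto servers, largest table first:

```python
# Hypothetical sketch of capacity-driven model sharding. The paper's actual
# placement scheme is not given in this abstract; greedy bin-packing is an
# assumption used purely for illustration.

def place_tables(table_sizes_gb, server_capacity_gb):
    """Assign embedding tables (largest first) to the server with the most
    free memory that still fits them; open a new server when none fits."""
    servers = []  # each entry: [used_gb, list of table indices]
    order = sorted(range(len(table_sizes_gb)),
                   key=lambda i: table_sizes_gb[i], reverse=True)
    for i in order:
        size = table_sizes_gb[i]
        best = None
        for s in servers:
            if s[0] + size <= server_capacity_gb and (best is None or s[0] < best[0]):
                best = s
        if best is None:
            best = [0.0, []]       # no server fits: provision a new one
            servers.append(best)
        best[0] += size
        best[1].append(i)
    return servers

# A 360 GB model split across servers with 256 GB of memory each:
placement = place_tables([120, 90, 60, 45, 30, 15], server_capacity_gb=256)
print(len(placement))  # → 2
```

The key trade-off such a scheme exposes, and that the paper studies, is that scaling out for capacity turns one local model lookup into multiple network round-trips per inference request.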
Keywords
Deep learning, Training, Data centers, Computational modeling, Memory management, Software, Performance analysis