Understanding Capacity-Driven Scale-Out Neural Recommendation Inference
2021 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS)(2021)
Abstract
Deep learning recommendation models have grown to the terabyte scale. Traditional serving schemes, which load an entire model onto a single server, cannot support models of this size. One approach to serving these models is distributed serving, or distributed inference, which divides the memory requirements of a single large model across multiple servers. This work is a first step for the systems community...
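The distributed-serving idea in the abstract, splitting one model's memory footprint across servers, can be illustrated with a minimal sketch. This is a hypothetical greedy placement of embedding tables onto servers by remaining capacity; the paper's actual sharding scheme is not shown in the abstract, and all names below are illustrative.

```python
# Hypothetical sketch: greedily assign each embedding table to the
# least-loaded server, largest tables first. Not the paper's algorithm.
import heapq

def shard_tables(table_sizes_gb, num_servers, capacity_gb):
    """Map each table index to a server without exceeding per-server memory."""
    # Min-heap of (used memory, server id): least-loaded server pops first.
    heap = [(0.0, s) for s in range(num_servers)]
    heapq.heapify(heap)
    placement = {}
    for idx in sorted(range(len(table_sizes_gb)),
                      key=lambda i: table_sizes_gb[i], reverse=True):
        used, server = heapq.heappop(heap)
        if used + table_sizes_gb[idx] > capacity_gb:
            # If it does not fit on the emptiest server, it fits nowhere.
            raise MemoryError(f"table {idx} does not fit on any server")
        placement[idx] = server
        heapq.heappush(heap, (used + table_sizes_gb[idx], server))
    return placement
```

For example, `shard_tables([100, 80, 60, 40], num_servers=2, capacity_gb=200)` spreads the four tables so each server holds 140 GB, a load that a single 200 GB server could also hold here, but that generalizes to models whose total size exceeds any one server's memory.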
Keywords
Deep learning, Training, Data centers, Computational modeling, Memory management, Software, Performance analysis