Applied Machine Learning at Facebook: A Datacenter Infrastructure Perspective

2018 IEEE International Symposium on High Performance Computer Architecture (HPCA)

Cited by 704
Abstract
Machine learning sits at the core of many essential products and services at Facebook. This paper describes the hardware and software infrastructure that supports machine learning at global scale. Facebook's machine learning workloads are extremely diverse: services require many different types of models in practice. This diversity has implications at all layers in the system stack. In addition, a sizable fraction of all data stored at Facebook flows through machine learning pipelines, presenting significant challenges in delivering data to high-performance distributed training flows. Computational requirements are also intense, leveraging both GPU and CPU platforms for training and abundant CPU capacity for real-time inference. Addressing these and other emerging challenges continues to require diverse efforts that span machine learning algorithms, software, and hardware design.
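The abstract notes that training leverages both GPU and CPU platforms while real-time inference is served from abundant CPU capacity. The sketch below is not taken from the paper; it is a minimal illustration of that training/inference split using PyTorch (one of the frameworks used at Facebook), with a placeholder model and synthetic data standing in for the real data pipelines and distributed training flows.

```python
# Minimal sketch (not from the paper): train a toy model on a GPU when one is
# available, then serve inference from CPU, mirroring the split the abstract
# describes. Model shape, data, and hyperparameters are illustrative placeholders.
import torch
import torch.nn as nn

train_device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Toy ranking-style scorer; production models are far larger and are trained
# with distributed data-parallel workers fed by large data pipelines.
model = nn.Sequential(nn.Linear(256, 128), nn.ReLU(), nn.Linear(128, 1)).to(train_device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.BCEWithLogitsLoss()

for step in range(100):
    # Synthetic features/labels stand in for examples delivered by the data pipeline.
    features = torch.randn(512, 256, device=train_device)
    labels = torch.randint(0, 2, (512, 1), device=train_device).float()
    optimizer.zero_grad()
    loss = loss_fn(model(features), labels)
    loss.backward()
    optimizer.step()

# Real-time inference runs on CPU capacity: move the trained model to CPU and
# score requests without gradient tracking, as a low-latency serving path would.
cpu_model = model.to("cpu").eval()
with torch.no_grad():
    scores = torch.sigmoid(cpu_model(torch.randn(8, 256)))
```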
Keywords
computer architecture, hardware software codesign, machine learning, facebook