MixRec: Orchestrating Concurrent Recommendation Model Training on CPU-GPU platform

2023 IEEE 41st International Conference on Computer Design (ICCD 2023)

Abstract
The development of deep learning recommendation models (DLRM) and recommendation systems has significantly improved the precision of information matching. Because recommendation models have distinct computation, data-access, and memory-usage characteristics, they may suffer from low resource utilization on prevalent heterogeneous CPU-GPU hardware platforms. Existing concurrent training solutions cannot be directly applied to DLRM for several reasons, such as insufficient fine-grained memory management and the lack of collaborative CPU-GPU scheduling. In this paper, we introduce MixRec, a scheduling framework that addresses these challenges by providing an efficient job management and scheduling mechanism for DLRM training jobs on heterogeneous CPU-GPU platforms. To facilitate training co-location, we first estimate the peak memory consumption of each job. Additionally, we track and collect resource utilization for DLRM training jobs. Based on this resource-usage information, we propose a batched job dispatcher with a dynamic resource-complementary scheduling policy that co-locates DLRM training jobs on CPU-GPU platforms. Experimental results demonstrate that our implementation achieves up to 4.42x higher throughput and 3.97x higher resource utilization for training jobs involving various recommendation models.
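The abstract describes pairing jobs whose resource demands complement each other, subject to estimated peak memory. The paper's actual dispatcher is not reproduced here; the following is a minimal Python sketch of the general idea, in which job fields, the greedy pairing strategy, and the memory budget are all illustrative assumptions rather than MixRec's implementation.

```python
# Hypothetical sketch of resource-complementary co-location: pair a
# CPU-heavy training job with a GPU-heavy one when their combined
# estimated peak memory fits the budget. All names and thresholds are
# assumptions for illustration, not the paper's actual algorithm.
from dataclasses import dataclass

@dataclass
class Job:
    name: str
    peak_mem_gb: float   # estimated peak memory consumption
    cpu_util: float      # tracked CPU utilization, 0..1
    gpu_util: float      # tracked GPU utilization, 0..1

def complementary_pairs(jobs, mem_budget_gb):
    """Greedily pair CPU-heavy jobs with GPU-heavy jobs under a memory budget.

    Jobs whose pairing would exceed the budget are simply skipped here;
    a real dispatcher would fall back to running them alone.
    """
    cpu_heavy = sorted((j for j in jobs if j.cpu_util >= j.gpu_util),
                       key=lambda j: j.cpu_util, reverse=True)
    gpu_heavy = sorted((j for j in jobs if j.gpu_util > j.cpu_util),
                       key=lambda j: j.gpu_util, reverse=True)
    batch = []
    while cpu_heavy and gpu_heavy:
        a, b = cpu_heavy.pop(0), gpu_heavy.pop(0)
        if a.peak_mem_gb + b.peak_mem_gb <= mem_budget_gb:
            batch.append((a.name, b.name))  # co-locate this pair
    return batch

jobs = [
    Job("embedding-bound", peak_mem_gb=8.0, cpu_util=0.8, gpu_util=0.2),
    Job("mlp-bound", peak_mem_gb=6.0, cpu_util=0.3, gpu_util=0.9),
]
print(complementary_pairs(jobs, mem_budget_gb=16.0))
```

The sorting ensures the most resource-skewed jobs are paired first, which is one plausible heuristic for maximizing combined CPU and GPU utilization.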
Keywords
Scheduling, Recommendation, Concurrent Training, CPU/GPU