DYRS: Bandwidth-Aware Disk-to-Memory Migration of Cold Data in Big-Data File Systems

2019 IEEE 33rd International Parallel and Distributed Processing Symposium (IPDPS 2019)

Abstract
Migrating data into memory can significantly accelerate big-data applications by hiding low disk throughput. While prior work has mostly targeted caching frequently used data, the techniques employed do not benefit jobs that read cold data. For these jobs, the file system has to proactively migrate the inputs into memory. Successfully migrating cold inputs can result in a large speedup for many jobs, especially those that spend a significant part of their execution reading inputs.

In this paper, we use data from the Google cluster trace to make the case that the conditions in production workloads are favorable for migration. We then design and implement DYRS, a framework for migrating cold data in big-data file systems. DYRS adapts to the available bandwidth on storage nodes, ensuring all nodes are fully utilized throughout the migration. In addition to balancing the load, DYRS optimizes the placement of each migration to maximize the number of successful migrations and eliminate stragglers at the end of a job.

We evaluate DYRS using several Hive queries, a trace-based workload from Facebook, and the Sort application. Our results show that DYRS successfully adapts to bandwidth heterogeneity and effectively migrates data. DYRS accelerates Hive queries by up to 48%, and by 36% on average. Jobs in a trace-based workload experience a speedup of 33% on average, and the mapper tasks in this workload see an even greater speedup of 46%. DYRS accelerates sort jobs by up to 20%.
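The abstract only sketches how bandwidth-aware placement works. The snippet below is a minimal, hypothetical illustration of the general idea (assigning each migration to the node expected to finish it earliest, given heterogeneous per-node bandwidth), not the authors' actual algorithm or implementation; the node names, bandwidths, block sizes, and the `place_migrations` helper are all invented for the example.

```python
# Illustrative sketch only: greedy, bandwidth-aware placement of cold-data
# migrations across storage nodes. All names and numbers are hypothetical.
import heapq

def place_migrations(blocks, node_bandwidth):
    """Assign each block (name, size in MB) to the node projected to finish
    its queued migrations earliest, given that node's bandwidth (MB/s).
    Returns a placement map and per-node projected finish times."""
    # Min-heap keyed by each node's projected completion time.
    heap = [(0.0, node) for node in node_bandwidth]
    heapq.heapify(heap)
    placement = {}
    finish = {node: 0.0 for node in node_bandwidth}

    # Place the largest blocks first so a slow node is not left holding a
    # large block at the end of the migration (straggler avoidance).
    for name, size_mb in sorted(blocks, key=lambda b: -b[1]):
        busy_until, node = heapq.heappop(heap)
        busy_until += size_mb / node_bandwidth[node]
        placement[name] = node
        finish[node] = busy_until
        heapq.heappush(heap, (busy_until, node))
    return placement, finish

if __name__ == "__main__":
    # Three nodes with heterogeneous bandwidth (hypothetical values).
    nodes = {"node-a": 400.0, "node-b": 250.0, "node-c": 100.0}  # MB/s
    blocks = [("blk-1", 512), ("blk-2", 256), ("blk-3", 512), ("blk-4", 128)]
    plan, finish_times = place_migrations(blocks, nodes)
    print(plan)
    print(finish_times)
```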
Keywords
file system, Hadoop Distributed File System, storage, heterogeneity, migration