LATOA: Load-Aware Task Offloading and Adoption in GPU

15th Workshop on General Purpose Processing Using GPU (GPGPU 2023)

Abstract
Emerging applications such as data mining and graph analysis demand additional processing power at the hardware level. Conventional static task scheduling can no longer meet the requirements of such complex applications. This inefficiency is a major concern when the application runs on a Graphics Processing Unit (GPU), where millions of instructions must be distributed among a limited number of processing cores. A non-optimal scheduling strategy leads to unfair load distribution among the GPU's processing cores: while busy cores stall due to a lack of resources, waiting for their data from main memory, other cores sit idle, waiting for the busy cores to finish their tasks. Our study introduces LATOA, a Load-Aware Task Offloading and Adoption method that tackles this problem by reducing both stall and idle cycles. LATOA is the first study to move from static to dynamic task scheduling based on run-time information obtained from the Miss Status Holding Register (MSHR) tables. In LATOA, all processing cores are dynamically tagged as critical, neutral, or relaxed. Irregular warps with low locality are then detected and offloaded from critical cores (those heading toward a stall) to relaxed ones (those heading toward idleness). Based on our experiments, LATOA reduces the number of stall cycles by 24% on average and increases the number of neutral states by 38% on average. In addition, with negligible hardware overhead, LATOA improves system performance and power efficiency by 26% and 7% on average, respectively.
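To make the offloading idea described above concrete, the sketch below classifies cores by MSHR occupancy and moves irregular warps from critical cores to relaxed ones. This is a minimal illustrative sketch, not the authors' implementation: the occupancy thresholds, the CoreState enum, the irregular flag, and the offload_irregular_warps routine are all assumptions introduced here for clarity.

```cpp
// Hypothetical sketch of MSHR-based core tagging and warp offloading.
// Thresholds and names are illustrative assumptions, not from the paper.
#include <cstddef>
#include <cstdio>
#include <vector>

enum class CoreState { Critical, Neutral, Relaxed };

struct Warp {
    int id;
    bool irregular;   // low-locality warp flagged by a detector (assumed)
};

struct Core {
    int id;
    size_t mshr_used;      // occupied MSHR entries at this cycle
    size_t mshr_capacity;  // total MSHR entries
    std::vector<Warp> warps;

    CoreState state() const {
        double occ = static_cast<double>(mshr_used) / mshr_capacity;
        if (occ > 0.75) return CoreState::Critical;  // heading toward a stall
        if (occ < 0.25) return CoreState::Relaxed;   // heading toward idleness
        return CoreState::Neutral;
    }
};

// Move irregular warps from each critical core onto one relaxed core.
void offload_irregular_warps(std::vector<Core>& cores) {
    for (Core& src : cores) {
        if (src.state() != CoreState::Critical) continue;
        for (Core& dst : cores) {
            if (dst.state() != CoreState::Relaxed) continue;
            for (size_t i = 0; i < src.warps.size(); ) {
                if (src.warps[i].irregular) {
                    dst.warps.push_back(src.warps[i]);
                    src.warps.erase(src.warps.begin() + i);
                } else {
                    ++i;
                }
            }
            break;  // one destination per critical core in this sketch
        }
    }
}

int main() {
    std::vector<Core> cores = {
        {0, 30, 32, {{1, true}, {2, false}}},  // critical core
        {1, 4, 32, {}},                        // relaxed core
    };
    offload_irregular_warps(cores);
    std::printf("core 1 now holds %zu warp(s)\n", cores[1].warps.size());
    return 0;
}
```

In the sketch, only the irregular (low-locality) warp is migrated, mirroring the abstract's point that regular warps stay on their original core while offloading relieves pressure on cores whose MSHR tables are nearly full.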
Keywords
GPU, cache access, offloading, adoption, unbalanced load distribution, irregularity, locality, stall cycle