Threads vs. caches: Modeling the behavior of parallel workloads

Guz, Z., Itzhak, O., Keidar, I., Kolodny, A.

Computer Design (2010)

Abstract

A new generation of high-performance engines now combines graphics-oriented parallel processors with a cache architecture. To meet this trend, new highly parallel workloads are being developed. However, it is often difficult to predict how a given application will perform on a given architecture. This paper provides a new model capturing the behavior of such parallel workloads on different multi-core architectures. Specifically, we provide a simple analytical model which, for a given application, describes its performance and power as a function of the number of threads it runs in parallel, on a range of architectures. We use our model (backed by simulations) to study both synthetic workloads and real ones from the PARSEC suite. Our findings recognize distinctly different behavior patterns for different application families and architectures.

This complex behavior, in turn, poses a challenge for architecture designers, who need to allocate the limited on-die resources to cores, thread contexts, and caches. Finally, given the diversity of already available high-performance architectures, there is the question of which is the best fit for a given workload.

This paper addresses these challenges by developing a simple, high-level, closed-form model that captures both the architecture and the application characteristics (see Section III). The modeled machine uses a parameterized combination of both mechanisms for memory latency masking (caching and multi-threading), and can thus capture a range of machines, rendering the comparison between them meaningful. The workload model, in turn, captures the salient properties of the program, which allows one to predict which architecture is most beneficial for it. All the parameters, capturing both architecture and workload, can be used as "knobs" for studying a wide range of scenarios, in order to comprehend the interplay among multiple parameters in a clean, qualitative way. The model thus serves as a vehicle to derive intuitions.
In Section IV, we study how different properties of an application affect performance and power. We identify three families of workloads with distinct behavior patterns: while some workloads have a clear affinity towards either caching or multi-threading, others can benefit from both. Moreover, some workloads exhibit an unintuitive "valley" between the cache efficiency zone and the thread efficiency zone, where performance takes a dip.

In Section V we back our analytical model by simulations. Our results indicate that the simple, closed-form model of Section III can, in most cases, predict dynamic behavior, and can thus be used to select the most efficient hardware structure for executing a given program. Whereas Section IV concentrates on synthetic workloads, Section V studies workloads from the PARSEC benchmark suite (7), and shows that the three distinct behaviors observed in Section IV are indeed present in real workloads.

To summarize, our contributions are as follows:

• We present a simple closed-form model for systematically reasoning about complex phenomena; the model captures the behavior of parallel workloads on high-performance engines that employ any combination of caching and aggressive multi-threading.
• We conduct a qualitative study of the inherent tradeoffs between the two approaches for memory access masking, and their sensitivity to a range of parameters. Our study yields non-intuitive observations regarding the impact of architectural choices and workload characteristics on performance and power.
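
The "valley" between the cache efficiency zone and the thread efficiency zone can be illustrated with a small toy model. The sketch below is not the paper's Section III model, which is not reproduced on this page; every function name, parameter, and default value is a hypothetical choice made for illustration. It assumes a shared cache whose hit rate falls as more threads compete for it, and processing elements (PEs) that stay busy only when enough threads are available to hide the average memory latency. Memory bandwidth limits and power are ignored.

```python
def hit_rate(n_threads, cache_size, ws_per_thread):
    """Toy hit rate (assumption): the fraction of the aggregate working
    set of all threads that fits in the shared cache."""
    return min(1.0, cache_size / (n_threads * ws_per_thread))


def performance(n_threads, n_pe=16, cache_size=32.0, ws_per_thread=1.0,
                mem_ratio=0.2, cpi_exe=1.0, t_cache=1.0, t_mem=200.0):
    """Toy throughput in instructions per cycle. All defaults are
    hypothetical values chosen only to make the valley visible."""
    p_hit = hit_rate(n_threads, cache_size, ws_per_thread)
    # Average memory access latency in cycles, weighted by the hit rate.
    t_avg = p_hit * t_cache + (1.0 - p_hit) * t_mem
    # Threads one PE needs so that another thread can always run while
    # the rest wait on memory (mem_ratio = memory ops per instruction).
    threads_needed_per_pe = 1.0 + t_avg * mem_ratio / cpi_exe
    utilization = min(1.0, n_threads / (n_pe * threads_needed_per_pe))
    return n_pe * utilization / cpi_exe


if __name__ == "__main__":
    for n in (8, 32, 64, 256, 1024):
        print(f"{n:5d} threads -> {performance(n):.2f} instructions/cycle")
```

With these assumed values, throughput peaks while the working sets still fit in the cache, drops once the threads start thrashing it, and recovers only when there are enough threads to hide main-memory latency. Changing `cache_size` or `t_mem` shifts where the dip begins and how wide it is.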

Keywords

cache storage, computer graphic equipment, coprocessors, parallel architectures, PARSEC suite, behavior modelling, cache architecture, graphics-oriented parallel processors, high-performance engines, multicore architectures, parallel workloads
