A Work-Efficient Parallel Breadth-First Search Algorithm (Or How To Cope With The Nondeterminism Of Reducers)

SPAA 2010

Citations: 270
Abstract
We have developed a multithreaded implementation of breadth-first search (BFS) of a sparse graph using the Cilk++ extensions to C++. Our PBFS program on a single processor runs as quickly as a standard C++ breadth-first search implementation. PBFS achieves high work-efficiency by using a novel implementation of a multiset data structure, called a "bag," in place of the FIFO queue usually employed in serial breadth-first search algorithms. For a variety of benchmark input graphs whose diameters are significantly smaller than the number of vertices (a condition met by many real-world graphs), PBFS demonstrates good speedup with the number of processing cores. Since PBFS employs a nonconstant-time "reducer" (a "hyperobject" feature of Cilk++), the work inherent in a PBFS execution depends nondeterministically on how the underlying work-stealing scheduler load-balances the computation. We provide a general method for analyzing nondeterministic programs that use reducers. PBFS also is nondeterministic in that it contains benign races which affect its performance but not its correctness. Fixing these races with mutual-exclusion locks slows down PBFS empirically, but it makes the algorithm amenable to analysis. In particular, we show that for a graph G = (V, E) with diameter D and bounded out-degree, this data-race-free version of PBFS runs in time O((V + E)/P + D lg³(V/D)) on P processors, which means that it attains near-perfect linear speedup if P << (V + E)/(D lg³(V/D)).
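The sketch below (not the authors' Cilk++ code) illustrates the level-synchronous structure the abstract describes: the frontier is an unordered "bag" rather than a FIFO queue, each worker expands a chunk of the current layer into a private local bag, and the local bags are merged to form the next layer, playing the role of the reducer's combine step. The Graph type, the chunking scheme, and the use of std::thread with a compare-and-swap to claim vertices are assumptions made for illustration; the paper's PBFS instead relies on Cilk++ work-stealing and a pennant-based bag reducer.

```cpp
// Minimal sketch of level-synchronous parallel BFS in the spirit of PBFS.
// Assumptions (not from the paper): CSR graph layout, fixed chunking of the
// frontier across std::thread workers, and CAS-based claiming of vertices.
#include <algorithm>
#include <atomic>
#include <cstddef>
#include <iostream>
#include <thread>
#include <vector>

struct Graph {
    // Compressed sparse row adjacency: adj[offset[v] .. offset[v+1]) are v's neighbors.
    std::vector<std::size_t> offset;
    std::vector<int> adj;
    int num_vertices() const { return static_cast<int>(offset.size()) - 1; }
};

std::vector<int> parallel_bfs(const Graph& g, int source,
                              unsigned num_workers = std::thread::hardware_concurrency()) {
    if (num_workers == 0) num_workers = 1;
    const int n = g.num_vertices();

    // dist[v] == -1 means "not yet visited"; CAS makes claiming a vertex race-free.
    std::vector<std::atomic<int>> dist(n);
    for (auto& d : dist) d.store(-1, std::memory_order_relaxed);
    dist[source].store(0, std::memory_order_relaxed);

    std::vector<int> frontier{source};   // current layer, an unordered "bag"
    int level = 0;

    while (!frontier.empty()) {
        std::vector<std::vector<int>> local_bags(num_workers);  // one private bag per worker
        std::vector<std::thread> workers;
        const std::size_t chunk = (frontier.size() + num_workers - 1) / num_workers;

        for (unsigned w = 0; w < num_workers; ++w) {
            workers.emplace_back([&, w] {
                const std::size_t begin = w * chunk;
                const std::size_t end = std::min(frontier.size(), begin + chunk);
                for (std::size_t i = begin; i < end; ++i) {
                    int u = frontier[i];
                    for (std::size_t e = g.offset[u]; e < g.offset[u + 1]; ++e) {
                        int v = g.adj[e];
                        int expected = -1;
                        // Claim v for the next layer exactly once.
                        if (dist[v].compare_exchange_strong(expected, level + 1))
                            local_bags[w].push_back(v);
                    }
                }
            });
        }
        for (auto& t : workers) t.join();

        // Merge the per-worker bags into the next frontier (the reducer-like combine).
        frontier.clear();
        for (auto& bag : local_bags)
            frontier.insert(frontier.end(), bag.begin(), bag.end());
        ++level;
    }

    std::vector<int> result(n);
    for (int v = 0; v < n; ++v) result[v] = dist[v].load(std::memory_order_relaxed);
    return result;
}

int main() {
    // Tiny example: 0 -> {1, 2}, 1 -> {3}, 2 -> {3}, 3 -> {}.
    Graph g;
    g.offset = {0, 2, 3, 4, 4};
    g.adj = {1, 2, 3, 3};
    for (int d : parallel_bfs(g, 0)) std::cout << d << ' ';
    std::cout << '\n';   // expected output: 0 1 1 2
}
```

One design note on this sketch: claiming vertices with compare-and-swap sidesteps both the benign races and the mutual-exclusion locks discussed in the abstract; it is one common way to keep the set of visited vertices deterministic while still expanding a layer in parallel, though it is not the choice analyzed in the paper.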
Keywords
Breadth-first search, Cilk, graph algorithms, hyperobjects, multithreading, nondeterminism, parallel algorithms, reducers, work-stealing