Breadth-first Search

Application Description

This benchmark computes the shortest path from a source node to all nodes in a directed, unweighted graph by using a modified Bellman-Ford algorithm [1].

[1] http://en.wikipedia.org/wiki/Bellman-Ford_algorithm

Algorithm

This algorithm computes the solution to the single-source shortest path problem with unit edge weights using a demand-driven modification of the Bellman-Ford algorithm. Each node maintains an estimate of its shortest distance from the source called dist. Initially, this value is infinity for all nodes except for the source, whose distance is 0. The algorithm proceeds by iteratively updating distance estimates starting from the source and maintaining a worklist of nodes whose distances have changed and thus may cause other distances to be updated. Figure 1 gives the pseudocode for the algorithm.

In principle, one can directly apply any general single-source shortests paths (SSSP) algorithm to solve this problem, but there are some optimizations that are useful to apply in this specific context.

The general SSSP algorithm above may add multiple items to the worklist that update the same node to the same distance value (so-called empty work). The amount of empty work can be quite large, so it pays to avoid adding it if possible. The algorithm in Figure 1 does this by first checking if a node needs to be added before adding it to the worklist (line 10). For simplicity, the general SSSP algorithm above skips this step.

If the scheduler always processes all the work at the current distance level before moving on to the next distance (i.e., bulk-synchronous scheduling), updates to the distance of neighbors can be made without synchronization because all updates within a distance level will be the same.

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 Graph g = /* read input */; Workset ws = new Workset(); source.dist = 0; foreach (Node n : source.getNeighbors()) { n.dist = 1; ws.add((n, 2)); } foreach (Pair<Node,int> (u, d) : ws) { for (Node n: u.getNeighbors()) { if (n.dist <= d) continue; n.dist = d; ws.add((n, d + 1)); } }

Figure 1: Pseudocode for breadth-first search algorithm.

Data Structures

There are two key data structures used in this algorithm:

Unordered Set
The worklist containing the items that need to be processed is an unordered set (since items can be processed in any order).
Directed Graph
The algorithm successively reads and updates a graph.

Caveats

This implementation breaks from the Galois model by atomically updating the distance value in nodes rather than taking a lock on the node, and when bulk-synchronous scheduling is used, distance values are updated without any synchronization.

Parallelism

The parallelism in this algorithm is very data dependent.

Performance

Figure 3: Performance results on a random graph of 2^26 nodes and 4 times that many edges. Also shown are the parallel implementations from Schardl et al. and from the PBBS (v0.1) benchmark suite.

Machine Description

Performance numbers are collected on a 4 package, 10 cores per package, Intel Xeon E7-4860 machine at 2.27GHz. Thread numbers above 40 use SMT. The machine has 128GB of ram with 20GB dedicated to 2M hugepages used by the Galois allocator. The operating system is Ubuntu Linux 10.04 LTS (Linux 2.6.32). All runs of the Galois benchmarks used gcc/g++ 4.7 to compile with a patch to the C++ runtime to improve the scalability of exception handling. All other implementations were compiled with Intel ICC 12.1 with thread parallelization provided by Intel Cilk.