# Single-Source Shortest Path

## Application Description

This benchmark computes the shortest path from a source node to all nodes in a directed graph with non-negative edge weights by using a modified Bellman-Ford algorithm [1,2].

[1] http://en.wikipedia.org/wiki/Bellman-Ford_algorithm

[2] http://www.walden-family.com/public/bf-history.pdf

## Algorithm

This algorithm computes the solution to the single-source
shortest path problem with non-negative edge weights using
a demand-driven modification of the Bellman-Ford algorithm.
Each node maintains an estimate of its shortest distance
from the source called *dist*. Initially, this value
is infinity for all nodes except for the source, whose
distance is 0. The algorithm proceeds by iteratively
updating distance estimates starting from the source and
maintaining a worklist of nodes whose distances have
changed and thus may cause other distances to be updated.
Figure 1 gives the pseudocode for the algorithm:

1 2 3 4 5 6 7 8 9 10 | Graph g = /* read input */; Workset ws = new Workset({source}); foreach (Node n : ws) { for (Node neighbor : n.getNeighbors()) { if (n.dist + g.getEdgeWeight(n, neighbor) < neighbor.dist) { neighbor.dist = n.dist + g.getEdgeWeight(n, neighbor); ws.add(neighbor); } } } |

Figure 1: Pseudocode for Single-Source Shortest Path algorithm.

## Data Structures

There are two key data structures used in this shortest path algorithm:

- Priority Scheduler
- Although tasks can be processed in any order, processing tasks in ascending distance order reduces the total amount of work that needs to be done.
- Directed Graph
- The algorithm successively reads and updates a graph.

## Demo

For this algorithm, we can classify work into three categories. First,
*good work* is
an iteration than lowers the distance value of a node to its final value.
Second, *empty work* is an iteration that is asked to lower a distance
but
that distance value has already been lowered. Empty work occurs because
concurrent priority schedulers often do not allow updating the priority of a
task after it has been added to the scheduler, so there may be multiple tasks
that update the same node. For sssp, a sequential schedule that always applies
the least value update will only have good and empty work, and a scheduler that
allows the priority of a task to be updated after the task is added will only
have good work. When the least value update is not always applied, there is a
third type of work that can arise, *bad work*. Bad work is when an
iteration lowers the distance of a node to a non-final value.

The demo below shows the breakdown of work as schedulers change.
The amount of good work should be the same across
schedulers for the same input, but the amount of bad and empty work varies
according to the particular scheduler used. The *FIFO* scheduler is just
a queue. The *Priority* scheduler does the work with the least distance
label first. The *Local Priority* scheduler has a priority queue
per thread, with new work added to the thread that generated it. The demo
is not multithreaded; threads are simulated by round-robin scheduling amongst
the "per-thread" queues. The *Partitioned Priority* scheduler is similar
to the local priority one except that new work is assigned to a predetermined
thread. For more details, see the accompanying tech report.

## Caveats

This implementation breaks from the Galois model by atomically updating the distance value in nodes rather than taking a lock on the node.

## Parallelism

The parallelism in this algorithm is very data dependent, Figure 2 shows the parallelism profile for a sample input drawn from a road map of Rome.

Figure 2: Available parallelism in Single-Source Shortest Path algorithm.

## Performance

Figure 3: Performance results on a random graph of 2^26 nodes and 4 times that many edges.

## Machine Description

Performance numbers are collected on a 4 package, 10 cores per package, Intel Xeon E7-4860 machine at 2.27GHz. Thread numbers above 40 use SMT. The machine has 128GB of ram with 20GB dedicated to 2M hugepages used by the Galois allocator. The operating system is Ubuntu Linux 10.04 LTS (Linux 2.6.32). All runs of the Galois benchmarks used gcc/g++ 4.7 to compile with a patch to the C++ runtime to improve the scalability of exception handling. All other implementations were compiled with Intel ICC 12.1 with thread parallelization provided by Intel Cilk.