Single-Source Shortest Paths

Problem Description

This benchmark computes the shortest path from a source node to all nodes in a directed graph with non-negative edge weights by using a modified Bellman-Ford algorithm [1,2].

[1] http://en.wikipedia.org/wiki/Bellman-Ford_algorithm
[2] http://www.walden-family.com/public/bf-history.pdf

Algorithm

The computation is implemented in a topology-driven manner; all the graph edges are tried for relaxation in every iteration and another iteration is executed if at least one edge was relaxed in the previous iteration. This continues until a fixed-point.
  for all nodes src {
    dsrc = dist[src];
    for each edge <src, dst, wt> {
      ddst = dist[dst];
      if ddst > dsrc + wt {
         dist[dst] = dsrc + wt;
         anotheriteration = true;
      }
    }
  }
The body of the outer for loop is implemented as the kernel. The kernel is repeatedly called from the host until a fixed-point is reached.

Performance

The benchmark is written in CUDA.
Compile as: make
Execute as: sssp-wlc <graphfile>
e.g., sssp-wlc USA-road-d.NY.gr

GraphTime (ms)
rmat22649.61
USA126655.91
random23265.95

The above performance is observed on a 1.45 GHz Quadro 6000 with 6 GB of main memory and 448 cores distributed over 14 SMs. It has 64 kB of fast memory per SM that is split between the L1 data cache and the shared memory.