Single-Source Shortest Paths

Problem Description

This benchmark computes the shortest path from a source node to all nodes in a directed graph with non-negative edge weights by using a modified Bellman-Ford algorithm [1,2].



The computation is implemented in a topology-driven manner; all the graph edges are tried for relaxation in every iteration and another iteration is executed if at least one edge was relaxed in the previous iteration. This continues until a fixed-point.
  for all nodes src {
    dsrc = dist[src];
    for each edge <src, dst, wt> {
      ddst = dist[dst];
      if ddst > dsrc + wt {
         dist[dst] = dsrc + wt;
         anotheriteration = true;
The body of the outer for loop is implemented as the kernel. The kernel is repeatedly called from the host until a fixed-point is reached.


The benchmark is written in CUDA.
Compile as: make
Execute as: sssp-wlc <graphfile>
e.g., sssp-wlc

GraphTime (ms)

The above performance is observed on a 1.45 GHz Quadro 6000 with 6 GB of main memory and 448 cores distributed over 14 SMs. It has 64 kB of fast memory per SM that is split between the L1 data cache and the shared memory.