Minimum Spanning Tree

Problem Description

This benchmark computes a minimum spanning tree of a weighted undirected graph using Boruvka's algorithm.

Algorithm

The algorithm proceeds by successively contracting minimum-weight edges. However, since explicitly contracting an edge requires modifying the graph, the implementation performs the contraction indirectly: it tracks the sets of nodes that have been merged, called components, which avoids modifying the graph. Components grow in each iteration while their number shrinks as they merge. The algorithm terminates when only one component remains. If the input graph is disconnected, the algorithm instead terminates when an iteration leaves the number of components unchanged, and the result is a minimum spanning forest.


Figure: Before edge-contraction and after edge-contraction
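
Components can be tracked without touching the graph by giving each node a representative pointer, in the style of a union-find structure. A minimal sketch is shown below; the parent array and the find_component name are assumptions for illustration, not the benchmark's actual data structure, and real GPU code must additionally cope with concurrent updates.

// Sketch: locate a node's component representative by chasing parent
// pointers; merging two components then amounts to linking their roots.
// (Hypothetical helper; not the benchmark's actual code.)
__device__ unsigned find_component(unsigned node, unsigned *parent) {
  unsigned root = node;
  while (parent[root] != root)        // walk up to the representative
    root = parent[root];
  while (parent[node] != root) {      // path compression: shortcut the chain
    unsigned next = parent[node];
    parent[node] = root;
    node = next;
  }
  return root;
}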

The computation is split across a sequence of kernels as shown in the code below.

do {
  find the minimum-weight edge out of each node;
  find the minimum-weight edge out of each component;
  find each component's merge partner;
  merge each component with its partner;
} while (the number of components changes);

The body of the loop is implemented as separate kernels. The kernels are repeatedly called from the host until there is no change in the number of components.
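
A hedged, self-contained sketch of this host-side driver loop is shown below. The kernel names and the d_num_components counter are illustrative stand-ins rather than the benchmark's actual API; the kernels are empty stubs, and only the control flow (relaunch until the component count stops changing) reflects the description above.

// Sketch of the host driver loop; kernels are stubs standing in for the
// benchmark's real kernels (names are assumptions, not the actual API).
#include <cstdio>

__device__ unsigned d_num_components;   // updated by the real merge kernel

__global__ void min_edge_per_node()      { /* step 1 (stub) */ }
__global__ void min_edge_per_component() { /* step 2 (stub) */ }
__global__ void find_merge_partner()     { /* step 3 (stub) */ }
__global__ void merge_components()       { /* step 4 (stub): would merge
                                              components and decrement
                                              d_num_components */ }

int main() {
  unsigned prev, curr = 4;  // in the real code: the graph's node count
  cudaMemcpyToSymbol(d_num_components, &curr, sizeof(curr));
  do {
    prev = curr;
    min_edge_per_node<<<14, 256>>>();
    min_edge_per_component<<<14, 256>>>();
    find_merge_partner<<<14, 256>>>();
    merge_components<<<14, 256>>>();
    // read back the component count to decide whether to iterate again
    cudaMemcpyFromSymbol(&curr, d_num_components, sizeof(curr));
  } while (curr != prev);
  printf("components remaining: %u\n", curr);
  return 0;
}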

The mst variant iterates over the entire graph in each step, while the mst-da variant uses worklists for parts of the computation and is therefore more work-efficient and faster.
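
The difference between the two variants shows up in the kernel style. The sketch below is illustrative only (kernel names and worklist layout are assumptions, not the benchmark's code): the topology-driven kernel inspects every node each round, whereas the data-driven kernel touches only the nodes on a worklist, so its work is proportional to the active set.

// Topology-driven style (mst): one thread per node, every round.
__global__ void process_all_nodes(unsigned nnodes) {
  unsigned node = blockIdx.x * blockDim.x + threadIdx.x;
  if (node < nnodes) {
    /* inspect node even if its component did not change last round */
  }
}

// Data-driven style (mst-da): one thread per worklist entry.
__global__ void process_worklist(const unsigned *worklist, unsigned wlsize) {
  unsigned i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < wlsize) {
    unsigned node = worklist[i];
    (void)node;  /* only active nodes are inspected */
  }
}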

NOTE: Only use the symmetric graph files (.sym.gr) as inputs.

Performance

The benchmark is written in CUDA.
Compile as: make
Execute as: mst <graphfile>
Or as: mst-da <graphfile>
e.g., mst USA-road-d.FLA.sym.gr
e.g., mst-da USA-road-d.FLA.sym.gr

Graph          Time (ms)
FLA.sym           703.86
rmat12.sym          9.02
2d-2e20.sym       536.55

The above performance is observed on a 1.45 GHz Quadro 6000 with 6 GB of main memory and 448 cores distributed over 14 SMs. It has 64 kB of fast memory per SM that is split between the L1 data cache and the shared memory.