Connected Components

Application Description

A connected component of an undirected graph is a subgraph in which there is a path between any two nodes. A node with no edges is itself a connected component.

Application Description

Our distributed implementation of connected components is a bulk-synchronous parallel (BSP) label propagation algorithm: all nodes are initialized to a label of their own vertex ID, and in each round, the current ID is propagated to its neighbors. Nodes will overwrite their labels with the smallest vertex ID that it receives. This propagation continues until no node label changes in a round. We have a push variant and a pull variant of the algorithm.

The push-style version has a node push out its label only if it has changed from the label in the last round. The pull-style version has all nodes check their neighbors and take the smallest label from itself and its neighbors.

Psuedocode for the computation step of the 2 implementations follows below:

1 2 3 4 5 6 7 for (node n in graph) { if (n.label != n.old_label) { for (neighbor a of node n) { a.label = min(n.label, a.label) } } }

Figure 1: Pseudocode for CC Push computation

1 2 3 4 5 6 7 8 for (node n in graph) { for (neighbor a of node n) { if (a.label < n.label) { n.label = a.label } } } }

Figure 2: Pseudocode for CC Pull computation

Synchronization of the label variable occurs between BSP rounds. A node will take the minimum label value of all proxies that exist in the system for that node.

Performance

The graph below shows performance of cc-push on the TACC Stampede2 cluster's KNL nodes (Intel Xeon Phi 7250, 1.4 Ghz with 96GB DDR4 + 16 GB MCDRAM) running a total of 272 hardware threads from 1 to 256 hosts. We run on 3 graphs: rmat28 and kron30 are randomly generated graphs from the rmat and kron generator, respectively, and clueweb12 is one of the biggest publicly available web-crawl graphs. D-Galois numbers are compared the corresponding benchmark from Gemini, a state-of-the-art distributed graph analytics system.
Figure 3: Execution time of cc-push on Stampede2.