Independent Set

Application Description

An independent set Sof a graph G = (V,E) is a subset of nodes S ⊆ V such that no two nodes share the same edge. Finding the independent set with maximum cardinality |S| is called the maximum independent set problem. Finding such a set in an arbitrary graph is NP-complete.

The problem of finding maximal independent sets is simpler and more useful in practice for large graphs. A maximal independent set is an independent set that cannot be extended with another node and still remain an independent set.

Algorithm

A simple greedy algorithm solves the maximal independent set problem. Figure 1 gives one implementation.

Each node in the graph has a flag that may be in one of three states: Unmatched, Matched and NeighborMatched. All the flags begin in the Unmatched state. An unmatched node u is selected from the graph. If none of its neighbors are matched, then the flag for u is set to matched and all of its neighbors' flags are set to NeighborMatched. This process continues until there are no more unmatched nodes, in which case, the nodes with matched flags are a maximal independent set.

1 2 3 4 5 6 7 8 9 10 11 12 Graph g = /* read input */; L1: foreach (Node u : g) { if (u.flag != Unmatched) continue; for (Node v : n.getNeighbors()) { if (v.flag == Matched) continue L1; u.flag = Matched; for (Node v : u.getNeighbors()) v.flag = NeighborMatched; } }

Figure 1: Pseudocode for independent set algorithm.

Data Structures

There is one key data structure used in this algorithm:

Directed Graph
The algorithm successively reads and updates a graph.

Parallelism

The parallelism in this algorithm is very data dependent.

Performance

Figure 3: Performance results on a random graph of 2^26 nodes and 4 times that many edges. Also shown is the performance of the implementation from the PBBS (v0.1) benchmark suite.

Machine Description

Performance numbers are collected on a 4 package, 10 cores per package, Intel Xeon E7-4860 machine at 2.27GHz. Thread numbers above 40 use SMT. The machine has 128GB of ram with 20GB dedicated to 2M hugepages used by the Galois allocator. The operating system is Ubuntu Linux 10.04 LTS (Linux 2.6.32). All runs of the Galois benchmarks used gcc/g++ 4.7 to compile with a patch to the C++ runtime to improve the scalability of exception handling. All other implementations were compiled with Intel ICC 12.1 with thread parallelization provided by Intel Cilk.