# Survey Propagation

## Application Description

This benchmark implements Survey Propagation, a heuristic
SAT-solver based on Bayesian inference [1]. The algorithm
represents the Boolean formula as a *factor graph*,
which is a bipartite graph with variables on one side and
clauses on the other. An edge connects a variable to a
clause if the variable participates in the clause. The
edge is given a value of -1 if the literal in the clause
is negated, and +1 otherwise. The general strategy of
Survey Propagation is to iteratively update each variable
with the likelihood that it should be assigned a truth
value of *true* or *false*. The data parallelism
in this algorithm arises from the worklist of all the
nodes (both variables and clauses) that need to be processed.

[1] A. Braunstein, M. Mezard, and R. Zecchina. Survey Propagation: An Algorithm for Satisfiability. Random Structures and Algorithms, 27:201-226, 2005.

## Algorithm

The algorithm proceeds as follows (pseudocode is provided in Figure 1). Each phase of the algorithm first iterates over the variables and clauses of the formula updating 'surveys' until all updates are below some small epsilon. Then, the surveys are processed to find the most biased variables which are fixed to a specific value. The graph is then reduced by removing fixed variables. If only trivial surveys exist, the problem is passed on to a simpler solver. If remaining terms exist, a new phase is started.

The termination condition for Survey Propagation is fairly complex: when the number of variables is small enough, the Survey Propagation iterations are terminated, and the remaining problem is solved using a local heuristic such as WalkSAT. Alternatively, if there is no progress after some number of iterations, the algorithm may just give up.

The benchmark performs as many phases of the algorithm as necessary until no more variables can be fixed.

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 | FactorGraph f = /* read initial formula */; do { Workset ws = new Workset(); ws.addAll(Variables); foreach (Variable v : ws) { foreach (Clause c : f.getNeighbors(v)) { update_eta_and_bias(v, c); if (/* eta changed more than epsilon */) { ws.push(f.getNeighbors(f.getNeighbors(v))); } } } foreach (Variable v : Variables) { if (/* bias greater than threshold bias */) { // fix v and remove from graph } } } while (/* has interesting surveys */); |

Figure 1: Pseudocode for Survey Propagation.

## Data Structures

There are two key data structures used in Survey Propagation:

- Unordered Set
- Because the algorithm is based on iteratively updating the values of variables chosen at random, the worklist can be represented as an unordered set. There are no ordering constraints on processing the elements of the worklist (although, as discussed above, different orders of processing may lead to different algorithmic behavior).
- Factor Graph
- The bipartite graph representing the Boolean formula.

## Parallelism

The active nodes in Survey Propagation are variable nodes of the factor graph. Initially every variable node is active. As nodes are processed, if the surveys on edges change, the nodes' neighbors' neighbors (other variables in common clauses) are added to the worklist. The neighborhood of each active node is the neighbors and neighbors' neighbors.

Figures 2 and 3 shows the time and speedup of Survey Propagation for a random input consisting of 1000000 variables and 3500000 clauses. Each iteration of survey propagation touches a single node and its neighbors and neighbors' neighbors in the factor graph. Iterations conflict with one another if those neighborhoods overlap. The structure of the graph is largely constant, except for occasionally removing nodes when they are stable. Thus, the available parallelism reflects the connectivity of the graph and remains roughly constant, dropping occasionally as nodes are removed from the graph.

Since Survey Propagation is a heuristic to solve SAT, it is non-deturministic and direct comparisons to full SAT solvers would not be appropriate.

## Performance

Figure 3: Performance results on a random 3-SAT instance with 1000000 variables and 3 times as many clauses (the first parameter is the random seed).

## Machine Description

Performance numbers are collected on a 4 package, 10 cores per package, Intel Xeon E7-4860 machine at 2.27GHz. Thread numbers above 40 use SMT. The machine has 128GB of ram with 20GB dedicated to 2M hugepages used by the Galois allocator. The operating system is Ubuntu Linux 10.04 LTS (Linux 2.6.32). All runs of the Galois benchmarks used gcc/g++ 4.7 to compile with a patch to the C++ runtime to improve the scalability of exception handling. All other implementations were compiled with Intel ICC 12.1 with thread parallelization provided by Intel Cilk.