Points-to Analysis

Problem Description

Given a set of points-to constraints, the problem is to compute the points-to information for each pointer, in a flow-insensitive context-insensitive manner.

Algorithm

The computation is implemented in a topology-driven manner; all the graph edges are tried for propagation in every iteration and another iteration is executed if at least one edge was active in the previous iteration. This continues until a fixed-point. The following pseudo-code shows how points-to information is propagated.
  for all pointer nodes src {
    ptasrc = pointsto[src];
    for each edge <src, dst> {
      ptadst = pointsto[dst];
      if ptadst \ ptasrc ≠ ∅ {
         pointsto[dst] = pointsto[dst] ∪ pointsto[src];
         anotheriteration = true;
      }
    }
  }
The body of the outer for loop is implemented as the kernel. The kernel is repeatedly called from the host until a fixed-point is reached.

Optimizations

The computation implements several optimizations.
  • Coalesced memory: The points-to information is stored as sparse bit vectors of 32 words ensuring perfect coalescing while performing the union of points-to sets.
  • Pull-based processing: Each thread operates on a node and updates its points-to information. This helps in avoiding synchronization during the update as only one thread writes to a node's points-to set.
  • Warp-based execution: Work is assigned to warps rather than to individual threads to exploit GPU's warp-based processing.

Performance

The benchmark is written in CUDA.
Compile as: make
Execute as: run.sh /path/to/files/prefix
e.g., run.sh ../../inputs/ex

InputTime (ms)
vim5,230
pine6,280
tshark1,830

The above performance is observed on a 1.45 GHz Quadro 6000 with 6 GB of main memory and 448 cores distributed over 14 SMs. It has 64 kB of fast memory per SM that is split between the L1 data cache and the shared memory.