Running LonestarGPU 2.0 on GPGPU-sim 3.2.1

LonestarGPU 2.0 requires CUDA 5.0 or later because its dependency, CUB, requires a modern C++ compiler. At the time of writing, GPGPU-sim v3.2.1 supports only CUDA 4.0, so LonestarGPU 2.0 cannot run directly on GPGPU-sim. This page documents how to get LonestarGPU 2.0 running on GPGPU-sim with minimal modifications to both the benchmarks and GPGPU-sim.

NOTE: This is unsupported. Always check the results of the benchmarks to ensure that they have run correctly. If you encounter issues, please e-mail the maintainer.

Modifications to GPGPU-sim

First, you'll need to compile GPGPU-sim with CUDA 5.5 and shim the CUDA 5.5 utilities cuobjdump and ptxas so that GPGPU-sim's parsers continue to work. This patch does both.

Second, you'll need to patch the implementation of cudaGetDeviceProperties() so that it provides information many benchmarks require (such as the number of multiprocessors) and reports the correct maximum thread-block dimensions. This patch does that.

Third, texture naming appears to have changed in CUDA 5.5, so GPGPU-sim's texture lookup functions need to be updated. This patch does that.

Finally, GPGPU-sim's implementation of cudaFuncGetAttributes() is incomplete: it sets maxThreadsPerBlock in the returned cudaFuncAttributes structure to 0, which causes SIGFPEs in the Thrust code used by pta. This patch fixes that and also handles .maxntid directives in the PTX source.

Apply the four patches linked above to a clean v3.2.1 tree (cc61b09b46c25c1ce491bf80808b9c6fe6cd0b7e) using git apply in the v3.x directory.
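Applying the patches can be scripted roughly as follows. This is a minimal bash sketch, not part of the original instructions: apply_patches and abs_path are hypothetical helper names, and the patch file names in the usage comment are placeholders for the four patches linked above. It refuses to run on a tree with local changes and dry-runs all patches together with git apply --check before applying anything.

```shell
# Hypothetical helpers (not part of GPGPU-sim) for applying the patches.

# Print an absolute version of a path. git -C changes directory before it
# reads the patch, so relative patch paths must be resolved first.
abs_path() {
  (cd "$(dirname "$1")" && printf '%s/%s\n' "$PWD" "$(basename "$1")")
}

# apply_patches REPO PATCH...
# Refuse to touch a dirty tree, dry-run all patches together with
# `git apply --check`, and only then apply them in the order given.
apply_patches() {
  local repo=$1 p
  shift
  if [ $# -eq 0 ]; then
    echo "usage: apply_patches REPO PATCH..." >&2
    return 2
  fi
  if [ -n "$(git -C "$repo" status --porcelain)" ]; then
    echo "$repo has local changes; start from a clean checkout" >&2
    return 1
  fi
  # Replace each positional argument with its absolute path, keeping order.
  for p in "$@"; do
    shift
    set -- "$@" "$(abs_path "$p")"
  done
  if ! git -C "$repo" apply --check "$@"; then
    echo "patches do not apply cleanly; tree left untouched" >&2
    return 1
  fi
  git -C "$repo" apply "$@"
}

# Usage (patch file names are placeholders):
#   cd v3.x && git checkout cc61b09b46c25c1ce491bf80808b9c6fe6cd0b7e
#   apply_patches . /path/to/patch1.diff /path/to/patch2.diff \
#                   /path/to/patch3.diff /path/to/patch4.diff
```

Dry-running first means a patch that fails to apply is reported before any file is modified, which keeps the tree at the clean v3.2.1 state for a retry.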

Then point $CUDA_INSTALL_PATH to the directory containing CUDA 5.5 and recompile GPGPU-sim.
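Because building against the wrong toolkit is easy to miss, it may be worth checking which release $CUDA_INSTALL_PATH actually points at before rebuilding. The sketch below is an illustration only: check_cuda_release is a hypothetical helper, the /usr/local/cuda-5.5 path in the usage comment is an assumption about your install, and it relies on nvcc --version printing a line of the form "Cuda compilation tools, release 5.5, V5.5.0".

```shell
# check_cuda_release [VERSION]
# Succeed only if $CUDA_INSTALL_PATH/bin/nvcc reports the given release
# (default 5.5), e.g. "Cuda compilation tools, release 5.5, V5.5.0".
check_cuda_release() {
  local want=${1:-5.5}
  if [ -z "${CUDA_INSTALL_PATH:-}" ]; then
    echo "CUDA_INSTALL_PATH is not set" >&2
    return 1
  fi
  local nvcc="$CUDA_INSTALL_PATH/bin/nvcc"
  if [ ! -x "$nvcc" ]; then
    echo "no executable nvcc at $nvcc" >&2
    return 1
  fi
  # -F: match the version literally, so "5.5" does not also match "5x5".
  "$nvcc" --version | grep -qF "release $want,"
}

# Usage (the install path is an assumption; adjust to your system):
#   export CUDA_INSTALL_PATH=/usr/local/cuda-5.5
#   check_cuda_release 5.5 && cd v3.x && source setup_environment && make
```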

Modifications to the LonestarGPU benchmark suite

You must use CUB 1.1.1; later versions use instructions that GPGPU-sim does not support.

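CUDA 5.5 links the CUDA Runtime statically by default, but GPGPU-sim intercepts CUDA Runtime calls by supplying its own shared libcudart, so the benchmarks must link the runtime dynamically. Add the flag -cudart shared to the FLAGS variable in $LSGDIR/apps/common.mk, like this: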

FLAGS := -cudart shared -O3 -arch=$(COMPUTECAPABILITY) -g -Xptxas -v #-lineinfo -G
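To confirm that a rebuilt benchmark really links the runtime dynamically, ldd can be used. A minimal sketch: uses_shared_cudart is a hypothetical helper name, and path/to/benchmark-binary in the usage comment is a placeholder for wherever your build puts the executable.

```shell
# uses_shared_cudart BINARY
# Print the libcudart line reported by ldd(1); succeed only if BINARY links
# the CUDA Runtime dynamically. A statically linked binary prints nothing.
uses_shared_cudart() {
  ldd "$1" 2>/dev/null | grep 'libcudart\.so'
}

# Usage (binary path is a placeholder):
#   uses_shared_cudart path/to/benchmark-binary || echo "cudart is not shared"
```

With GPGPU-sim's environment set up as intended, the path printed should resolve into GPGPU-sim's build tree rather than the CUDA toolkit; if it points at the toolkit, the benchmark is not running on the simulator.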

Comment out the following line in the main() function of $LSGDIR/apps/sssp/main.cu:

  // cudaFuncSetCacheConfig(drelax, cudaFuncCachePreferShared);

This call is incorrect, since drelax is a device function in LSG 2.0 rather than a kernel, and it crashes GPGPU-sim if left in.
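If you prefer to script this edit, a sed one-liner does it. This is a sketch that assumes GNU sed (for -i without a backup suffix); comment_out_cache_config is a hypothetical helper name. It keeps the line's indentation and skips lines that are already commented out, so running it twice is harmless.

```shell
# comment_out_cache_config FILE
# Prefix the cudaFuncSetCacheConfig(drelax, ...) call in FILE with "// ".
# The pattern requires only whitespace before the call, so a line that is
# already commented out no longer matches.
comment_out_cache_config() {
  sed -i 's|^\([[:space:]]*\)\(cudaFuncSetCacheConfig(drelax,\)|\1// \2|' "$1"
}

# Usage:
#   comment_out_cache_config "$LSGDIR/apps/sssp/main.cu"
#   grep -n 'cudaFuncSetCacheConfig(drelax' "$LSGDIR/apps/sssp/main.cu"
```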

Now recompile LSG 2.0 and run GPGPU-sim as usual.
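"As usual" here means that GPGPU-sim reads gpgpusim.config (and the files it names) from the directory the benchmark is launched in, so a configuration has to be copied there first. A minimal sketch: stage_gpgpusim_config is a hypothetical helper, GTX480 is just one example configuration directory, the benchmark command is a placeholder, and it assumes setup_environment was sourced from v3.x and exported GPGPUSIM_ROOT.

```shell
# stage_gpgpusim_config CONFIG_DIR [RUN_DIR]
# Copy a GPGPU-sim configuration (gpgpusim.config plus the files it
# references) into the directory the benchmark will be launched from.
stage_gpgpusim_config() {
  local config_dir=$1 run_dir=${2:-.}
  if [ ! -f "$config_dir/gpgpusim.config" ]; then
    echo "no gpgpusim.config in $config_dir" >&2
    return 1
  fi
  cp "$config_dir"/* "$run_dir"/
}

# Usage (assumes setup_environment was sourced in v3.x; command is a placeholder):
#   stage_gpgpusim_config "$GPGPUSIM_ROOT/configs/GTX480" .
#   ./path/to/benchmark-binary <input>
```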

Status

  1. bfs-wlc, sssp-wlc, nsp work
  2. mst does not seem to work (it does not terminate)
  3. dmr has not been tested
  4. bh should compile and work with CUDA 4.0
  5. pta should compile and work with CUDA 4.0 but does not seem to terminate