Stochastic Gradient Descent

Problem Description

This benchmark implements Stochastic Gradient Descent (SGD). In particular, the benchmark uses SGD to complete unknown entries of a sparse matrix R. The sparse matrix represents a bipartite graph, with one set of nodes represent movies, while the other set represents users. The edge connecting a movie node to a user node denotes that the user has rated the movie, with the edge label representing the rating assigned.

The goal of the benchmark is to build two matrices, A (1xm) and B (1xu), such that the matrix product AB approximately equals R, the original ratings matrix described above. Each entry of A and B is a vector of real numbers and is called a latent vector.

After A and B have been calculated, they can be used to predict unknown entries of R.

Algorithm

First, the entries of the latent vector are initialized to random numbers. Then the ratings are normalized.

Then for every edge in the graph R, we apply stochastic gradient descent as follows:

function gradient_update(edge) {
  Mlv = edge.src.lv
  Ulv = edge.dst.lv

  diff = edge.rating - dotproduct(Mlv, Ulv) 

  update(Mlv, diff)
  update(Ulv, diff)
}

The key problem in a parallel implementation is to ensure that the gradient_update procedure has exclusive access to the end-points of the edge. Although this is conceptually simple, the benchmarks show that different locking strategies lead to different schedules, which in turn lead to very wide differences in performance. This is especially the case when applied to scale-free graphs.

For more information on this benchmark, see:

Rashid Kaleem, Anand Venkat, Sreepathi Pai, Mary Hall, Keshav Pingali, "Synchronization Trade-offs in GPU implementations of Graph Algorithms", IPDPS, 2016.

The benchmark is set to run for 5 rounds. This benchmark has rough correspondence to the GPU implementations described in the paper above.

Performance

GraphTime (ms)
Netflix5000

The benchmark is written in CUDA.
Compile as: make
Execute as: blk_sgd <bipartite-graph-input>
e.g., blk_diag bgg.gr

The above performance is observed on a 1733.500 Mhz GeForce GTX 1080 6000 with 8119 MB of main memory.