Data-parallel irregular applications
Multicore systems are becoming increasingly prevalent. As a result, parallel programming is now a concern for more programmers than ever before. Unfortunately, writing correct, efficient parallel programs is a highly non-trivial task. This problem is exacerbated when the application to be parallelized is irregular (i.e., contains many pointer-based data structures). In fact, it has been an open question as to exactly how much parallelism is even available in irregular programs.
It is our belief that many irregular programs exhibit data parallelism. The programs perform computations on elements of worklists of various kinds. However, these worklists are irregular data structures, and the computations may traverse other irregular data structures as well. Thus, many of the techniques developed to handle data parallelism in regular (array- and matrix-based) programs are insufficient.
One of the fundamental obstacles when parallelizing irregular programs is the intractability of traditional parallelization analyses; the highly dynamic nature of these programs often makes any static approach to parallelization infeasible. Our belief is that parallelizing many irregular applications requires the use of optimistic parallelization, where work is executed speculatively in parallel, with the execution managed by a run-time to ensure correct behavior.
Parallelizing data-parallel irregular programs will require new programming models, compiler techniques and run-time systems. The Galois Project aims to provide these tools.
The Galois Project
Our work on the Galois project is underpinned by several key beliefs:
- Data-parallel irregular programs should have well defined sequential semantics. This makes it easier for programmers to reason about the correctness of their algorithms.
- The majority of the burden of parallelization and synchronization should be offloaded to the compiler and run-time systems.
- Object-oriented programs are rich in semantics which a compiler and run-time system should exploit in parallelization and synchronization.
- The program, when run in parallel, should approach the performance of hand-parallelized code.
Thus, the goal of this project is to develop a programming model which makes it easy to write data-parallel irregular programs, and then efficiently execute them on multicore processors. Our programming model consists of (i) two simple syntactic concurrency constructs, (ii) an object model which allows for increased concurrency during parallel execution and (iii) a run-time system which supports the concurrency constructs and object model.
Using the Galois system, we are able to parallelize a range of data-parallel irregular applications, achieving significant speedups on algorithms which appear, prima facie, to have too many sequential dependences to be parallelizable. A full treatment of the system can be found in the paper: Optimistic Parallelization Requires Abstractions. Milind Kulkarni, Keshav Pingali, Bruce Walter, Ganesh Ramanarayanan, Kavita Bala and L. Paul Chew. PLDI 2007.
Ongoing research
Current avenues of research focus mainly on improving the efficiency of parallel execution. Our current projects include:
- Collecting more applications to create a benchmark suite of data-parallel irregular applications: There is a shortage of appropriate benchmarks for work in parallelism. We are not attempting to simply replace critical sections with optimistic concurrency constructs; we want to re-imagine the writing of parallel programs. What sorts of applications are suitable benchmarks for this work? What would hand-written parallel implementations look like, and how would they perform? Do these applications give us insights into further extensions to the Galois system?
- Improving the memory system performance of applications parallelized using the Galois system by exploiting locality: In many applications, the cost of ignoring locality begins to overwhelm the benefits of parallelization. Can we change the execution model to improve locality? Can we change the object model? What is the appropriate "sweet spot" between parallelism and locality?
- Improving the expressibility of the Galois system: In a data-parallel program, the possible ways to schedule parallel execution are legion. Different schedules of execution can have exhibit wildly different performance, and have far reaching effects on algorithmic performance, concurrency, locality, overheads and load-balancing. What sorts of schedules are useful for data-parallel applications? Can we provide a simple conceptual model that remains expressive enough to capture intricate scheduling decisions?
- Justifying optimistic parallelization: Speculative parallelization is only useful if the optimism is warranted and most code executed speculatively performs useful work. What sorts of techniques do we need to make the speculation in the Galois system as accurate as possible?
- Reducing the overhead of the Galois system through compiler optimizations: What kinds of optimizations can we perform to reduce the overhead of parallel execution? Can we reduce synchronization overheads? Can we leverage dynamic optimizations such as profile-based inlining to improve the performance of common-case scenarios?
Publications:
- Lonestar: A Suite of Parallel Irregular Programs International Symposium on Performance Analysis of Software and Systems (ISPASS), April, 2009
- How Much Parallelism is There in Irregular Applications? Principles and Practices of Parallel Programming (PPoPP), February, 2009
- Fast Agglomerative Clustering for Rendering Symposium on Interactive Ray Tracing (RT), August, 2008
- On the Scalability of an Automatically Parallelized Irregular Application 21st International Workshop on Languages and Compilers for Parallel Computing (LCPC), July, 2008
- Scheduling Strategies for Optimistic Parallel Execution of Irregular Programs 20th Annual ACM Symposium on Parallelism in Algorithms and Architectures (SPAA), June, 2008
- Optimistic Parallelism Benefits From Data Partitioning Architectural Support for Programming Languages and Operating Systems (ASPLOS), March 2008
- Optimistic Parallelism Requires Abstractions Programming Languages Design and Implementation (PLDI), June 2007
- Using Transactions in Delaunay Mesh Generation Workshop on Transactional Memory Workloads (WTW), 2006





