Find the best scheduling class for your algorithm. For each loops take a template parameter to choose the runtime scheduler. If your algorithm has a notion of priority, use a priority-based scheduler.
Many of the Galois class methods take an additional method flags argument that determines what additional work the runtime performs in order to maintain transactional semantics. By default, all sufficient actions are performed, but in many cases, these actions are redundant given the current context of the iteration. In these cases, users can override the default actions by passing explicit flags indicating exactly what actions need to be performed.
The Galois benchmarks and the graph classes use a custom allocator which takes advantage of huge pages. If you add nodes to a graph in the parallel region, you will not scale without huge pages. The Linux kernel protects all page faults and mmap calls with a single spinlock which does not scale, so if you cause allocation, hugepages will amortize that spinlock.
Per Iteration Allocator
Galois includes a region allocator whose region is an iteration. This is the fastest way to allocate per-iteration temporaries (vectors, sets, etc). Standard library adaptors are included for use in any container which takes standard allocator classes.
Galois makes use of knowledge of the topology and memory hierarchy of the
machine. It does not yet automatically detect this at runtime. Therefore
you should define a machine policy in
to describe your machine.