PASL: Parallel Algorithm Scheduling Library

I have worked on scheduling techniques for multicore programming, together with Umut Acar and Mike Rainey, as well as Filip Sieczkowski, Adrien Guatto, and Vitaly Aksenov. Our main results can be summarized as follows.
  • Work stealing with private deques. Work stealing is a standard technique for executing programs built from language constructs such as parallel-for, fork-join, and async-finish. Traditionally, it is implemented using a sophisticated concurrent data structure known as the Chase-Lev deque. We show that work stealing can instead be implemented with private deques and a very simple message-passing protocol, carried over shared memory, while delivering similar performance. We support this claim with both a theoretical analysis and results on standard benchmarks.
  • Parallel graph traversal. We show that the flexibility offered by private deques enables more advanced forms of work sharing, such as split-half policies. In particular, split-half is critically useful for implementing parallel graph traversal. Concretely, each core maintains a set of edges to visit. At any time, a core may split its set of edges and share one half with another core. This splitting can be implemented in logarithmic time thanks to a chunked-sequence data structure that we designed.
  • Granularity control using asymptotic complexity functions. A major challenge in work-stealing-style computation is controlling the size of subtasks: if tasks are too small, the overheads of task creation are prohibitive; if tasks are too large, parallelism fades. We show that by requiring the programmer to provide a few annotations describing asymptotic costs, and by instrumenting the runtime system to measure execution times and deduce the constant factors, we can generate subtasks of approximately the ideal size. Again, we have shown that our approach works both in theory and in practice.
  • Heartbeat scheduling. We present a technique for controlling granularity in situations where the asymptotic execution time of a function cannot be predicted from its arguments. Our approach is based on an instrumentation of the call stack: periodically, latent parallel calls recorded on the stack are promoted into full-fledged tasks. This strategy delivers provably good results.
  • DAG-calculus. We show that all the classical language constructs for parallelism (including parallel-for, fork-join, async-finish, strict and lazy futures, etc.) can be encoded in terms of a handful of low-level primitives. Exploiting these encodings allows for simpler implementations of practical runtime systems for parallel computations.
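To make the private-deque idea concrete, here is a minimal sketch of the message-passing steal protocol, simulated sequentially for determinism. It is not PASL's actual implementation (PASL is a C++ library); worker count, task representation, and the round-robin victim choice are all illustrative assumptions. The key invariant it demonstrates is that only the owner ever touches its deque; thieves communicate through single-writer mailbox cells.

```python
from collections import deque

class Worker:
    """One core: a private deque plus two single-writer mailbox cells (illustrative)."""
    def __init__(self, wid):
        self.wid = wid
        self.deque = deque()   # private: only the owner ever touches it
        self.request = None    # mailbox: a thief writes its id here
        self.answer = None     # mailbox: the victim deposits a task or "empty"
        self.waiting = False   # True while this worker's steal request is in flight

def step(workers, executed):
    """One round of the protocol, simulated sequentially for determinism."""
    n = len(workers)
    for w in workers:
        if w.deque:
            executed.append(w.deque.pop())        # owner works on its newest task
        elif w.answer is not None:                # a steal answer arrived
            if w.answer != "empty":
                w.deque.append(w.answer)
            w.answer, w.waiting = None, False
        elif not w.waiting:                       # idle: ask a victim for work
            victim = workers[(w.wid + 1) % n]     # round-robin victim (assumed policy)
            if victim.request is None:
                victim.request = w.wid
                w.waiting = True
        # victim side: poll the mailbox, answer at most one steal request
        if w.request is not None:
            thief = workers[w.request]
            thief.answer = w.deque.popleft() if len(w.deque) > 1 else "empty"
            w.request = None

workers = [Worker(i) for i in range(3)]
workers[0].deque.extend(range(10))    # all work initially on worker 0
executed = []
while (any(w.deque for w in workers)
       or any(w.answer not in (None, "empty") for w in workers)):
    step(workers, executed)

assert sorted(executed) == list(range(10))   # no task lost or duplicated
```

Because the deque is private, the owner's push and pop need no synchronization at all; the only coordination cost is the occasional mailbox poll.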
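The split-half policy for graph traversal can be sketched as follows. This toy version uses two workers simulated in one loop and plain Python lists, so the split costs linear time; the chunked sequence mentioned above is precisely what makes the split logarithmic in PASL. The graph and worker count are illustrative.

```python
# Hypothetical sketch: two simulated workers traverse a graph; an idle
# worker takes half of the other's frontier (split-half work sharing).
graph = {0: [1, 2], 1: [3, 4], 2: [5, 6], 3: [], 4: [7], 5: [], 6: [7], 7: []}
visited = set()
frontiers = [[0], []]          # per-worker sets of vertices to visit
while any(frontiers):
    for i, f in enumerate(frontiers):
        other = frontiers[1 - i]
        if not f and len(other) > 1:
            # split-half: the idle worker grabs half of the other frontier
            half = len(other) // 2
            frontiers[i], frontiers[1 - i] = other[:half], other[half:]
            f = frontiers[i]
        if f:
            v = f.pop()
            if v not in visited:
                visited.add(v)
                f.extend(u for u in graph[v] if u not in visited)

assert visited == set(range(8))   # every vertex reachable from 0 is visited
```

The `visited` check makes it safe for the same vertex to appear transiently in both frontiers, which is what allows the split to be so liberal.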
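The granularity-control idea can be illustrated with a small sketch: the programmer supplies a cost function (here, the input size n for a linear-work sum), and an estimator learns the constant factor from measured sequential runs. The threshold KAPPA, the averaging rule, and all names are illustrative assumptions, not PASL's API.

```python
import time

KAPPA = 1e-4   # target sequential grain in seconds (assumed tuning constant)

class Estimator:
    """Learns the constant factor linking the cost function to wall-clock time."""
    def __init__(self):
        self.constant = None
    def is_small(self, cost):
        return self.constant is not None and self.constant * cost <= KAPPA
    def report(self, cost, elapsed):
        c = elapsed / max(cost, 1)
        self.constant = c if self.constant is None else 0.5 * (self.constant + c)

def psum(est, lo, hi):
    """Sum of lo..hi-1; the programmer-supplied cost function is n = hi - lo."""
    n = hi - lo
    if n <= 1 or est.is_small(n):
        t0 = time.perf_counter()
        s = sum(range(lo, hi))                  # sequential body, measured
        est.report(n, time.perf_counter() - t0)
        return s
    mid = (lo + hi) // 2
    # in a real runtime these two calls would be forked as parallel tasks
    return psum(est, lo, mid) + psum(est, mid, hi)

est = Estimator()
total = psum(est, 0, 100000)
assert total == sum(range(100000))
```

Once the constant is learned, subranges whose predicted time falls below KAPPA run sequentially, so task-creation overhead is paid only on subtasks large enough to amortize it.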
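The amortization argument behind heartbeat scheduling can be shown in a few lines: latent parallel calls sit on a stack, and at most one is promoted per heartbeat of actual work, so task-creation overhead is bounded by a fixed fraction of total work. The heartbeat period and call representation below are illustrative assumptions.

```python
HEARTBEAT = 5      # promote at most one task per 5 units of work (assumed period)

promoted = []      # latent calls that were turned into real parallel tasks
ticks = 0          # global work counter

def work(units, latent):
    """Perform `units` units of work; on each heartbeat, promote the
    oldest latent parallel call from the bottom of the stack."""
    global ticks
    for _ in range(units):
        ticks += 1
        if ticks % HEARTBEAT == 0 and latent:
            promoted.append(latent.pop(0))   # oldest latent call becomes a task

latent_calls = ["call0", "call1", "call2"]
work(12, latent_calls)

# 12 units of work at a heartbeat of 5 allow exactly 2 promotions,
# regardless of how many latent calls were available.
assert promoted == ["call0", "call1"]
```

This is the crux of the provable bound: promotions, and hence task-creation costs, scale with work done rather than with the number of potential parallel calls.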
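Finally, the DAG-calculus encoding can be sketched with a toy runtime built on just nodes, dependency edges, and join counters; fork-join then falls out as two nodes sharing one continuation node. The primitive names below are illustrative, not the calculus's actual syntax.

```python
from collections import deque

class Node:
    """A DAG vertex: a body to run and a counter of unmet dependencies."""
    def __init__(self, body):
        self.body = body
        self.incount = 0
        self.outs = []

def new_edge(src, dst):
    """Add a dependency edge: dst may only run after src finishes."""
    src.outs.append(dst)
    dst.incount += 1

ready = deque()   # nodes whose dependencies are all satisfied

def run():
    """Toy runtime loop: execute ready nodes, releasing their successors."""
    while ready:
        node = ready.popleft()
        node.body()
        for d in node.outs:
            d.incount -= 1
            if d.incount == 0:
                ready.append(d)

# Encoding fork-join on top of the primitives: f and g both have an edge
# to the continuation k, so k runs only after both complete.
log = []
def fork2(f, g, k):
    nf, ng, nk = Node(f), Node(g), Node(k)
    new_edge(nf, nk)
    new_edge(ng, nk)
    ready.extend([nf, ng])

fork2(lambda: log.append("left"),
      lambda: log.append("right"),
      lambda: log.append("join"))
run()
assert log == ["left", "right", "join"]
```

Parallel-for, async-finish, and futures admit similar encodings: each construct only differs in how it wires edges into the DAG, which is why one small runtime loop suffices for all of them.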

Download

For links to the source code, please visit the DeepSea website.
Please don't hesitate to contact Mike Rainey or me if you are interested in using this code.

Related papers

Umut A. Acar, Vitaly Aksenov, Arthur Charguéraud, and Mike Rainey
PPoPP: Symposium on Principles and Practice of Parallel Programming, February 2019
Umut A. Acar, Arthur Charguéraud, Adrien Guatto, Mike Rainey, and Filip Sieczkowski
PLDI: Programming Language Design and Implementation, June 2018
Umut A. Acar, Arthur Charguéraud, and Mike Rainey
JFP: Journal of Functional Programming, December 2016
Umut A. Acar, Arthur Charguéraud, Mike Rainey, and Filip Sieczkowski
ICFP: International Conference on Functional Programming, September 2016
Umut A. Acar, Arthur Charguéraud, and Mike Rainey
SC: ACM/IEEE Conference for High Performance Computing, Networking, Storage and Analysis, November 2015
Umut A. Acar, Arthur Charguéraud, and Mike Rainey
ESA: European Symposium on Algorithms, September 2014
Umut A. Acar, Arthur Charguéraud, and Mike Rainey
PPoPP: Symposium on Principles and Practice of Parallel Programming, February 2013
Umut A. Acar, Arthur Charguéraud, and Mike Rainey
DAMP: Workshop on Declarative Aspects and Applications of Multicore Programming, January 2012
Umut A. Acar, Arthur Charguéraud, and Mike Rainey
OOPSLA: Conference on Object-Oriented Programming, Systems, Languages, and Applications, October 2011