PASL: Parallel Algorithm Scheduling Library

I have worked on scheduling techniques for multicore programming, together with Umut Acar and Mike Rainey, as well as Filip Sieczkowski, Adrien Guatto, and Vitaly Aksenov. Our main results can be summarized as follows.
  • Work stealing with private deques. Work stealing is a standard technique for executing programs built from language constructs such as parallel-for, fork-join, and async-finish. Traditionally, it is implemented using a sophisticated concurrent data structure known as the Chase-Lev deque. We show that work stealing can instead be implemented with private deques and a very simple message-passing protocol, carried over shared memory, while delivering similar performance. We support this claim with both a theoretical analysis and results on standard benchmarks.
  • Parallel graph traversal. We show that the flexibility offered by private deques enables more advanced forms of work sharing, such as split-half policies. In particular, split-half is critically useful for implementing parallel graph traversal. Concretely, each core maintains a set of edges to visit. At any time, a core may split its set of edges and share one half with another core. This splitting can be implemented in logarithmic time thanks to a chunked-sequence data structure that we designed.
  • Granularity control using asymptotic complexity functions. A major challenge in work-stealing-style computation is controlling the size of subtasks: if tasks are too small, the overheads of task creation are prohibitive; if tasks are too large, parallelism fades. We show that by requiring the programmer to provide a few annotations describing asymptotic costs, and by instrumenting the runtime system to measure execution times and deduce the constant factors, we can generate subtasks of approximately the ideal size. Again, we have shown that our approach works both in theory and in practice.
  • Heartbeat scheduling. We present a technique for controlling granularity in situations where the asymptotic execution time of a function cannot be predicted from its arguments. Our approach is based on an instrumentation of the call stack: periodically, latent parallel calls recorded on the stack are promoted into full-fledged tasks. This strategy delivers provably good results.
  • DAG-calculus. We show that all the classical language constructs for parallelism (including parallel-for, fork-join, async-finish, strict and lazy futures, etc.) can be encoded in terms of a handful of low-level primitives. Exploiting these encodings allows for simpler implementations of practical runtime systems for parallel computations.
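To make the private-deque idea concrete, here is a minimal sketch of the message-passing steal protocol, simulated sequentially for determinism. It is not PASL's actual implementation (PASL is a C++ library); worker count, task representation, and the round-robin victim choice are all illustrative assumptions. The key invariant it demonstrates is that only the owner ever touches its deque; thieves communicate through single-writer mailbox cells.

```python
from collections import deque

class Worker:
    """One core: a private deque plus two single-writer mailbox cells (illustrative)."""
    def __init__(self, wid):
        self.wid = wid
        self.deque = deque()   # private: only the owner ever touches it
        self.request = None    # mailbox: a thief writes its id here
        self.answer = None     # mailbox: the victim deposits a task or "empty"
        self.waiting = False   # True while this worker's steal request is in flight

def step(workers, executed):
    """One round of the protocol, simulated sequentially for determinism."""
    n = len(workers)
    for w in workers:
        if w.deque:
            executed.append(w.deque.pop())        # owner works on its newest task
        elif w.answer is not None:                # a steal answer arrived
            if w.answer != "empty":
                w.deque.append(w.answer)
            w.answer, w.waiting = None, False
        elif not w.waiting:                       # idle: ask a victim for work
            victim = workers[(w.wid + 1) % n]     # round-robin victim (assumed policy)
            if victim.request is None:
                victim.request = w.wid
                w.waiting = True
        # victim side: poll the mailbox, answer at most one steal request
        if w.request is not None:
            thief = workers[w.request]
            thief.answer = w.deque.popleft() if len(w.deque) > 1 else "empty"
            w.request = None

workers = [Worker(i) for i in range(3)]
workers[0].deque.extend(range(10))    # all work initially on worker 0
executed = []
while (any(w.deque for w in workers)
       or any(w.answer not in (None, "empty") for w in workers)):
    step(workers, executed)

assert sorted(executed) == list(range(10))   # no task lost or duplicated
```

Because the deque is private, the owner's push and pop need no synchronization at all; the only coordination cost is the occasional mailbox poll.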
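The split-half policy for graph traversal can be sketched as follows. This toy version uses two workers simulated in one loop and plain Python lists, so the split costs linear time; the chunked sequence mentioned above is precisely what makes the split logarithmic in PASL. The graph and worker count are illustrative.

```python
# Hypothetical sketch: two simulated workers traverse a graph; an idle
# worker takes half of the other's frontier (split-half work sharing).
graph = {0: [1, 2], 1: [3, 4], 2: [5, 6], 3: [], 4: [7], 5: [], 6: [7], 7: []}
visited = set()
frontiers = [[0], []]          # per-worker sets of vertices to visit
while any(frontiers):
    for i, f in enumerate(frontiers):
        other = frontiers[1 - i]
        if not f and len(other) > 1:
            # split-half: the idle worker grabs half of the other frontier
            half = len(other) // 2
            frontiers[i], frontiers[1 - i] = other[:half], other[half:]
            f = frontiers[i]
        if f:
            v = f.pop()
            if v not in visited:
                visited.add(v)
                f.extend(u for u in graph[v] if u not in visited)

assert visited == set(range(8))   # every vertex reachable from 0 is visited
```

The `visited` check makes it safe for the same vertex to appear transiently in both frontiers, which is what allows the split to be so liberal.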
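The granularity-control idea can be illustrated with a small sketch: the programmer supplies a cost function (here, the input size n for a linear-work sum), and an estimator learns the constant factor from measured sequential runs. The threshold KAPPA, the averaging rule, and all names are illustrative assumptions, not PASL's API.

```python
import time

KAPPA = 1e-4   # target sequential grain in seconds (assumed tuning constant)

class Estimator:
    """Learns the constant factor linking the cost function to wall-clock time."""
    def __init__(self):
        self.constant = None
    def is_small(self, cost):
        return self.constant is not None and self.constant * cost <= KAPPA
    def report(self, cost, elapsed):
        c = elapsed / max(cost, 1)
        self.constant = c if self.constant is None else 0.5 * (self.constant + c)

def psum(est, lo, hi):
    """Sum of lo..hi-1; the programmer-supplied cost function is n = hi - lo."""
    n = hi - lo
    if n <= 1 or est.is_small(n):
        t0 = time.perf_counter()
        s = sum(range(lo, hi))                  # sequential body, measured
        est.report(n, time.perf_counter() - t0)
        return s
    mid = (lo + hi) // 2
    # in a real runtime these two calls would be forked as parallel tasks
    return psum(est, lo, mid) + psum(est, mid, hi)

est = Estimator()
total = psum(est, 0, 100000)
assert total == sum(range(100000))
```

Once the constant is learned, subranges whose predicted time falls below KAPPA run sequentially, so task-creation overhead is paid only on subtasks large enough to amortize it.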
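The amortization argument behind heartbeat scheduling can be shown in a few lines: latent parallel calls sit on a stack, and at most one is promoted per heartbeat of actual work, so task-creation overhead is bounded by a fixed fraction of total work. The heartbeat period and call representation below are illustrative assumptions.

```python
HEARTBEAT = 5      # promote at most one task per 5 units of work (assumed period)

promoted = []      # latent calls that were turned into real parallel tasks
ticks = 0          # global work counter

def work(units, latent):
    """Perform `units` units of work; on each heartbeat, promote the
    oldest latent parallel call from the bottom of the stack."""
    global ticks
    for _ in range(units):
        ticks += 1
        if ticks % HEARTBEAT == 0 and latent:
            promoted.append(latent.pop(0))   # oldest latent call becomes a task

latent_calls = ["call0", "call1", "call2"]
work(12, latent_calls)

# 12 units of work at a heartbeat of 5 allow exactly 2 promotions,
# regardless of how many latent calls were available.
assert promoted == ["call0", "call1"]
```

This is the crux of the provable bound: promotions, and hence task-creation costs, scale with work done rather than with the number of potential parallel calls.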
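Finally, the DAG-calculus encoding can be sketched with a toy runtime built on just nodes, dependency edges, and join counters; fork-join then falls out as two nodes sharing one continuation node. The primitive names below are illustrative, not the calculus's actual syntax.

```python
from collections import deque

class Node:
    """A DAG vertex: a body to run and a counter of unmet dependencies."""
    def __init__(self, body):
        self.body = body
        self.incount = 0
        self.outs = []

def new_edge(src, dst):
    """Add a dependency edge: dst may only run after src finishes."""
    src.outs.append(dst)
    dst.incount += 1

ready = deque()   # nodes whose dependencies are all satisfied

def run():
    """Toy runtime loop: execute ready nodes, releasing their successors."""
    while ready:
        node = ready.popleft()
        node.body()
        for d in node.outs:
            d.incount -= 1
            if d.incount == 0:
                ready.append(d)

# Encoding fork-join on top of the primitives: f and g both have an edge
# to the continuation k, so k runs only after both complete.
log = []
def fork2(f, g, k):
    nf, ng, nk = Node(f), Node(g), Node(k)
    new_edge(nf, nk)
    new_edge(ng, nk)
    ready.extend([nf, ng])

fork2(lambda: log.append("left"),
      lambda: log.append("right"),
      lambda: log.append("join"))
run()
assert log == ["left", "right", "join"]
```

Parallel-for, async-finish, and futures admit similar encodings: each construct only differs in how it wires edges into the DAG, which is why one small runtime loop suffices for all of them.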

Download

For links to the source code, please visit the DeepSea website.
Please don't hesitate to contact Mike Rainey or me if you are interested in using this code.

Related papers

Umut A. Acar, Vitaly Aksenov, Arthur Charguéraud, and Mike Rainey
PPoPP: Symposium on Principles and Practice of Parallel Programming, February 2019
Umut A. Acar, Arthur Charguéraud, Adrien Guatto, Mike Rainey, and Filip Sieczkowski
PLDI: Programming Language Design and Implementation, June 2018
Umut A. Acar, Arthur Charguéraud, and Mike Rainey
JFP: Journal of Functional Programming, December 2016
Umut A. Acar, Arthur Charguéraud, Mike Rainey, and Filip Sieczkowski
ICFP: International Conference on Functional Programming, September 2016
Umut A. Acar, Arthur Charguéraud, and Mike Rainey
SC: ACM/IEEE Conference for High Performance Computing, Networking, Storage and Analysis, November 2015
Umut A. Acar, Arthur Charguéraud, and Mike Rainey
ESA: European Symposium on Algorithms, September 2014
Umut A. Acar, Arthur Charguéraud, and Mike Rainey
PPoPP: Symposium on Principles and Practice of Parallel Programming, February 2013
Umut A. Acar, Arthur Charguéraud, and Mike Rainey
DAMP: Workshop on Declarative Aspects and Applications of Multicore Programming, January 2012
Umut A. Acar, Arthur Charguéraud, and Mike Rainey
OOPSLA: Conference on Object-Oriented Programming, Systems, Languages, and Applications, October 2011