Language abstractions and scheduling techniques for efficient execution of parallel algorithms on multicore hardware

Abstract

The Multicore OCaml team has made significant progress in recent years. There now seems to be interest in working on high-level parallelism constructs. Such constructs are also tightly connected to the problem of controlling the granularity of parallel tasks.

I worked on parallel constructs and granularity control from 2011 to 2019, together with Umut Acar and Mike Rainey. We published a number of papers, each of them coming with theoretical bounds, an implementation, and an evaluation on a state-of-the-art benchmark suite of parallel algorithms.

While we mainly focused on C++ code, I believe that nearly all of our ideas could be applied to Multicore OCaml with little effort. Porting these ideas would provide what currently seems to be missing in Multicore OCaml for efficiently implementing a large class of parallel algorithms.

Gabriel Scherer and François Pottier recently suggested to me that it would be timely to share these results with the OCaml community. I'll thus try to give an easily accessible, OCaml-oriented introduction to the results that we have produced. Note, however, that most of the ideas presented would apply to essentially any other programming language that aims to support nested parallelism.

I plan to cover the semantics of high-level parallelism constructs, to describe and argue for work-stealing scheduling, to present a number of implementation tricks that are critical for efficiency, and to advocate for our modular, provably efficient approach to granularity control. I'll post these parts one after the other, as I write them.
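To make these topics a little more concrete before the detailed posts, here is a minimal sketch of what a fork-join construct and a hard-coded sequential cutoff could look like. It assumes OCaml 5's Domain.spawn/Domain.join; the names par and fib are purely illustrative and not part of any of our libraries. Spawning one fresh domain per call is precisely the naive strategy that work-stealing scheduling and proper granularity control are meant to replace.

    (* A minimal sketch, assuming OCaml 5's Domain.spawn/Domain.join.
       [par f g] evaluates [f] and [g] in parallel and returns both results.
       Spawning a fresh domain per call is far too costly in practice; a
       work-stealing scheduler would instead reuse a fixed pool of domains. *)
    let par (f : unit -> 'a) (g : unit -> 'b) : 'a * 'b =
      let df = Domain.spawn f in   (* run [f] on another domain *)
      let b = g () in              (* run [g] on the current domain *)
      (Domain.join df, b)

    (* Illustrative use: parallel Fibonacci with a hard-coded sequential
       cutoff, the simplest and least satisfying form of granularity control. *)
    let rec fib n =
      if n < 2 then n
      else if n <= 20 then fib (n - 1) + fib (n - 2)
      else
        let a, b = par (fun () -> fib (n - 1)) (fun () -> fib (n - 2)) in
        a + b

The point of the upcoming posts is precisely how to keep this kind of high-level interface while avoiding both the cost of per-call domain creation and the brittleness of manually chosen cutoffs.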

Contents

Other parts will be published in the coming weeks or months.