Language abstractions and scheduling techniques for efficient execution of parallel algorithms on multicore hardware
The Multicore OCaml team has made significant progress in recent years.
There now seems to be interest in working on high-level parallelism
constructs. Such constructs are tightly connected to the problem of
controlling the granularity of parallel tasks.
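To make "high-level parallelism constructs" concrete, here is a minimal sketch of a fork-join combinator written directly on top of OCaml 5's `Domain.spawn`. The names `par_pair` and `fib`, and the hard-coded cutoff, are illustrative assumptions of mine, not part of any existing library; a real implementation would hand tasks to a work-stealing scheduler rather than spawn a fresh domain per call.

```ocaml
(* Hypothetical fork-join combinator: run two computations in
   parallel and return both results. A real library would submit
   these tasks to a scheduler instead of spawning a domain each. *)
let par_pair (f : unit -> 'a) (g : unit -> 'b) : 'a * 'b =
  let d = Domain.spawn f in
  let b = g () in
  let a = Domain.join d in
  (a, b)

(* Parallel Fibonacci, the classic nested-parallelism example.
   The arbitrary cutoff below previews the granularity-control
   problem: without it, tiny subproblems would each pay the full
   cost of task creation. *)
let rec fib n =
  if n < 2 then n
  else if n < 20 then fib (n - 1) + fib (n - 2) (* sequential below cutoff *)
  else
    let (a, b) =
      par_pair (fun () -> fib (n - 1)) (fun () -> fib (n - 2))
    in
    a + b
```

Nested calls to `par_pair` express nested parallelism directly in the program structure; how such tasks are scheduled efficiently is the subject of the posts announced below.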
I've been working on parallel constructs and granularity control from 2011
to 2019, together with Umut Acar and Mike Rainey. We published a number of
papers, each of them coming with theoretical bounds, an implementation,
and an evaluation on state-of-the-art benchmarks of parallel algorithms.
While we mainly focused on C++ code, I speculate that nearly all of our
ideas could be easily applied to Multicore OCaml. Porting these ideas
would provide what currently seems to be missing in Multicore OCaml for
efficiently implementing a large class of parallel algorithms.
Gabriel Scherer and François Pottier recently suggested to me that it
appears timely to share these results with the OCaml community. I'll
thus try to give an easily-accessible, OCaml-oriented introduction to
the results that we have produced. Note, however, that most of the
ideas presented would apply to essentially any other programming
language that aims to support nested parallelism.
I plan to cover the semantics of high-level parallelism constructs,
to describe and argue for work-stealing scheduling, to present a number
of tricks that are critical for efficiency, and to advertise our
modular, provably-efficient approach to granularity control. I'll post
these parts one after the other as I write them, over the coming weeks
or months.