Overview of the main idea (3 sentences)
Seven years after *Morsel-Driven Parallelism: A NUMA-Aware Query Evaluation Framework for the Many-Core Age*, the HyPer/Umbra folks are back to improve on their morsel-based scheduling algorithms. The paper discusses a few ideas:
- [Not novel, from 1995] Stride scheduling is briefly re-introduced: a proportional-share mechanism that prioritizes concurrent processes by handing each one "tickets" and charging it a per-run "stride" of virtual time, so processes with more tickets get proportionally more time slices.
- Thread-local scheduling (also briefly mentioned in the Morsel paper, but argued for much more strongly here), which is basically the idea of dropping the centralized dispatcher and "sharding" the scheduling logic across worker threads using lock-free data structures.
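Since stride scheduling is only briefly re-introduced in the paper, here is a minimal sketch of the classic 1995 mechanism for context. All names and the `STRIDE1` constant are illustrative choices of mine, not taken from the paper: each process holds tickets, its stride is inversely proportional to its ticket count, and the process with the lowest accumulated "pass" value runs next.

```python
# Minimal stride-scheduling sketch (Waldspurger & Weihl, 1995).
# Illustrative only -- names and constants are not from the paper.

STRIDE1 = 1 << 20  # large constant so integer strides stay precise

class Client:
    def __init__(self, name, tickets):
        self.name = name
        self.stride = STRIDE1 // tickets  # fewer tickets -> bigger stride
        self.pass_ = 0                    # virtual time charged so far

def schedule(clients, quanta):
    """Run `quanta` time slices; return the order clients ran in."""
    order = []
    for _ in range(quanta):
        nxt = min(clients, key=lambda c: c.pass_)  # lowest pass runs
        nxt.pass_ += nxt.stride                    # charge its stride
        order.append(nxt.name)
    return order

clients = [Client("A", tickets=3), Client("B", tickets=1)]
print(schedule(clients, 8))  # A runs about 3x as often as B
```

Because each run charges the client its stride, clients with more tickets accumulate "pass" more slowly and are selected proportionally more often.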
Now, for the main ideas:
- Dynamically-sized morsels (partitions), which had a fixed size in the 2014 paper. This basically guards against "unfair" morsels that take disproportionately long to process for some reason.
- Adaptive query priorities, which basically means slightly prioritizing short-running queries to reduce overall query latency.
- "short running queries can be prioritized without noticeably altering the latency of other, long running requests. Assume we are executing two types of queries. The short running ones make up 90% of the workload and take 10ms to execute. Meanwhile, the long running requests take 1s. Even if we treat all short requests preferentially, this amounts to less than 10% of the overall workload being prioritized. As a result, the long running requests are not slowed down significantly”
- This is done by slightly reducing the priority of queries which have used more CPU time so far.
- Self-tuning the adaptive priorities over time by experimenting with different values and measuring overall query latency (🤯)
- (This is the kind of idea that is always fun to talk about but very hard to actually engineer)
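To make the priority-decay idea above concrete, here is a hedged sketch: a query's scheduling priority shrinks exponentially in the CPU time it has already consumed, so short queries naturally stay ahead of long ones. The decay rate `alpha` is exactly the kind of "magic knob" the paper self-tunes; the function name and exact formula here are illustrative, not the paper's.

```python
import math

def priority(cpu_seconds, alpha=2.0):
    """Illustrative priority decay: more CPU time used -> lower priority.

    `alpha` is the decay knob a self-tuner would adjust by perturbing
    it and keeping values that lower measured workload latency.
    """
    return math.exp(-alpha * cpu_seconds)

# A query 10ms into its execution still has near-full priority,
# while one that has already burned 1s of CPU time is deprioritized.
print(priority(0.010))  # ~0.98
print(priority(1.0))    # ~0.14
```

Plugged into a stride-style scheduler, a lower priority would translate into a larger stride, so long-running queries are charged more virtual time per run without ever being starved.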
Key findings / takeaways from the paper (2-3 sentences)
- Thread-local scheduling without a centralized dispatcher just makes sense (at least for intra-server scheduling; this might be different in distributed databases?).
- Slightly prioritizing faster queries is a clear win for end-users. This can also be made to work with user-defined priorities (although this paper doesn't go into that).
- Self-tuning somewhat magic knobs in systems can be made to work successfully!
System used in evaluation and how it was modified/extended (1 sentence)
- The authors first compare the different scheduling algorithms within Umbra; the self-tuning + fair scheduling variant performs best.
- Then the authors measure mean relative slowdown across Umbra, MonetDB, and PostgreSQL (the latter two let the OS do the scheduling). PostgreSQL starts to slow down very early, and even under extreme load, Umbra always sustains more queries per second with lower relative slowdown.
Workload Evaluated (1 sentence)
“sample from TPC-H queries at SF3 and SF30 (…) picking queries at SF3 is three times more likely than picking queries at SF30 (…) in order to obtain a more interesting workload”