Overview of the main idea (3 sentences)
The authors propose a user-space algorithm for parallelizing DBMS query execution. The “tasks” to be parallelized are partitions of the individual “pipelines” that make up queries (see previous notes on what constitutes a pipeline): each pipeline’s input is broken into partitions (round-robin, “keyless sharding”, or based on some data attribute, which is even better because of joins), and each of those “pipeline-partition jobs” is treated as an individual, parallelizable task. The authors also propose scheduling these tasks across NUMA-aware worker threads, so that each worker thread should mostly access NUMA-local data. They start one thread per physical core on the machine. If a thread runs out of local work, however, it may pick up work that is not NUMA-local (not ideal, but better than sitting idle).
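To make the scheme concrete, here is a minimal Python sketch of partition-wise dispatch with a NUMA-local preference and a steal-from-remote fallback. All names (`Dispatcher`, `make_morsels`, `worker`) are mine, not the paper’s; the paper’s implementation is C++ inside HyPer, and the `Lock` here is only a stand-in for its lock-free dispatch structure.

```python
import threading

PARTITION_SIZE = 4  # tiny for illustration; a real system uses much larger chunks

def make_morsels(rows, size=PARTITION_SIZE):
    """Split a pipeline's input into fixed-size partitions (the schedulable tasks)."""
    return [rows[i:i + size] for i in range(0, len(rows), size)]

class Dispatcher:
    """Per-socket task queues; workers prefer their own socket, then steal."""
    def __init__(self, queues_by_socket):
        self.queues = queues_by_socket   # socket id -> list of partitions
        self.lock = threading.Lock()     # stand-in for a lock-free structure

    def next_partition(self, socket):
        with self.lock:
            if self.queues.get(socket):
                return self.queues[socket].pop()   # NUMA-local work first
            for q in self.queues.values():         # otherwise steal remote work
                if q:
                    return q.pop()
        return None  # pipeline is done

def worker(dispatcher, socket, emit):
    """Run one pipeline-partition job at a time until no work remains."""
    while (p := dispatcher.next_partition(socket)) is not None:
        for row in p:
            emit(row * 2)  # hypothetical pipeline body: a trivial map stage
```

A worker bound to socket 0 drains socket 0’s queue before touching socket 1’s, which is the “better than sitting idle” behavior described above.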
Key findings / takeaways from the paper (2-3 sentences)
- My personal main takeaway from this paper is that “partitioning”, which I’ve always associated with distributed databases, is a completely valuable concept even in single-node systems. It’s as Andy mentioned in class: distributed databases are basically the same thing as single-node systems, because multi-core is equivalent to multi-server (with many caveats). But understanding single-node systems and optimizing them to their utmost is really, really important for building a great distributed database.
- Another takeaway for me is that, in order to parallelize each pipeline, every operator must be able to accept tuples in parallel, and operators that start a new pipeline must be able to produce tuples in parallel. This is obvious in hindsight. The paper describes algorithms to do this well for each operator. I was surprised by the implementation of `ORDER BY`, which is a parallel local sort plus a merge, where the merge itself is also parallel! (Very cool)
- Finally, the authors argue for thread-local scheduling, meaning there is no centralized dispatcher thread: workers pull their next task themselves, which is made efficient with lock-free data structures.
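The `ORDER BY` point can be sketched as follows: sort partitions locally in parallel, then merge runs pairwise, where each pairwise merge is itself split at a separator key so both halves are independent merge tasks. This is a simplified Python illustration, not the paper’s actual C++ algorithm; the function names are mine.

```python
import bisect
from concurrent.futures import ThreadPoolExecutor

def _merge(a, b):
    """Standard sequential merge of two sorted runs."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:
            out.append(a[i]); i += 1
        else:
            out.append(b[j]); j += 1
    return out + a[i:] + b[j:]

def _parallel_merge(a, b, ex):
    """Split run `a` at its median key and find the matching split point in
    `b`, so the low and high halves of the output are independent tasks."""
    mid = len(a) // 2
    k = bisect.bisect_left(b, a[mid])
    lo = ex.submit(_merge, a[:mid], b[:k])   # everything < a[mid]
    hi = ex.submit(_merge, a[mid:], b[k:])   # everything >= a[mid]
    return lo.result() + hi.result()

def parallel_sort(rows, workers=4):
    """Phase 1: parallel local sorts. Phase 2: parallel pairwise merges."""
    chunk = max(1, -(-len(rows) // workers))  # ceiling division
    parts = [rows[i:i + chunk] for i in range(0, len(rows), chunk)]
    with ThreadPoolExecutor(workers) as ex:
        runs = list(ex.map(sorted, parts))
        while len(runs) > 1:
            nxt = [_parallel_merge(a, b, ex)
                   for a, b in zip(runs[::2], runs[1::2])]
            if len(runs) % 2:       # odd run out survives to the next round
                nxt.append(runs[-1])
            runs = nxt
    return runs[0] if runs else []
```

The separator trick is the interesting part: because `b` is split with `bisect_left` at `a`’s median key, the concatenation of the two sub-merges is already globally sorted, so the merge step scales with workers instead of being a sequential bottleneck.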
System used in evaluation and how it was modified/extended (1 sentence)
The authors used HyPer and tested the following scheduling mechanisms:
- Adaptive NUMA-aware scheduling
- Adaptive but non-NUMA-aware scheduling
- Non-adaptive scheduling
Workload Evaluated (1 sentence)
TPC-H and the Star Schema Benchmark (SSB).