Everything You Always Wanted to Know About Compiled and Vectorized Queries But Were Afraid to Ask

Personal note: I always enjoy learning the historical context behind different concepts; it also helps me memorize them:

Vectorization was proposed by Boncz et al. in 2005 and was first implemented and showcased in the MonetDB/X100 system ("MonetDB/X100: Hyper-Pipelining Query Execution"). It was later adopted by many systems.
In 2011, Neumann proposed data-centric code generation using the LLVM compiler framework.
- I believe SingleStoreDB (then called MemSQL) was one of the first commercial systems to introduce this idea, way back in 2016.

I am also very glad I read this paper ahead of Lecture #07.

Overview of the main idea (3 sentences)

This paper compares two query execution paradigms, which the authors pair with the pull and push models of query plan processing, respectively:

Vectorization (processing a vector of tuples per operator call, taking advantage of superscalar CPU pipelining)
Data-centric code generation (generating e.g. LLVM code for queries)

While not completely orthogonal, these concepts are not completely coupled either. I would argue that vectorization could also be part of data-centric code generation, and generated code can (and certainly does) take advantage of pipelining as well. However, the authors point out that the pull model lends itself very well to vectorization (especially for OLAP), while the push model benefits greatly from LLVM-style code generation. So the comparison, and the pairing of each paradigm with one model, does make sense.
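To make the distinction concrete, here is a minimal Python sketch (not code from the paper or its repository) of a Q6-like query, `SUM(price * discount) WHERE quantity < 24`, evaluated both ways. The column names, `VECTOR_SIZE`, `Scan`, and every helper function are hypothetical. The vectorized version pulls fixed-size batches through `next()` and runs one simple primitive per step over each batch; the data-centric version is the kind of single fused per-tuple loop a push-based code generator would aim to emit.

```python
# Minimal sketch (not the paper's code) of a Q6-like query evaluated two ways:
#   SELECT SUM(price * discount) FROM t WHERE quantity < 24
# Column names, VECTOR_SIZE, and all functions here are hypothetical.

VECTOR_SIZE = 1024  # tuples returned per next() call (assumed value)


class Scan:
    """Pull-based scan: each next() returns a batch of up to VECTOR_SIZE rows."""

    def __init__(self, columns):
        self.columns = columns
        self.n = min(len(col) for col in columns.values())
        self.pos = 0

    def next(self):
        if self.pos >= self.n:
            return None  # input exhausted
        end = min(self.pos + VECTOR_SIZE, self.n)
        batch = {name: col[self.pos:end] for name, col in self.columns.items()}
        self.pos = end
        return batch


def sel_less_than(col, bound):
    """Selection primitive: positions in the batch whose value is < bound."""
    return [i for i, v in enumerate(col) if v < bound]


def sum_product_sel(a, b, sel):
    """Aggregation primitive: sum of a[i] * b[i] over the selected positions."""
    return sum(a[i] * b[i] for i in sel)


def vectorized_q6(columns):
    """Vectorized (pull) style: one primitive call per operator step, per batch."""
    scan = Scan(columns)
    total = 0
    while (batch := scan.next()) is not None:
        sel = sel_less_than(batch["quantity"], 24)
        total += sum_product_sel(batch["price"], batch["discount"], sel)
    return total


def data_centric_q6(columns):
    """Data-centric (push) style: one fused loop that takes each tuple
    through filter and aggregation before moving on to the next one."""
    quantity, price, discount = columns["quantity"], columns["price"], columns["discount"]
    total = 0
    for i in range(len(quantity)):
        if quantity[i] < 24:
            total += price[i] * discount[i]
    return total
```

Python will not reproduce the performance differences, of course; the point is the shape of the loops: many small, specialized loops over batches versus one loop that carries each tuple through the whole pipeline.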

Key findings / takeaways from the paper (2-3 sentences)

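The authors conclude that both strategies are viable and perform similarly on OLAP workloads, although each has different strengths: vectorization is better at hiding cache-miss latency, while data-centric compilation executes fewer CPU instructions, which benefits cache-resident, computation-heavy queries. For OLTP systems, the authors conclude that code compilation is better, not only because of performance but also because it makes it easier for systems to adopt custom languages for user-defined functions.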

Another important takeaway is that SIMD is easier to apply in vectorized systems. Integrating it into push-based systems with code generation should also be possible, but is probably much harder.
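One reason vectorized primitives suit SIMD is that each one is a short, homogeneous loop over an array. A common way to write a selection primitive avoids the data-dependent branch entirely: always write the candidate index, then advance the output cursor by the 0/1 result of the comparison. The sketch below uses Python only to show that shape (the function name is hypothetical, and Python itself will not vectorize it); in C or C++, a branch-free loop like this is the usual starting point for SIMD versions, e.g. ones built on AVX-512 compress instructions.

```python
def sel_less_than_branch_free(col, bound):
    """Hypothetical branch-free selection primitive.

    Returns the positions i with col[i] < bound. The loop body has no
    data-dependent branch: the index is stored unconditionally and the
    output cursor k advances by 0 or 1.
    """
    sel = [0] * len(col)  # preallocated selection vector
    k = 0
    for i, v in enumerate(col):
        sel[k] = i           # unconditional store of the candidate index
        k += int(v < bound)  # keep it only if the predicate holds
    return sel[:k]
```

In a data-centric pipeline, by contrast, the same predicate is one step inside a larger per-tuple loop body that may also hash, probe, or aggregate, so there is no isolated loop like this one to vectorize, which helps explain why SIMD is harder to use there.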

Personal note: I’m very interested in HTAP systems such as SingleStoreDB. For these products, a mix of both approaches is required.

System used in evaluation and how it was modified/extended (1 sentence)

https://github.com/TimoKersten/db-engine-paradigms

The authors built two custom systems for the comparison: “Tectorwise” (vectorized) and “Typer” (data-centric code generation).

Workload Evaluated (1 sentence)

TPC-H Q1, Q3, Q6, Q9, Q18
Star Schema Benchmark (SSB) Q1, Q2, Q3, Q4