Efficiently Compiling Efficient Query Plans for Modern Hardware

Overview of the main idea (3 sentences)

The main idea of this (seminal) 2011 paper is to use LLVM to generate machine code for SQL queries inside a DBMS, as opposed to generating C or C++ source code and compiling that, which earlier systems such as HIQUE had tried. The authors explain why LLVM is a good fit for this, and compare the approach against engines that do not compile queries: classic tuple-at-a-time iterator interpretation and vectorized (block-at-a-time) execution, which relies on tight loops over batches of values to keep a super-scalar CPU's pipeline busy. ("Everything You Always Wanted to Know About Compiled and Vectorized Queries But Were Afraid to Ask" compares compilation and vectorization in greater detail and concludes that their performance is roughly similar.)

Key findings / takeaways from the paper (2-3 sentences)

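Generating LLVM IR works much better than generating C++ (much shorter compile times, more control over the generated code, and a strongly typed IR that catches many code-generation bugs early).
- Sub-takeaway: LLVM is very well designed.
Fusing all the operators of a pipeline into a single code fragment performs very well, because tuples can stay in CPU registers between operators instead of passing through a function call per operator.
The authors expect that emitting SIMD instructions from the generated code should not be hard.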
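
To make the "single block of code" takeaway concrete, here is a minimal sketch that contrasts a pull-based operator chain with one fused loop generated for the same scan, filter, sum pipeline. It is not the paper's code: Python source stands in for LLVM IR, `exec` stands in for the JIT compiler, and every name (`interpreted_sum`, `generate_fused_source`, `compile_pipeline`, the `qty`/`price` columns) is made up for illustration.

```python
from typing import Callable, Dict, Iterable, Iterator

Row = Dict[str, int]


# --- Pull-based (iterator-style) operators: one generator hop per operator ---
def scan(table: Iterable[Row]) -> Iterator[Row]:
    for row in table:
        yield row


def select(child: Iterator[Row], pred: Callable[[Row], bool]) -> Iterator[Row]:
    for row in child:
        if pred(row):
            yield row


def interpreted_sum(table: Iterable[Row], filter_col: str,
                    threshold: int, sum_col: str) -> int:
    """Evaluate SUM(sum_col) WHERE filter_col > threshold through the operator chain."""
    total = 0
    for row in select(scan(table), lambda r: r[filter_col] > threshold):
        total += row[sum_col]
    return total


# --- Fused pipeline: generate one loop for the whole scan -> select -> sum chain ---
def generate_fused_source(filter_col: str, threshold: int, sum_col: str) -> str:
    """Emit source for a single loop; column names and the constant are baked in."""
    return (
        "def fused(table):\n"
        "    total = 0\n"
        "    for row in table:\n"
        f"        if row[{filter_col!r}] > {threshold!r}:\n"
        f"            total += row[{sum_col!r}]\n"
        "    return total\n"
    )


def compile_pipeline(filter_col: str, threshold: int,
                     sum_col: str) -> Callable[[Iterable[Row]], int]:
    """Compile the generated source (a stand-in for JIT-compiling LLVM IR)."""
    namespace: dict = {}
    exec(generate_fused_source(filter_col, threshold, sum_col), namespace)
    return namespace["fused"]
```

Both versions compute the same result. The difference is that the fused version has no per-operator call boundary inside the loop, which is the property the paper exploits to keep tuples in registers. A usage sketch: `compile_pipeline("qty", 10, "price")(rows)` returns the same value as `interpreted_sum(rows, "qty", 10, "price")`.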

System used in evaluation and how it was modified/extended (1 sentence)

“DB X” (an anonymized commercial system; Oracle according to Andy)
HyPer with C++ code generation vs. HyPer with LLVM code generation
VectorWise and MonetDB

Workload Evaluated (1 sentence)

TPC-C transactions and TPC-H-style queries (the combined TPC-CH workload)