Lecture #06: Vectorized Query Execution

Date: Feb 12 2024

Slides: https://15721.courses.cs.cmu.edu/spring2024/slides/06-vectorization.pdf

Reading

Make the Most out of Your SIMD Investments: Counter Control Flow Divergence in Compiled Query Pipelines (H. Lang, et al., VLDB Journal 2020)
Rethinking SIMD Vectorization for In-Memory Databases (O. Polychroniou, et al., SIGMOD 2015) (Optional)
Filter Representation in Vectorized Query Execution (A. Ngom, et al., DaMoN 2021) (Optional)
Micro Adaptivity in Vectorwise (B. Raducanu, et al., SIGMOD 2013) (Optional)
Everything You Always Wanted to Know About Compiled and Vectorized Queries But Were Afraid to Ask (T. Kersten, et al., VLDB 2018) (Optional)

Raw Lecture Notes

Vectorization (or data parallelization)
- how can we take an algorithm that processes one pair of operands at a time and rewrite it so that one operation is applied to multiple pairs of operands at the same time
assume each core has 4-wide SIMD registers
the main SIMD extension we will study and care about is AVX-512, whose registers are 512 bits wide. It came out in 2017.
it is very important to know that processor versions differ a lot in their SIMD support. low-level engineers who write SIMD code have to write weird macros to check the architecture and which SIMD features it supports
compilers can only rarely apply SIMD automatically (auto-vectorization)
- int *X, *Y, *Z
- for (int i = 0; i < MAX; i++)
  - Z[i] = X[i] + Y[i]
- Even for this super simple loop, the compiler cannot simply assume vectorization is safe, because X, Y and Z might point to overlapping memory, so a write to Z[i] could change a value that a later iteration reads
- We can, however, use compiler hints to rule this out. At least DuckDB and ClickHouse do this.
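One such hint is the C99 restrict qualifier. The sketch below is only an illustration of the loop above with that hint applied; the function name add_arrays is made up for this example, and it is not taken from DuckDB or ClickHouse. Whether the loop actually ends up vectorized still depends on the compiler and its optimization flags (e.g. -O3).

```c
#include <assert.h>
#include <stddef.h>

/* restrict is a promise from the programmer that x, y and z never
   overlap, so the compiler may load and add several elements at once.
   Breaking that promise is undefined behavior. */
void add_arrays(const int *restrict x, const int *restrict y,
                int *restrict z, size_t n) {
    assert(n == 0 || (x != NULL && y != NULL && z != NULL));
    for (size_t i = 0; i < n; i++) {
        z[i] = x[i] + y[i];
    }
}
```

The hint does not change what the loop computes. It only removes the aliasing question that blocks the compiler.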
Of course, there is also explicit vectorization. This is where you use CPU intrinsics to tell the compiler which data to put in SIMD registers and which vector instructions to run on it.
(TIL: Intel sells its own compiler, ICC, which competes with clang/gcc. It is paid but very good at generating SIMD code)
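A rough sketch of what explicit vectorization looks like for the same X + Y loop. Real code would normally use vendor intrinsics for a specific instruction set. To stay portable, this sketch instead uses the GCC/Clang vector_size extension, which is an assumption of this example and not standard C. The type name v4si and the function name add_explicit are made up here, and the 4 lanes match the 4-wide assumption in these notes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Four 32-bit integer lanes in one 128-bit vector.
   vector_size is a GCC/Clang extension, not standard C. */
typedef int v4si __attribute__((vector_size(16)));

void add_explicit(const int *x, const int *y, int *z, size_t n) {
    assert(n == 0 || (x != NULL && y != NULL && z != NULL));
    size_t i = 0;

    /* Main loop: one vector add handles 4 elements. */
    for (; i + 4 <= n; i += 4) {
        v4si vx, vy, vz;
        memcpy(&vx, x + i, sizeof vx);   /* unaligned load into a vector */
        memcpy(&vy, y + i, sizeof vy);
        vz = vx + vy;                    /* lane-wise addition */
        memcpy(z + i, &vz, sizeof vz);   /* unaligned store */
    }

    /* Scalar tail for the last n % 4 elements. */
    for (; i < n; i++) {
        z[i] = x[i] + y[i];
    }
}
```

The scalar tail loop is the usual price of explicit vectorization: the input length is rarely a multiple of the vector width.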
SIMD masking
- modern SIMD CPUs allow you to perform an operation on only a specific set of elements from a vector by using a bitmask
- this is like a declarative way of doing if/else inside a SIMD operation
- you can also re-order the elements of a vector using an “index vector”. again, declarative.
- you can also select an arbitrary subset of elements from a vector, again driven by a mask
- Basically, SIMD is not limited to additions. A whole range of operations is available on vectors.
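To make the three ideas above concrete (masking, re-ordering with an index vector, selecting elements), here is a scalar emulation in plain C. It is only a sketch of the semantics, one loop iteration per lane, and not real SIMD code. The function names and the LANES constant are made up for this example. On AVX-512 hardware the counterparts would be intrinsics such as _mm512_mask_add_epi32, _mm512_permutexvar_epi32 and _mm512_mask_compressstoreu_epi32.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LANES 4   /* 4-wide vectors, as assumed in these notes */

/* Masked add: lanes whose mask bit is 1 get a + b,
   all other lanes keep the value from src. */
void masked_add(const int *src, uint8_t mask,
                const int *a, const int *b, int *dst) {
    for (size_t lane = 0; lane < LANES; lane++) {
        int on = (mask >> lane) & 1;
        dst[lane] = on ? a[lane] + b[lane] : src[lane];
    }
}

/* Permute: output lane i takes the input element named by idx[i].
   dst must not overlap in. */
void permute(const int *in, const int *idx, int *dst) {
    for (size_t lane = 0; lane < LANES; lane++) {
        assert(idx[lane] >= 0 && idx[lane] < LANES);
        dst[lane] = in[idx[lane]];
    }
}

/* Compress (selective store): write only the selected lanes,
   packed contiguously into out. Returns how many were written. */
size_t compress_store(const int *in, uint8_t mask, int *out) {
    size_t n = 0;
    for (size_t lane = 0; lane < LANES; lane++) {
        if ((mask >> lane) & 1) {
            out[n++] = in[lane];
        }
    }
    return n;
}
```

For example, with mask 0x5 (lanes 0 and 2 on), masked_add updates lanes 0 and 2 and leaves lanes 1 and 3 at their src values. That is the "if/else without a branch" described above.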
How does all this apply to databases?
- We want to favor vertical vectorization by processing different input data per lane.
- This allows us to maximize lane utilization.
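As an illustration of "different input data per lane", here is a scalar emulation of a vertically vectorized hash table probe: each of the 4 lanes looks up a different key. Everything specific here is an assumption made for the sketch and not something from the lecture: the open-addressing table with linear probing, the identity hash in slot_of, the EMPTY marker, and the function names. The table must have a power-of-two capacity and at least one empty slot.

```c
#include <assert.h>
#include <stddef.h>

#define LANES 4      /* 4-wide vectors, as assumed in these notes */
#define EMPTY (-1)   /* hypothetical marker: unused slot / key not found */

/* Hypothetical hash function: identity, masked to the table size. */
static unsigned slot_of(int key, unsigned cap_mask) {
    return (unsigned)key & cap_mask;
}

/* Probe LANES different keys at once; out[lane] receives the value
   for keys[lane], or EMPTY if that key is not in the table. */
void probe_vertical(const int *table_keys, const int *table_vals,
                    unsigned cap_mask, const int *keys, int *out) {
    unsigned pos[LANES];
    unsigned active = (1u << LANES) - 1;   /* one bit per lane, all busy */

    for (size_t lane = 0; lane < LANES; lane++) {
        assert(keys[lane] >= 0);           /* EMPTY is reserved */
        pos[lane] = slot_of(keys[lane], cap_mask);
        out[lane] = EMPTY;
    }

    while (active != 0) {
        for (size_t lane = 0; lane < LANES; lane++) {
            if (((active >> lane) & 1u) == 0) {
                continue;                       /* lane already finished */
            }
            int k = table_keys[pos[lane]];      /* emulated gather */
            if (k == keys[lane]) {
                out[lane] = table_vals[pos[lane]];
                active &= ~(1u << lane);        /* hit: lane is done */
            } else if (k == EMPTY) {
                active &= ~(1u << lane);        /* miss: lane is done */
            } else {
                pos[lane] = (pos[lane] + 1) & cap_mask;   /* keep probing */
            }
        }
    }
}
```

In this sketch, lanes that finish early simply sit idle until the slowest lane is done. That idle time is exactly the lane utilization problem the rest of these notes are about.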
Note: taking DBMS operations like scans+filters and SIMD-ing them is really not trivial.
For each batch, the SIMD vectors may contain tuples that are no longer valid (they were disqualified by some previous check), so those lanes do wasted work
- Query engines can identify points in a pipeline that basically act as “SIMD pipeline breakers”
- At these “breaks”, the engine can decide to refill the freed lanes with new tuples before running the following operators
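One simple way to picture such a "break" is a small buffer between two pipeline stages: the filter stage packs surviving tuples into the buffer, and the next stage only runs once a full vector of survivors is available. The sketch below is a scalar emulation under that assumption, loosely in the spirit of the assigned reading and not its actual algorithm. The pipeline (filter on val > threshold, then sum), the struct layout and all names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define LANES 4   /* 4-wide vectors, as assumed in these notes */

typedef struct {
    int    buf[2 * LANES];   /* survivors waiting for a full vector */
    size_t count;            /* number of valid entries in buf */
    long   sum;              /* result of stage 2 */
    size_t full_vectors;     /* fully utilized vectors seen by stage 2 */
} pipeline_t;

/* Stage 2: always runs with every lane holding a valid tuple. */
static void stage2_full_vector(pipeline_t *p, const int *v) {
    for (size_t lane = 0; lane < LANES; lane++) {
        p->sum += v[lane];
    }
    p->full_vectors++;
}

/* Push one batch of LANES values through the filter stage. */
void pipeline_push(pipeline_t *p, const int *batch, int threshold) {
    assert(p->count < LANES);

    /* Stage 1: branch-free compaction. Every value is written, but
       the write position only advances for survivors. */
    for (size_t lane = 0; lane < LANES; lane++) {
        p->buf[p->count] = batch[lane];
        p->count += (size_t)(batch[lane] > threshold);
    }

    /* The "break": run stage 2 only when all lanes can be filled. */
    if (p->count >= LANES) {
        stage2_full_vector(p, p->buf);
        p->count -= LANES;
        for (size_t i = 0; i < p->count; i++) {
            p->buf[i] = p->buf[i + LANES];   /* keep the leftovers */
        }
    }
}

/* End of input: process the leftovers one by one. */
void pipeline_flush(pipeline_t *p) {
    for (size_t i = 0; i < p->count; i++) {
        p->sum += p->buf[i];
    }
    p->count = 0;
}
```

The trade-off is extra copying into the buffer in exchange for stage 2 never running on half-empty vectors. Whether that pays off depends on how selective the filter is.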