Date: Feb 12 2024
Slides: https://15721.courses.cs.cmu.edu/spring2024/slides/06-vectorization.pdf
Reading
- Make the Most out of Your SIMD Investments: Counter Control Flow Divergence in Compiled Query Pipelines (H. Lang, et al., VLDB Journal 2020)
- Rethinking SIMD Vectorization for In-Memory Databases (O. Polychroniou, et al., SIGMOD 2015) (Optional)
- Filter Representation in Vectorized Query Execution (A. Ngom, et al., DaMoN 2021) (Optional)
- Micro Adaptivity in Vectorwise (B. Raducanu, et al., SIGMOD 2013) (Optional)
- Everything You Always Wanted to Know About Compiled and Vectorized Queries But Were Afraid to Ask (T. Kersten, et al., VLDB 2018) (Optional)
Raw Lecture Notes
- Vectorization (or data parallelization)
- how can we take an algorithm that processes one tuple at a time and rewrite it so that a single operation is applied to multiple pairs of operands at the same time?
- assume each core has 4-wide SIMD registers
- the main SIMD extension we will study and care about is AVX-512, whose registers are 512 bits wide. It came out in 2017.
- it is important to know that processor generations differ a lot in their SIMD support. Low-level engineers who write SIMD code have to write awkward macros to check the architecture and which SIMD features it supports
- compilers can automatically apply SIMD (auto-vectorization) only rarely, typically for simple loops
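- int *X, *Y, *Z
- for (int i = 0; i < MAX; i++) Z[i] = X[i] + Y[i];
- Even this super simple loop cannot safely be vectorized automatically, because X, Y and Z might point to overlapping memory (pointer aliasing), so the compiler cannot prove that the iterations are independent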
- We can, however, use compiler hints. At least DuckDB and ClickHouse do this.
- Of course, there is also explicit vectorization. This is where you use CPU intrinsics to manually move data into SIMD registers and execute vectorized instructions.
- (TIL: Intel sells an ICC compiler that competes with clang/gcc; it is paid but very good at generating SIMD code)
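The aliasing problem and the compiler-hint idea above can be sketched in C. This is a minimal sketch, assuming the C99 `restrict` qualifier as the hint; the function name `add_restrict` and the explicit length parameter `n` are hypothetical and not from the lecture. Whether the loop actually gets vectorized still depends on the compiler and its optimization flags.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: element-wise z[i] = x[i] + y[i].
 * `restrict` promises the compiler that the three arrays do not overlap,
 * which removes the aliasing concern for this loop.
 * Calling it with overlapping arrays is undefined behavior. */
void add_restrict(const int *restrict x, const int *restrict y,
                  int *restrict z, size_t n)
{
    assert(x != NULL && y != NULL && z != NULL);
    for (size_t i = 0; i < n; i++) {
        z[i] = x[i] + y[i];
    }
}
```

The hint is a promise made by the programmer, not something the compiler verifies, so the caller has to guarantee that the arrays are distinct.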
- SIMD masking
- modern SIMD-capable CPUs let you perform an operation on only a specific set of elements (lanes) of a vector by using a bitmask
- this is like a declarative way of doing if/else inside a SIMD operation
- you can also reorder the elements of a vector using an “index vector” (a permute). Again, declarative.
- SIMD instructions also allow selecting arbitrary elements from vectors
- Basically, SIMD is not limited to additions. You can do a whole range of operations on vectors.
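To make the masking and index-vector bullets concrete, here is a scalar model of their semantics in plain C. It is a sketch only: it emulates a 4-lane vector with ordinary arrays and loops rather than using real SIMD intrinsics, and the names `LANES`, `masked_add` and `permute` are hypothetical. It assumes "merge" masking, where unselected lanes keep a value from a source vector.

```c
#include <assert.h>
#include <stdint.h>

#define LANES 4  /* matches the 4-wide assumption in these notes */

/* Masked add: lane i gets a[i] + b[i] if bit i of mask is set,
 * otherwise it keeps the value from src ("merge" masking). */
void masked_add(const int32_t *src, uint8_t mask,
                const int32_t *a, const int32_t *b, int32_t *out)
{
    for (int i = 0; i < LANES; i++) {
        out[i] = ((mask >> i) & 1u) ? a[i] + b[i] : src[i];
    }
}

/* Permute: lane i of the output is the input lane named by idx[i].
 * `out` must not be the same array as `in`. */
void permute(const int32_t *in, const uint8_t *idx, int32_t *out)
{
    for (int i = 0; i < LANES; i++) {
        assert(idx[i] < LANES);
        out[i] = in[idx[i]];
    }
}
```

In real hardware each of these loops would be a single instruction over all lanes; the point of the model is that the mask and the index vector are data, so there is no per-lane branch in the instruction stream.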
- How does all this apply to databases?
- We want to favor vertical vectorization by processing different input data per lane.
- This allows us to maximize lane utilization.
- Note: taking DBMS operations like scans+filters and SIMD-ing them is really not trivial.
- For each batch, the SIMD vectors may contain tuples that are no longer valid (they were disqualified by some previous check)
- Query engines can identify what are basically “SIMD pipeline breakers”
- At these “breaks”, the engine can decide to use newly freed SIMD registers for incoming operators
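The validity problem in a scan + filter can be pictured with a small scalar model. This is a hedged sketch, assuming one bitmask per 4-tuple batch to mark which lanes are still valid and a dense list of surviving tuple ids; the names `filter_gt`, `active_lanes` and `compress_ids` are hypothetical, and the engine's actual policy for reusing freed lanes is not shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LANES 4  /* matches the 4-wide assumption in these notes */

/* Evaluate col[i] > threshold for one batch of LANES tuples.
 * Bit i of the result is set when lane i still holds a valid tuple. */
uint8_t filter_gt(const int32_t *col, int32_t threshold)
{
    uint8_t mask = 0;
    for (int i = 0; i < LANES; i++) {
        if (col[i] > threshold) {
            mask |= (uint8_t)(1u << i);
        }
    }
    return mask;
}

/* Number of lanes doing useful work; LANES minus this is wasted. */
int active_lanes(uint8_t mask)
{
    int n = 0;
    for (int i = 0; i < LANES; i++) {
        n += (mask >> i) & 1;
    }
    return n;
}

/* Compress: append the tuple ids of the valid lanes densely to sel.
 * `base` is the id of lane 0; returns the new length of sel. */
size_t compress_ids(uint32_t base, uint8_t mask, uint32_t *sel, size_t len)
{
    assert(sel != NULL);
    for (int i = 0; i < LANES; i++) {
        if ((mask >> i) & 1u) {
            sel[len++] = base + (uint32_t)i;
        }
    }
    return len;
}
```

After a selective filter most mask bits are zero, so later operators that reuse the same batch would run with mostly idle lanes. Collecting the surviving ids densely is one way to get full batches again before the next operator.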