Date: Feb 05 2024
Slides: https://15721.courses.cs.cmu.edu/spring2024/slides/04-execution1.pdf
Reading
Raw Lecture Note
- reminder reminder reminder: OLAP systems mostly do sequential scans
- the rough “top 3 optimizations” for these systems are:
- vectorization / parallelization (SIMD)
- task parallelization
- code specialization (JIT/pre-compiling)
- In C/C++ -O3 is generally considered unsafe/too risky
- We need to:
- 1 Reduce instruction count
- 2 Reduce cycles per instruction (less cache misses, more CPU register caching)
- 3 Parallelize Execution (multi-threading)
- MonetDB/X100 from 2005 is a seminal paper that introduces a more efficient way to build OLAP query engines that makes better use of modern CPU architectures.
- (this paper won a “Test of Time award”, and a lot of the ideas in it inspired Snowflake and Flink and others)
- Alright, back to undergrad - CPU Overview:
- pipelines stages
- problems
- dependencies
- branch prediction (CPUs try to predict if/else branches. When doing a sequential scan, which databases do a lot, the filter for the scan is very important)
- Apparently in C++ you can use a
likely
/ unlikely
niche syntax to hint the compiler about branch prediction. Using this is very controversial, some people argue against it as it is bad for portability and can cause more harm than good.
- Branchless code is also possible. The end result is very bizarre but it can be faster and more predictable.
- Processing Model for query execution
- Iterator Model / Volcano Model
- MySQL / pg / sql server / db2
- kind of the “obvious” model for executing a plan
- Basically it’s the tuple by tuple model. Allows for pipelining where the DBMS processes each tuple through as many operators as possible before retrieving the next tuple. The pipeline only supports at an operator which is a “pipeline breaker” - order by, join builds, subqueries, etc..
- Materialization Model
- The operator by operator model. Every operator is a pipeline breaker.
- Vectorized Model (best for OLAP)
- The a few tuples at a time model. This is what singlestore, snowflake, alloydb, databricks, duckdb all do. AKA “Array Model”, great for modern CPUs.
- Plan Processing ****Direction
- There’s top-to-bottom (pull)… Next() Next()
- There’s bottom-to-top (push model)
- snowflake
- hyper (VLDB 2011) - paper where this model was basically invented. Very good paper I should read.
- We will get back to these models.
- In the vectorized model, some tuples will be passed around that are useless (will be filtered out, or should have already been filtered out.)
- We can solve this with Selection Vectors or Bitmaps. Either of these solutions imply passing around extra information about the arrays of tuples that are being passed around. This sounds wasteful but it’s better than the Volcano model because we’re taking a lot more advantage of CPU pipelining. Especially with SIMD in the mix.