Date: Feb 07 2024
Slides: https://15721.courses.cs.cmu.edu/spring2024/slides/05-execution2.pdf
Reading
Raw Lecture Notes
- OLAP systems today are roughly split 50/50 between push-based and pull-based execution.
- Push-based is superior for OLAP. Recap from the previous lecture:
- Start with leaf nodes and "push" data to their parents.
    - Can "fuse" operators together within a for-loop to minimize intermediate result staging.
- Pros/Cons:
- Allows for tighter control of caches/registers in pipelines.
- May not have exact control of intermediate result sizes.
- Difficult to implement some operators (Sort-Merge Join).
- Another big advantage of push-based is that it allows for fine-grained control of execution. Can more easily be paused/resumed as needed.
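A minimal sketch (illustrative names, not from any real engine) contrasting pull-based Volcano-style iterators with a push-based pipeline where scan, filter, and project are fused into one loop, so each tuple stays in registers/cache with no intermediate staging:

```python
def pull_based(rows):
    # Volcano-style: each operator pulls from its child one tuple at a time,
    # paying a generator/virtual-call hop per operator per tuple.
    def scan():
        yield from rows
    def filt(child):
        for r in child:
            if r % 2 == 0:
                yield r
    def project(child):
        for r in child:
            yield r * 10
    return list(project(filt(scan())))

def push_based_fused(rows):
    # Push-based with operator fusion: the leaf (scan) drives the loop and
    # "pushes" each tuple through filter + project logic fused into the same
    # for-loop body -- no intermediate result staging between operators.
    out = []
    for r in rows:              # scan (leaf node)
        if r % 2 == 0:          # filter, fused into the scan loop
            out.append(r * 10)  # project, fused as well
    return out

assert pull_based(range(6)) == push_based_fused(range(6)) == [0, 20, 40]
```

Both compute the same result; the fused loop is what a compiling/push-based engine would generate for this pipeline.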
- Parallel Execution
    - At least for this class, multi-process, multi-threaded, and multi-node will be treated as the same thing.
    - (I know that in practice this is not really true. It has never been closer to true than it is today, though, with network performance evolving so much.)
- Inter Query Parallelism AND Intra Query Parallelism
    - Most DBMSs use a first-come, first-served queue to prioritize queries, but some have a more sophisticated prioritization mechanism.
- Intra Query Parallelism
- Intra Operator (Horizontal) - most common
- We will split a single operator into different pieces of data.
- Results need to be put back together.
- A Gather operator combines the result from multiple workers into a single output stream. (SingleStoreDB has this) - N to 1
- A Distribute operator fans it out (basically the opposite) - 1 to N
            - A Repartition/Shuffle operator takes multiple inputs and splits them into multiple outputs (N to M)
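A toy sketch of the three exchange operators above (hypothetical helper names; a real engine would do this with streams across workers, not lists):

```python
def gather(worker_outputs):
    # N -> 1: combine the results from multiple workers into a single stream.
    out = []
    for chunk in worker_outputs:
        out.extend(chunk)
    return out

def distribute(stream, n):
    # 1 -> N: fan one stream out to n workers (round-robin here).
    parts = [[] for _ in range(n)]
    for i, row in enumerate(stream):
        parts[i % n].append(row)
    return parts

def repartition(worker_outputs, m, key=lambda r: r):
    # N -> M: re-split N input streams into M outputs by hash of a key,
    # so rows with equal keys land on the same downstream worker.
    parts = [[] for _ in range(m)]
    for chunk in worker_outputs:
        for row in chunk:
            parts[hash(key(row)) % m].append(row)
    return parts

streams = [[1, 2, 3], [4, 5, 6]]
assert sorted(gather(streams)) == [1, 2, 3, 4, 5, 6]
assert distribute(gather(streams), 2) == [[1, 3, 5], [2, 4, 6]]
```

The hash-based repartition is what makes parallel hash joins and aggregations possible: matching keys are guaranteed to meet on the same worker.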
- Inter Operator (Vertical)
- Run operators at the same time.
- Can we build parts of a DBMS query engine as a library?
- Velox, (Apache Arrow) DataFusion, Intel OAP, Polars
- Meta Velox (paper!)
- High-performance single-node query executor
- No SQL parser
- No metadata catalog
- No cost-based optimizer
- It takes a plan and executes it.
- What does it have?
- Push-based vectorized query processing
- Precompiled primitives + Codegen expressions
- Arrow Compatible
- Adaptive Query Optimization
- Only supports Sort-Merge + Hash Joins
    - It doesn’t own data. It exposes an API to define connectors that retrieve data, which have to be implemented by the clients of this library. It also exposes adapters to encode/decode different storage formats.
- (DWRF is the next version of ORC?)
- Velox Components
- Type System
- Expression Engine
- Internal Data Representation
- Function API
- Operator Engine
- Storage Connectors/Adapters
- Resource Manager
- The best OLAP systems only materialize tuples when necessary (late materialization). Only IDs are sent up and down during plan execution. For joins, we mostly don’t need the full tuple.
- What is the benefit of early materialization?
- (I don’t know the answer)
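A small sketch of late materialization (toy data, hypothetical names): the join phase operates only on IDs, and the full tuples are fetched from base storage once, at the very end.

```python
# Base table: order id -> full tuple (customer, amount).
orders  = {101: ("alice", 30.0), 102: ("bob", 12.5), 103: ("carol", 7.0)}
shipped = {101, 103}  # ids from another table

# Join phase: only IDs flow through the plan, no wide tuples.
matching_ids = [oid for oid in orders if oid in shipped]

# Materialize the full tuples only for the final result.
result = [(oid, *orders[oid]) for oid in matching_ids]
assert result == [(101, "alice", 30.0), (103, "carol", 7.0)]
```

The narrower the data flowing between operators, the more of it fits in cache, which is why late materialization tends to win for OLAP.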
- Apache Arrow is an open-source specification describing how to store data in memory in a columnar fashion.
- Arrow is better than Parquet/ORC at random access.
- Arrow string encoding
- used to be very naive
    - Velox/DuckDB improve it (some sort of dictionary encoding). German-style encoding.
- Substrait is like Arrow but for query plans. The idea is that all databases could share the same “schema” for query plans.
- Not that widely adopted yet. Still early.
- Expression Evaluation
    - Can we compile the expressions?
    - Can we vectorize them so a batch of tuples is evaluated at once?
    - With Velox, expression trees can be flattened. Besides this, it also enables a lot of micro-optimizations around expression execution (e.g., if they see UPPER('hello'), they will just memoize it as 'HELLO').
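A minimal sketch of that kind of constant folding (my own toy expression representation, not Velox's): any subtree whose inputs are all literals, like UPPER('hello'), gets evaluated once at plan time and replaced by its result.

```python
# Expressions are either literals, column refs ("col", name),
# or function calls: (fname, arg1, arg2, ...).
FUNCS = {"upper": str.upper, "concat": lambda a, b: a + b}

def fold(expr):
    if not isinstance(expr, tuple):
        return expr                       # literal: nothing to fold
    if expr[0] == "col":
        return expr                       # depends on input rows: keep as-is
    fname, *args = expr
    args = [fold(a) for a in args]        # fold children first (bottom-up)
    if all(not isinstance(a, tuple) for a in args):
        return FUNCS[fname](*args)        # all-constant subtree: fold it now
    return (fname, *args)                 # partially constant: keep the call

# UPPER('hello') folds to 'HELLO'; a call over a column folds only partially.
assert fold(("upper", "hello")) == "HELLO"
assert fold(("concat", ("col", "x"), ("upper", "hi"))) == ("concat", ("col", "x"), "HI")
```

The folded plan evaluates the constant part zero times per row instead of once per row.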
- Everything we’ve discussed so far is after the query plan has been generated. But query plan generation is super important - for good performance, we need good plans.
- Can the execution engine be allowed to modify a query’s plan and the expression trees?
- Some execution engines do this. They learn from the execution and rewrite parts of the plan for the current or future executions.
    - (See “Adaptive Query Processing in the Looking Glass”, CIDR 2007)
- With Velox, it adapts some expressions to accelerate them:
        - Predicate Reordering (WHERE SLOW_FUNC(a)=true AND FAST_FUNC(b)=true; maybe it makes sense to run FAST_FUNC first to reduce the data size)
- Column Prefetching
- Not Null Fast Paths
- Use faster ASCII functions if the strings don’t actually have any UTF-8 characters
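A toy illustration of the predicate-reordering idea (the costs/selectivities here are faked; a real adaptive engine would measure them at runtime): running the cheap, selective predicate first means the slow one only sees the survivors.

```python
def make_counting(pred):
    # Wrap a predicate so we can count how many rows it actually sees.
    calls = {"n": 0}
    def wrapped(x):
        calls["n"] += 1
        return pred(x)
    return wrapped, calls

fast, fast_calls = make_counting(lambda x: x % 10 == 0)  # cheap AND selective
slow, slow_calls = make_counting(lambda x: x > 5)        # pretend this is expensive

rows = list(range(100))

# Reordered plan: FAST_FUNC first; SLOW_FUNC runs only on rows that survive.
result = [x for x in rows if fast(x) and slow(x)]

assert result == [10, 20, 30, 40, 50, 60, 70, 80, 90]
assert fast_calls["n"] == 100  # fast predicate saw every row
assert slow_calls["n"] == 10   # slow predicate saw only the 10 survivors
```

With the opposite order, the expensive predicate would have run 100 times instead of 10 for the same result.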