Date: Mar 11 2024
Slides: https://15721.courses.cs.cmu.edu/spring2024/slides/10-multiwayjoins.pdf
Reading
- Froid: Optimization of Imperative Programs in a Relational Database (K. Ramachandra, et al., VLDB 2017)
- Dear User-Defined Functions, Inlining isn’t working out so great for us. Let’s try batching to make our relationship work. Sincerely, SQL (K. Franz, et al., CIDR 2024) (Optional)
- Aggify: Lifting the Curse of Cursor Loops using Custom Aggregates (S. Gupta, et al., SIGMOD 2020) (Optional)
- Compiling PL/SQL Away (C. Duta, et al., CIDR 2020) (Optional)
Procedural Extensions of SQL: Understanding Their Usage in the Wild (S. Gupta, et al., VLDB 2021) (Optional)
- Functional-Style SQL UDFs With a Capital ‘F’ (C. Duta, et al., SIGMOD 2020) (Optional)
Raw Lecture Notes
- Advantages of having procedural logic in the database
- Fewer network round-trips (better efficiency)
- Immediate notification of changes
- DBMS spends less time waiting during transactions
- Developers do not have to reimplement functionality
- Extend the functionality of the DBMS
- Various options
- UDFs
- for most cases we can assume these can’t have side effects
- SPs (can be invoked outside of a SQL query)
- Triggers
- UDTs
- UDAs
- Some databases support UDFs written in other languages
- e.g., SQL/PSM is a standard
- but most databases have some dialect of PLSQL
- there’s one called Transact SQL
- other databases support Perl, Python, etc.
- UDF disadvantages
- difficult to parallelize and make performant
- too hard for the DBMS to analyze what the PLSQL is doing and make it performant
- hard for optimizers to make a lot of their usual optimizations
- but Froid (SQLServer, paper for this week’s class) can actually analyze the UDF and do some optimizations
- UDF Acceleration
- Compilation
- Parallelization (by relying on annotations in the code provided by the developers who wrote the UDF)
- Inlining (convert UDF into declarative SQL)
- Batching (take UDF and batch them over multiple tuples at a time)
- Froid is just called “UDF Inlining” or something in the SQLServer docs
- Froid
- Automatically convert UDFs into relational algebra expressions that are inlined as sub-queries
- This gets done during the rewrite phase? (not sure what this means)
- Sub-Queries to the rescue!!
- (We’ve already studied them in one of the first classes)
- Froid Overview
- step 1: transform statements
- step 2: break UDF into regions
- step 3: merge expressions
- step 4: inline UDF expression into SQL query
- step 5: run updated query through optimizer
- To intertwine the different sub-queries into temp tables, Froid basically uses lateral joins/cross apply in between sub-queries.
- A lot of UDF logic can be rewritten into a SQL statement like this 😲
- (SQLServer has a really mature query optimizer, so when running their optimizer of top of the generated SQL queries from the UDF, the database is able to squeeze even more performance)