Froid: Optimization of Imperative Programs in a Relational Database

Overview of the main idea (3 sentences)

The main idea of this paper is that an imperative UDF can be converted into a series of declarative SQL sub-queries that are chained together with the Apply operator. This allows the existing query optimizer to treat the logic contained in a UDF as part of the calling query. Previously, query optimizers mostly ignored UDFs: they did not cost them and simply interpreted them tuple by tuple. This is very slow, but with Froid, UDFs can perform comparably to the equivalent hand-written SQL.
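To make the idea concrete for myself, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in engine. It is not Froid and not SQL Server: SQLite has no Apply operator, so a correlated scalar sub-query plays that role here. The table `orders`, the function `discount_price`, and the 10% rate are all made up for illustration.

```python
import sqlite3

def discount_price(price_cents, is_preferred):
    # Imperative UDF body (hypothetical): the engine only sees an opaque call.
    if is_preferred:
        pct = 10
    else:
        pct = 0
    return price_cents * (100 - pct) // 100

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, price_cents INTEGER, is_preferred INTEGER)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, 10000, 1), (2, 5000, 0), (3, 2000, 1)],
)
conn.create_function("discount_price", 2, discount_price)

# Version 1: the query calls the UDF once per row.
udf_rows = conn.execute(
    "SELECT id, discount_price(price_cents, is_preferred) FROM orders ORDER BY id"
).fetchall()

# Version 2: the same logic written declaratively. The IF/ELSE becomes a CASE
# inside a correlated sub-query, so the engine sees ordinary SQL it can plan.
inlined_rows = conn.execute(
    """
    SELECT o.id,
           o.price_cents * (100 - (SELECT CASE WHEN o.is_preferred THEN 10 ELSE 0 END)) / 100
    FROM orders AS o
    ORDER BY o.id
    """
).fetchall()
```

Both queries return the same rows; the difference is only in what the engine can see. In the second form there is no function boundary left for the optimizer to stop at.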

The authors also point out that many traditional compiler optimizations for imperative code (dead code elimination, constant propagation, etc.) do not need to be implemented separately for UDFs with Froid, because once the UDF is expressed as SQL the optimizer's existing relational rules achieve the same effect.

Key findings / takeaways from the paper (2-3 sentences)

In my view, the main takeaway is that the speedup from enabling Froid is not just very impressive but also very consistent across almost all workloads evaluated.

It was also very interesting to learn from this paper why UDFs are so bad for performance. The authors do a really good job of explaining why query optimizers are basically forced to treat them as a black box, execute them serially, and not cost them at all. That was eye-opening to me, since I had no idea about any of it.
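A small sketch of the black-box behavior, again using Python's sqlite3 module as a stand-in rather than SQL Server. The function `is_expensive`, the table `items`, and the threshold are hypothetical; the counter only exists to show that the engine has to call into the function for every tuple.

```python
import sqlite3

calls = {"n": 0}  # counts how often the engine invokes the UDF

def is_expensive(price_cents):
    # Opaque to the engine: it cannot look inside, cost it, or simplify it.
    calls["n"] += 1
    return 1 if price_cents > 3000 else 0

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (price_cents INTEGER)")
conn.executemany(
    "INSERT INTO items VALUES (?)", [(p,) for p in (1000, 2000, 4000, 8000)]
)
conn.create_function("is_expensive", 1, is_expensive)

# Each row triggers one call into the function.
flags = [
    row[0]
    for row in conn.execute("SELECT is_expensive(price_cents) FROM items ORDER BY rowid")
]
```

With four rows in the table, the counter ends at four: one invocation per tuple. On a large table with a UDF that itself runs queries, that per-row cost is what the paper is attacking.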

Another takeaway is how having cloud databases allows engineers working on those systems to build and test things like Froid. The authors were able to test Froid on real customer workloads which would be very hard to do if SQL Server were a pure on-prem/self-managed solution.

(Of course, Froid supports only a limited subset of T-SQL as of the writing of this paper. For instance, loops are not supported.)

System used in evaluation and how it was modified/extended (1 sentence)

The system used in evaluation is SQL Server. It was extended with Froid, the system described in the paper.

Workload Evaluated (1 sentence)

The authors evaluated real customer workloads from Azure SQL. This is fantastic.

They also tested TPC-H queries rewritten to use UDFs (queries 5, 9, 11, 12, 14, 22).

The results are simply outstanding.