Velox: Meta’s Unified Execution Engine

Personal note: despite Andy praising Velox a lot in his classes, I wasn’t really “sold” on the value it brings. After reading this paper, I am in awe. The ideas behind Velox are really interesting, and it looks like they’ve built a lot of awesome things inside of it. But understanding what Velox is and what value it provides to the ecosystem is not possible without reading this paper (the GitHub repo and the announcement blog posts from Meta just aren’t clear enough).

Overview of the main idea (3 sentences)

The main idea behind Velox is to take the code that many database engines and data processing systems have in common and put it in a C++ library that many different clients can use. It offers generic types for vectors, query plan operators, and similar building blocks. This has allowed Meta to build Prestissimo and Spruce, which are basically re-implementations of Presto and Spark, respectively, in C++, without too much work (they only rebuilt the core data engine, not the “frontend” or the I/O). In theory, anybody in the community can now use Velox to build their own OLAP database engine or data processing tools for AI/ML.

Velox doesn’t generate query plans
Velox doesn’t parse SQL or any other language
Velox simply has functions that clients can use to perform common query plan operators and other data processing functions
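
To make that division of labor concrete, here is a minimal sketch in Python. It is not Velox's API: the names `Scan`, `Filter`, `Aggregate` and `execute` are hypothetical and exist only for illustration. The "engine" only knows how to run operators over columnar batches, and the client has to hand it an already-built plan tree.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# A columnar batch: column name -> list of values (all columns same length).
Batch = Dict[str, List]


# Hypothetical plan nodes; these names are illustrative, not Velox's.
@dataclass
class Scan:
    batch: Batch  # in-memory stand-in for a real table scan


@dataclass
class Filter:
    source: object
    column: str
    predicate: Callable[[object], bool]


@dataclass
class Aggregate:
    source: object
    group_by: str
    sum_column: str


def execute(node) -> Batch:
    """Engine side: run a plan tree that the client already built."""
    if isinstance(node, Scan):
        return node.batch
    if isinstance(node, Filter):
        batch = execute(node.source)
        # Collect the row positions that pass, then apply them to every column.
        keep = [i for i, v in enumerate(batch[node.column]) if node.predicate(v)]
        return {name: [col[i] for i in keep] for name, col in batch.items()}
    if isinstance(node, Aggregate):
        batch = execute(node.source)
        totals: Dict[object, float] = {}
        for key, value in zip(batch[node.group_by], batch[node.sum_column]):
            totals[key] = totals.get(key, 0) + value
        return {node.group_by: list(totals), node.sum_column: list(totals.values())}
    raise TypeError(f"unknown plan node: {node!r}")


# Client side: no SQL parser and no optimizer; the plan is assembled by hand.
orders = Scan({"region": ["eu", "us", "eu", "us"], "amount": [10, 20, 30, 5]})
plan = Aggregate(Filter(orders, "amount", lambda a: a >= 10), "region", "amount")
result = execute(plan)
```

With this sample data, `result` is `{"region": ["eu", "us"], "amount": [40, 20]}`. Everything that turns a SQL string into `plan` (parsing, analysis, optimization) would live in the client, which is the part Presto or Spark keep for themselves.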

Key findings / takeaways from the paper (2-3 sentences)

Meta has been able to accelerate both Presto and Spark with Velox while sharing a lot of code between the two. This is brilliant, and if they convince more people from the community (so far only ByteDance) to contribute back to Velox, they get to reap those benefits too.
Velox could be used to build future OLAP database engines or data processing tools for AI/ML, which would save those projects a lot of work and time. The obvious downside is that they’ll be somewhat coupled to Velox, but its types and API seem generic enough.

System used in evaluation and how it was modified/extended (1 sentence)

The authors compare regular Presto against Prestissimo (Presto with its execution engine replaced by a C++ one built with Velox). Prestissimo does a lot better in general.

Workload Evaluated (1 sentence)

3TB TPC-H in ORC format (they write the query plans by hand, since Velox doesn’t generate query plans)
Unknown analytical workload from Meta