Personal note: despite Andy praising Velox a lot in his classes, I wasn’t really “sold” on the value it brings. After reading this paper, I am in awe. The ideas behind Velox are really interesting, and it looks like they’ve built a lot of awesome things inside of it. But understanding what Velox is and what value it provides to the ecosystem is not really possible without reading this paper. (The GitHub repo and the announcement blog posts from Meta just aren’t clear enough.)

Overview of the main idea (3 sentences)

The main idea behind Velox is to take the code that many database engines and data processing systems have in common and put it in a C++ library that many different clients can use. It provides generic abstractions for things like types, vectors, and query plan operators. This has allowed Meta to build systems like Prestissimo and Spruce, which are basically re-implementations of Presto and Spark, respectively, in C++, without too much work (they only rebuilt the core data engine, not the “frontend” or the I/O). In theory, anybody in the community can now reap the benefits of Velox to build their own OLAP database engine or data processing tools for AI/ML.
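
To make the "shared engine, many frontends" idea concrete, here is a toy sketch of my own. It is not Velox's API (Velox is C++; Python is used only for brevity), and every name in it (`Vector`, `Filter`, `Project`, `execute`, the two frontends) is made up for illustration. The point is only that two different frontends can lower their queries to the same plan of shared operators and reuse one execution loop.

```python
from dataclasses import dataclass
from typing import Callable, List


# Hypothetical stand-ins for illustration only, NOT Velox's real types.
@dataclass
class Vector:
    """A column of values: the unit of data passed between operators."""
    values: List[int]


class Filter:
    """Shared operator: keeps the values for which the predicate holds."""
    def __init__(self, predicate: Callable[[int], bool]):
        self.predicate = predicate

    def run(self, batch: Vector) -> Vector:
        return Vector([v for v in batch.values if self.predicate(v)])


class Project:
    """Shared operator: applies a function to every value."""
    def __init__(self, fn: Callable[[int], int]):
        self.fn = fn

    def run(self, batch: Vector) -> Vector:
        return Vector([self.fn(v) for v in batch.values])


def execute(plan, batch: Vector) -> Vector:
    """Shared execution loop: push one batch through each operator in order."""
    for op in plan:
        batch = op.run(batch)
    return batch


# Two hypothetical frontends. Each would parse its own query language,
# but both lower to the same shared operators.
def sql_like_frontend(threshold: int):
    # Roughly: SELECT x * 2 FROM t WHERE x > threshold
    return [Filter(lambda v: v > threshold), Project(lambda v: v * 2)]


def dataframe_like_frontend(threshold: int):
    # Roughly: df[df.x > threshold].x * 2
    return [Filter(lambda v: v > threshold), Project(lambda v: v * 2)]


data = Vector([1, 5, 10])
result_sql = execute(sql_like_frontend(4), data)
result_df = execute(dataframe_like_frontend(4), data)
print(result_sql.values, result_df.values)
```

Neither frontend contains any execution logic of its own, so any improvement to the shared operators benefits both, which is the value I read the paper as claiming for Velox.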

Key findings / takeaways from the paper (2-3 sentences)

System used in evaluation and how it was modified/extended (1 sentence)

The authors compare regular Java-based Presto against Prestissimo (Presto with a modified C++ engine built with Velox). Prestissimo does a lot better, in general.

Workload Evaluated (1 sentence)