I am following along with the CMU Advanced Databases course from Spring 2024. I am watching every lecture, reading every mandatory paper, and writing paper reviews as well as lecture notes. I would also like to work on the project assignment, but finding time for it will be very hard.
This course focuses primarily on building large-scale OLAP systems rather than on scaling OLTP systems. Most of the content covers big data file formats, execution engines, query optimizers, and schedulers.
This is my list of paper reviews, written in the template required by this class. Each lecture has one mandatory paper that must be reviewed, plus some additional papers that should be read but do not require a review.
Lakehouse: A New Generation of Open Platforms that Unify Data Warehousing and Advanced Analytics
The Composable Data Management System Manifesto
An Empirical Evaluation of Columnar Storage Formats
The FastLanes Compression Layout: Decoding > 100 Billion Integers per Second with Scalar Code
MonetDB/X100: Hyper-Pipelining Query Execution
Velox: Meta’s Unified Execution Engine
Everything You Always Wanted to Know About Compiled and Vectorized Queries But Were Afraid to Ask
Efficiently Compiling Efficient Query Plans for Modern Hardware
Morsel-Driven Parallelism: A NUMA-Aware Query Evaluation Framework for the Many-Core Age
Self-Tuning Query Scheduling for Analytical Workloads
An Experimental Comparison of Thirteen Relational Equi-Joins in Main Memory
Adopting Worst-Case Optimal Joins in Relational Database Systems
Froid: Optimization of Imperative Programs in a Relational Database
Don’t Hold My Data Hostage – A Case For Client Protocol Redesign
### Overview of the main idea (3 sentences)
...
### Key findings / takeaways from the paper (2-3 sentences)
...
### System used in evaluation and how it was modified/extended (1 sentence)
...
### Workload Evaluated (1 sentence)
...
Sub-Page Links
Lecture #00: Course Overview & Logistics
Lecture #01: Modern Analytical Database Systems
Lecture #02: Data Formats & Encoding I
Lecture #03: Data Formats & Encoding II
Lecture #04: Query Execution & Processing Part 1
Lecture #05: Query Execution & Processing Part 2
Lecture #06: Vectorized Query Execution
Lecture #07: Code Generation & Compilation
Lecture #08: Scheduling & Coordination
Lecture #09: Hash Join Algorithms
Lecture #10: Multi-Way Join Algorithms
Lecture #11: Server-side Logic Execution
Lecture #12: Networking Protocols
Lecture #13: Query Optimizer Implementation 1
Lecture #14: Query Optimizer Implementation 2
Lecture #15: Query Optimizer Implementation 3
Lecture #17: System Analysis (Google Dremel / BigQuery)
Lecture #18: System Analysis (Databricks / Spark)
Lecture #19: System Analysis (Snowflake)
Lecture #20: System Analysis (DuckDB)