Date: Jan 22 2024
Slides: https://15721.courses.cs.cmu.edu/spring2024/slides/01-modernolap.pdf
Reading
- Lakehouse: A New Generation of Open Platforms that Unify Data Warehousing and Advanced Analytics (M. Armbrust, et al., CIDR 2021)
- The Composable Data Management System Manifesto (P. Pedreira, et al., VLDB 2023)
Raw Lecture Notes
- Data Cubes
- managed by a DBA and typically refreshed manually, e.g. via cron jobs
- sort of like materialized views from the old days, except that a materialized view is supposed to update automatically when the underlying data changes
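To make the staleness point concrete, here is a minimal sketch of a pre-aggregated "cube" table that only changes when someone reruns the refresh. It uses Python's built-in `sqlite3`; the `sales` and `sales_cube` tables and the `refresh_cube` helper are made up for illustration and are not from the lecture.

```python
import sqlite3

# Hypothetical fact table; names and values are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, product TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("east", "widget", 10), ("east", "gadget", 5), ("west", "widget", 7)],
)


def refresh_cube(conn):
    """Rebuild the pre-aggregated table from scratch (what a cron job would run)."""
    conn.execute("DROP TABLE IF EXISTS sales_cube")
    conn.execute(
        "CREATE TABLE sales_cube AS "
        "SELECT region, product, SUM(amount) AS total "
        "FROM sales GROUP BY region, product"
    )


def cube_total(conn, region, product):
    """Answer a query from the cube instead of scanning the fact table."""
    row = conn.execute(
        "SELECT total FROM sales_cube WHERE region = ? AND product = ?",
        (region, product),
    ).fetchone()
    return row[0] if row else None


refresh_cube(conn)
before = cube_total(conn, "east", "widget")

# New data arrives, but nothing updates the cube automatically.
conn.execute("INSERT INTO sales VALUES ('east', 'widget', 90)")
stale = cube_total(conn, "east", "widget")

# Only an explicit refresh brings the cube up to date.
refresh_cube(conn)
fresh = cube_total(conn, "east", "widget")
```

Here `stale` still equals `before` after the insert, and only `fresh` reflects the new row, which is the gap an auto-updating materialized view is meant to close.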
- Lakehouses
- interesting tidbit: “most data in the world is unstructured or semi-structured”
- OLAP systems can, to some extent, be broken up into reusable components, as proposed in the paper “The Composable Data Management System Manifesto”, but it’s really hard to do that well.
- e.g., what if one component of the system uses a different integer type than another?
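A small sketch of how the integer mismatch could bite, assuming two hypothetical components that exchange raw bytes: one writes signed 64-bit integers, the other reads signed 32-bit integers. The function names are invented for illustration; only the standard `struct` module is used.

```python
import struct


def producer_encode(values):
    """Hypothetical component A: serializes little-endian signed 64-bit ints."""
    return struct.pack(f"<{len(values)}q", *values)


def consumer_decode_int32(buf):
    """Hypothetical component B: wrongly assumes signed 32-bit ints."""
    count = len(buf) // 4
    return list(struct.unpack(f"<{count}i", buf))


def consumer_decode_int64(buf):
    """Component B with the matching assumption about integer width."""
    count = len(buf) // 8
    return list(struct.unpack(f"<{count}q", buf))


values = [1, -1, 2**40]
buf = producer_encode(values)

wrong = consumer_decode_int32(buf)  # twice as many values, silently garbage
right = consumer_decode_int64(buf)  # round-trips correctly
```

Nothing raises an error in the mismatched case: the consumer just gets six plausible-looking integers instead of three, which is why agreeing on types across components matters.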
- This slide is a really good description of any OLAP system
- Fun fact: Snowflake’s catalog is built with FoundationDB.
- Query plans can be either trees or DAGs
- most commonly a tree
- but for complex queries, a DAG might make more sense
- A DAG structure is used when the query optimizer recognizes that certain sub-operations or intermediate results can be reused in multiple parts of the query.
- in SingleStore, plans are already DAGs because of CTEs
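As a rough illustration of the tree vs. DAG distinction, the sketch below builds a tiny plan where a CTE is referenced twice. `PlanNode` and the operator names are hypothetical and not taken from any real system; the point is only that sharing the CTE node makes the plan a DAG with fewer distinct operators than the fully expanded tree.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class PlanNode:
    """Hypothetical plan operator with zero or more input operators."""
    op: str
    children: list = field(default_factory=list)


def count_tree_nodes(node):
    """Count operators as if every reference were expanded into its own copy."""
    return 1 + sum(count_tree_nodes(child) for child in node.children)


def count_dag_nodes(node, seen=None):
    """Count distinct operators, visiting each shared subplan only once."""
    if seen is None:
        seen = set()
    if id(node) in seen:
        return 0
    seen.add(id(node))
    return 1 + sum(count_dag_nodes(child, seen) for child in node.children)


# Roughly: WITH recent AS (SELECT ... FROM orders WHERE ...)
#          SELECT ... FROM recent a JOIN recent b ON ...
scan = PlanNode("scan(orders)")
cte = PlanNode("filter(recent)", [scan])
join = PlanNode("join", [cte, cte])  # both join inputs reference the same node
root = PlanNode("project", [join])

tree_size = count_tree_nodes(root)
dag_size = count_dag_nodes(root)
```

Expanded as a tree, the CTE subplan is counted once per reference; as a DAG, the shared `cte` node (and its scan) appears once and feeds both join inputs.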
- Push Query to Data vs Push Data to Query
- This section was a bit confusing to me.
- SingleStore is a hybrid in this regard, but it mostly tries to push the query to the data.
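Since this part was confusing, here is a toy sketch of the two approaches, under the assumption that the main difference is where the filter runs and therefore how many rows cross the network. `StorageNode` and its methods are invented for illustration and do not correspond to any real system's API.

```python
class StorageNode:
    """Hypothetical storage node holding rows as (id, amount) tuples."""

    def __init__(self, rows):
        self.rows = rows
        self.rows_shipped = 0  # stand-in for network traffic

    def scan_all(self):
        """Push data to query: ship every row and let the caller filter."""
        self.rows_shipped += len(self.rows)
        return list(self.rows)

    def scan_filtered(self, predicate):
        """Push query to data: evaluate the predicate here, ship only matches."""
        matches = [row for row in self.rows if predicate(row)]
        self.rows_shipped += len(matches)
        return matches


def is_large(row):
    """Example predicate: amount >= 9900."""
    return row[1] >= 9900


rows = [(i, i * 10) for i in range(1000)]

# Push data to query: the compute side pulls everything, then filters.
pull_node = StorageNode(rows)
pull_result = [row for row in pull_node.scan_all() if is_large(row)]

# Push query to data: the storage side filters before shipping.
push_node = StorageNode(rows)
push_result = push_node.scan_filtered(is_large)
```

Both approaches return the same rows; the difference in this sketch is only `rows_shipped`, which is all rows in the first case and just the matches in the second.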
- Shared Nothing vs Shared Disk
- This section was super interesting. There are pros and cons to both architectures. SingleStore is a hybrid in this regard, but it definitely suffers from the cons of shared-nothing systems mentioned by Andy (it is harder to add or remove nodes quickly).
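One way to see why resizing a shared-nothing cluster is slow: if each node owns the rows that hash to it, changing the node count changes the owner of many rows, and those rows have to be physically moved. The sketch below assumes a simple modulo placement rule, which is a simplification chosen for illustration and not how any particular system necessarily partitions data.

```python
def owner(key, num_nodes):
    """Hypothetical placement rule: partition by key modulo the node count."""
    return key % num_nodes


def rows_moved_on_resize(keys, old_nodes, new_nodes):
    """Count rows whose owning node changes when the cluster is resized."""
    return sum(1 for k in keys if owner(k, old_nodes) != owner(k, new_nodes))


keys = range(12)
moved = rows_moved_on_resize(keys, old_nodes=3, new_nodes=4)
```

In this toy setup most of the 12 rows change owner when going from 3 to 4 nodes. In a shared-disk design the data stays in shared storage, so under the same simplification adding a compute node would not require copying rows between nodes.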