Lecture #01: Modern Analytical Database Systems

Date: Jan 22 2024

Reading

Data Cubes
- managed by a DBA and typically refreshed explicitly (e.g., by a cron job) rather than automatically
- essentially the old-days version of materialized views, except that materialized views should update automatically
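A data cube precomputes an aggregate for every combination of dimensions, with "ALL" standing for a rolled-up dimension. The sketch below is a minimal illustration under stated assumptions: the fact table, the `build_cube` and `refresh_cube` names, and the SUM aggregate are all hypothetical. It shows why the cube goes stale: nothing recomputes it until something (such as a cron job) calls the refresh function.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical fact table: (region, product, year, sales)
FACTS = [
    ("east", "widget", 2023, 10),
    ("east", "gadget", 2023, 5),
    ("west", "widget", 2023, 7),
    ("west", "widget", 2024, 3),
]
DIMS = ("region", "product", "year")


def build_cube(facts):
    """Precompute SUM(sales) for every subset of DIMS (like SQL's GROUP BY CUBE).

    Dimensions that are rolled up are represented by the string "ALL".
    """
    n = len(DIMS)
    cube = defaultdict(int)
    for r in range(n + 1):
        for kept in combinations(range(n), r):
            for row in facts:
                key = tuple(row[i] if i in kept else "ALL" for i in range(n))
                cube[key] += row[n]  # row[n] is the sales measure
    return dict(cube)


def refresh_cube():
    # Hypothetical refresh entry point that a cron job would call periodically.
    # Between calls, the returned cube does not reflect new rows in FACTS.
    return build_cube(FACTS)
```

For example, `refresh_cube()[("ALL", "widget", "ALL")]` is the total widget sales across all regions and years (20 here).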
Lakehouses
- interesting tidbit: “most data in the world is unstructured or semi-structured”
OLAP systems can, to some extent, be decomposed into separate reusable components, as argued in the paper “The composable data management system manifesto”, but doing that well is really hard.
- e.g., what if one component uses a different integer type than another?
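To make the integer-type concern concrete, here is a small sketch, assuming one hypothetical component produces 64-bit integers and another stores them as int32. The helper names are illustrative only. Python's `ctypes` integer types do no overflow checking, so the unchecked conversion silently wraps, which is exactly the kind of mismatch a composable system has to guard against at component boundaries.

```python
import ctypes


def to_int32(value: int) -> int:
    """Hand a 64-bit value to a hypothetical int32 component without checks.

    ctypes.c_int32 performs no overflow checking, so out-of-range values wrap.
    """
    return ctypes.c_int32(value).value


def to_int32_checked(value: int) -> int:
    """Same hand-off, but reject values that do not fit in a signed 32-bit int."""
    if not -(2**31) <= value < 2**31:
        raise OverflowError(f"{value} does not fit in int32")
    return value
```

With these helpers, `to_int32(3_000_000_000)` returns a negative number instead of failing, while `to_int32_checked` raises.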
This slide gives a really good overview of what essentially any OLAP system looks like
- Fun fact: Snowflake’s catalog is built on FoundationDB.

[Slide screenshot]

Query plans can be either trees or DAGs
- most commonly a tree
- but for complex queries, a DAG might make more sense
- A DAG structure is used when the query optimizer recognizes that certain sub-operations or intermediate results can be reused in multiple parts of the query.
- in SingleStore, because of CTEs, the plan is already a DAG
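A rough sketch of why a DAG helps with CTEs: if the CTE's operator is a single shared node with two parents, it can be evaluated once and its result reused, whereas a tree would duplicate that subtree. The `PlanNode` class, the `execute` function, and the example query (`WITH big AS (... WHERE amount > 100)` feeding both a COUNT and a SUM) are all hypothetical and are not how SingleStore actually represents plans.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass(eq=False)  # eq=False keeps identity-based hashing, so nodes can be dict keys
class PlanNode:
    name: str
    compute: Callable[..., list]
    children: list = field(default_factory=list)


def execute(node, cache, stats):
    """Evaluate the plan bottom-up, memoizing results per node object.

    A node with several parents (a DAG) is therefore computed only once;
    stats records how many times each operator actually ran.
    """
    if node in cache:
        return cache[node]
    inputs = [execute(child, cache, stats) for child in node.children]
    stats[node.name] = stats.get(node.name, 0) + 1
    result = node.compute(*inputs)
    cache[node] = result
    return result


def build_cte_plan():
    # Hypothetical query:
    #   WITH big AS (SELECT * FROM orders WHERE amount > 100)
    #   SELECT COUNT(*) FROM big UNION ALL SELECT SUM(amount) FROM big
    orders = [{"id": i, "amount": a} for i, a in enumerate([50, 150, 200, 80, 300])]
    scan = PlanNode("scan_orders", lambda: orders)
    cte = PlanNode("cte_big", lambda rows: [r for r in rows if r["amount"] > 100], [scan])
    count = PlanNode("count", lambda rows: [len(rows)], [cte])
    total = PlanNode("sum", lambda rows: [sum(r["amount"] for r in rows)], [cte])
    return PlanNode("union_all", lambda a, b: a + b, [count, total])
```

Running `execute(build_cte_plan(), {}, stats)` yields `[3, 650]`, and `stats["cte_big"]` stays at 1 even though two operators consume the CTE.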
Push Query to Data vs Push Data to Query
- This section was a bit confusing to me.
- SingleStore is a hybrid in this regard, but it mostly tries to push the query to the data.
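One way to see the difference is to count how many values cross the network for a simple aggregate. The sketch below assumes a hypothetical three-node cluster where each node stores one partition of a column of amounts; the partition contents and function names are made up for illustration. Pushing data to the query ships every row to the coordinator, while pushing the query to the data runs a partial SUM on each node and ships only one value per node.

```python
# Hypothetical cluster: each node stores one partition of an "amount" column.
PARTITIONS = {
    "node1": [5, 12, 7, 30],
    "node2": [8, 1, 22],
    "node3": [14, 9, 3, 6, 11],
}


def push_data_to_query(partitions):
    """Ship every row to the coordinator, which computes SUM itself.

    Returns (result, number_of_values_sent_over_the_network).
    """
    shipped = [row for rows in partitions.values() for row in rows]
    return sum(shipped), len(shipped)


def push_query_to_data(partitions):
    """Send SUM to each node; each node returns only its partial sum.

    Returns (result, number_of_values_sent_over_the_network).
    """
    partials = [sum(rows) for rows in partitions.values()]
    return sum(partials), len(partials)
```

Both functions compute the same total (128), but the first moves 12 values and the second moves 3. This trick relies on the aggregate being decomposable: SUM of partial SUMs is correct, whereas a median cannot be computed from per-node medians alone, which is one reason real systems end up mixing both approaches.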
Shared Nothing vs Shared Disk
- This section was super interesting. There are pros and cons to both architectures. SingleStore is a hybrid in this regard, but it definitely suffers from the cons of shared-nothing systems that Andy mentioned (it is harder to add or remove nodes quickly).
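A toy model of why resizing is harder in shared-nothing: each node owns its data on local storage, so when ownership changes, the data has to be physically copied. In shared-disk, the data stays in shared storage and only the assignment of work changes. The sketch assumes a hypothetical placement rule of `key % num_nodes` on integer keys; it does not model SingleStore's actual partitioning.

```python
def owner(key: int, num_nodes: int) -> int:
    # Hypothetical placement rule: simple modulo partitioning on an integer key.
    return key % num_nodes


def rebalance(keys, old_nodes: int, new_nodes: int, shared_disk: bool):
    """Resize the cluster and report (keys_reassigned, keys_copied).

    Shared-nothing: every reassigned key must be copied to its new owner.
    Shared-disk: ownership moves, but the data stays in shared storage.
    """
    reassigned = sum(1 for k in keys if owner(k, old_nodes) != owner(k, new_nodes))
    copied = 0 if shared_disk else reassigned
    return reassigned, copied
```

Growing from 4 to 5 nodes over keys 0 to 999 reassigns 800 keys in both cases, but only shared-nothing has to copy them. Real shared-nothing systems reduce this with schemes such as consistent hashing or by creating more partitions than nodes, yet some data movement remains unavoidable.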