Date: Feb 19 2024
Slides: https://15721.courses.cs.cmu.edu/spring2024/slides/08-scheduling.pdf
Reading
Raw Lecture Notes
- (Andy starts with a brief mention of pipelining/vectorization vs. compilation - see “Everything You Always Wanted to Know About Compiled and Vectorized Queries But Were Afraid to Ask”)
- Some definitions (won’t be very important though)
- Task - one or more operator instances
- Task Set → collection of all tasks for one query
- We will focus on single-node systems (Andy says that distributed systems are basically just the “same” problem, although I think there are many caveats to that)
- Pretty much every DBMS does its own scheduling, except for PostgreSQL (?)
- (This is inconsistent with what the papers for this lecture say)
- PG just launches 1 process per query (and it probably doesn’t even mess with OS priorities - very basic)
- I also recently watched this talk, which I highly recommend: https://www.youtube.com/watch?v=xLLakMmVtbY (it’s about making Postgres multi-threaded instead of multi-process)
- The DBMS should take over scheduling and not let the OS do it. The literature shows this and it just makes more sense.
- Goals:
- 1 maximize throughput
- 2 fairness
- 3 query responsiveness
- 4 low overhead
- Today…
- Worker Allocation
- Data Placement
- Scheduler Implementations
- Distributed Query Scheduling
- Pretty much every DBMS is multi-threaded, except Postgres
- PG uses multi-process with shared memory. (At some point Andy worked on trying to make PG multi-threaded)
- Worker Allocation
- one worker per CPU core → better for most OLTP and OLAP systems
- this sounds “wasteful” but our cores should never be stalled if we do things right
- multiple workers per core
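(My own sketch, not from the lecture: a rough picture of the "one worker per CPU core" option. The function names `available_cores` and `run_one_worker_per_core` are made up for illustration. Pinning uses `os.sched_setaffinity`, which only exists on Linux, so it is skipped elsewhere. Python threads share the GIL, so this only illustrates the allocation scheme, not real parallel speedup.)

```python
import os
import queue
import threading


def available_cores():
    """Return the core ids this process may run on (hypothetical helper)."""
    if hasattr(os, "sched_getaffinity"):  # Linux only
        return sorted(os.sched_getaffinity(0))
    return list(range(os.cpu_count() or 1))


def run_one_worker_per_core(tasks):
    """Run zero-argument callables on a pool with exactly one worker per core.

    Returns the results in the same order as `tasks`.
    """
    task_queue = queue.Queue()
    for index, task in enumerate(tasks):
        task_queue.put((index, task))
    results = [None] * len(tasks)

    def worker(core_id):
        # Best-effort pinning of this thread to its core (Linux only).
        if hasattr(os, "sched_setaffinity"):
            try:
                os.sched_setaffinity(0, {core_id})
            except OSError:
                pass  # pinning not permitted here; keep running unpinned
        # Pull tasks until the queue is drained.
        while True:
            try:
                index, task = task_queue.get_nowait()
            except queue.Empty:
                return
            results[index] = task()

    threads = [threading.Thread(target=worker, args=(core,))
               for core in available_cores()]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Usage: `run_one_worker_per_core([lambda i=i: i * i for i in range(10)])` returns the squares in task order, regardless of which worker ran which task.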
- Centralized vs Decentralized Dispatchers
- Decentralized is better (see literature and notes for this class)
- Operating on local data is very important
- NUMA regions
- S3 <> local server caches
- (again, here we can see the mapping between the algorithms and decisions that have to be made for single-node systems and for multi-node systems)
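(Again my own sketch, not something shown in the lecture: one way to picture "operate on local data" with decentralized dispatch. Each region (a NUMA region, or a node with a local cache) gets its own task queue; a worker pulls from its own region first and only takes remote work when it has nothing local. The class `LocalityAwareQueues` and its methods are hypothetical names, and the code is single-threaded for clarity; a real scheduler would need synchronization or lock-free queues.)

```python
from collections import deque


class LocalityAwareQueues:
    """One task queue per region; workers prefer tasks whose data is local."""

    def __init__(self, num_regions):
        self.queues = [deque() for _ in range(num_regions)]

    def add_task(self, task, data_region):
        # Place the task in the queue of the region that holds its data.
        self.queues[data_region].append(task)

    def next_task(self, worker_region):
        """Return (task, is_local); (None, False) when no work is left."""
        local = self.queues[worker_region]
        if local:
            return local.popleft(), True
        # Nothing local: take work from another region rather than idle.
        for q in self.queues:
            if q:
                return q.popleft(), False
        return None, False
```

For example, with two regions and one task in each, a worker in region 1 first gets its local task, then takes the region 0 task as remote work, then gets `(None, False)`.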
- Partitioning vs Placement