Overview of the main idea (3 sentences)
The main idea of this paper is the hash trie join, which is an alternative to other worst-case optimal joins such as the Leapfrog Triejoin. These joins should be used for multi-way joins where the intermediate joins could generate “exploding” inputs to other joins, i.e., a lot of tuples/rows that will be discarded anyway.
Key findings / takeaways from the paper (2-3 sentences)
- Some joins, especially multi-table joins, should not be resolved by a combination of binary joins but rather a more sophisticated worst-case optimal join algorithm.
- It is possible, using query optimizer statistics, to have a heuristic for deciding when to use binary joins vs. worst-case optimal join: "A binary join is replaced either if it is classified as a growing join, i.e. its output cardinality is greater than the maximum of its input cardinalities, or if one of its inputs has already been replaced by a multi-way join (line 6). In both cases, a single multi-way join is built from the inputs and the current join condition.
System used in evaluation and how it was modified/extended (1 sentence)
Umbra, MonetDB, “DBMS X” which is probably Oracle, EmptyHeaded.
In Umbra, the authors implemented both leap frog join as well as two variations of the hash trie join introduced in this paper.
Workload Evaluated (1 sentence)
- Join Order Benchmark (JOB), which is based on IMDB data
- TPC-H
- Some Wikipedia and Slashdot graph data