Details
Description
(this all started from https://bugs.launchpad.net/maria/+bug/929732)
Currently, nested semi-joins are merged together into one semi-join. This makes semi-join optimization/execution simpler.
This approach has downsides, though. If there is a semi-join subquery with multiple child semi-join subqueries, we will merge them all together into a semi-join that includes a cross-product join of the child subqueries. Running a cross-product join and then removing duplicates can be slow (see the launchpad bug for detailed description).
Nested semi-join support
========================
The idea is to support nested semi-joins. It will be possible for one semi-join
nest to contain another.
Semi-join nests will not be mixed with outer join nests.
== Optimization
=== Table pullout
current pull_out_semijoin_tables() will not work for nested semi-joins. With
nested semi-joins, we should:
- process inner semi-joins first, then their parents,
- pull out of inner semi-join to the parent semi-join (and then pull out
further to the top-level), - if we were doing pullout for a top-level semi-join S, and ended up in
a situation where S has only semi-joins as its children, then S should be
removed and its children made top-level.
(TODO: what about PS and re-executions? we need to be able to keep the
nested-semi-join structure permanent, and results of pullout must be kept
transient...)
=== SJ-Materialization
optimize_semijoin_nests() performs pre-optimization of potential
SJ-Materialization nests (so that we know what cost to put there when we
consider materialization)
It is fairly trivial that we should run optimize_semijoin_nests() for
innermost semi-join nests that are eligible for materialization.
It is not obvious what to do for a semi-join nest that is
- eligible for materialization,
- has child semi-join nests.
Should one run optimization separately, and chose the best semi-join
strategy for children (which may or may not be materialization?)
== Semi-join optimization as part of join optimization
Re-worked advance_sj_state() makes it not as hopeless as before.
TODO: Can we just postpone semi-join flattening until table pullout is done?
this will resolve the bug...
TODO: does current query plan representation allow for nesting?
=== FirstMatch
=== SJ-Materialization
=== Loose Scan
=== Duplicate Weedout
== Execution
=== Duplicate Weedout
- execution code supports nested.
=== FirstMatch
- ?
=== SJ-Materialization
- nested execution seems to be easy and automatic? We allow use of any
sj-strategies inside the SJ-Materialization join_tabs...
=== Loose Scan
- ?
Gliffy Diagrams
Attachments
Issue Links
- includes
-
MDEV-582 LP:929732 - semijoin much slower than in_to_exists
-
- Stalled
-
Activity
- All
- Comments
- Work Log
- History
- Activity
- Transitions