feat(planner_v2): cost function and multi-stage optimization #723

Merged 17 commits on Nov 3, 2022

@wangrunji0408 commented on Nov 2, 2022

This PR adds a practical cost function so that the optimizer can find the best plan.

However, egg doesn't support cost-based heuristic search, so rewriting is prone to state explosion and it is hard to find the plan we want in a reasonable time. To solve this problem, the optimization is divided into multiple stages. Each stage applies only a subset of the rules for a limited number of iterations, then extracts the best plan as the input of the next stage. For now there are 3 stages:

  1. Aggregation extraction and column pruning.
  2. Predicate and projection pushdown (3 iterations).
  3. Join reordering and conversion to hash join.
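
To illustrate the staged approach, here is a minimal, self-contained sketch. It does not use egg or the planner's real types: `Expr`, `Rule`, `Stage`, `run_stage` and `optimize` are all hypothetical names, the "plan" is a toy arithmetic expression, and a bounded set of candidates stands in for the e-graph. It only shows the control flow: each stage applies its own rules for a limited number of iterations, extracts the cheapest candidate, and hands it to the next stage.

```rust
use std::collections::HashSet;

/// Toy expression standing in for a query plan (assumption, not the real IR).
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
enum Expr {
    Num(i64),
    Var(&'static str),
    Add(Box<Expr>, Box<Expr>),
    Mul(Box<Expr>, Box<Expr>),
}
use Expr::*;

/// A rewrite rule tried at a single node; `None` means "does not match".
type Rule = fn(&Expr) -> Option<Expr>;

/// x * 1 => x
fn mul_one(e: &Expr) -> Option<Expr> {
    match e {
        Mul(a, b) if **b == Num(1) => Some((**a).clone()),
        _ => None,
    }
}

/// Constant folding: a + b => (a + b) when both sides are numbers.
fn fold_add(e: &Expr) -> Option<Expr> {
    match e {
        Add(a, b) => match (&**a, &**b) {
            (Num(x), Num(y)) => Some(Num(x + y)),
            _ => None,
        },
        _ => None,
    }
}

/// Toy cost: number of nodes in the expression.
fn cost(e: &Expr) -> usize {
    match e {
        Num(_) | Var(_) => 1,
        Add(a, b) | Mul(a, b) => 1 + cost(a) + cost(b),
    }
}

/// All expressions reachable from `e` by applying one rule at one position.
fn rewrites(e: &Expr, rules: &[Rule]) -> Vec<Expr> {
    let mut out: Vec<Expr> = rules.iter().filter_map(|r| r(e)).collect();
    if let Add(a, b) | Mul(a, b) = e {
        // Rebuild a node of the same kind as `e` around rewritten children.
        let rebuild = |l: Expr, r: Expr| match e {
            Add(..) => Add(Box::new(l), Box::new(r)),
            _ => Mul(Box::new(l), Box::new(r)),
        };
        for l in rewrites(a, rules) {
            out.push(rebuild(l, (**b).clone()));
        }
        for r in rewrites(b, rules) {
            out.push(rebuild((**a).clone(), r));
        }
    }
    out
}

/// One stage: a subset of the rules plus an iteration limit.
struct Stage {
    rules: Vec<Rule>,
    iters: usize,
}

/// Grow the candidate set for at most `iters` rounds, then extract the cheapest.
fn run_stage(input: Expr, stage: &Stage) -> Expr {
    let mut seen: HashSet<Expr> = HashSet::from([input]);
    for _ in 0..stage.iters {
        let next: Vec<Expr> = seen.iter().flat_map(|e| rewrites(e, &stage.rules)).collect();
        let before = seen.len();
        seen.extend(next);
        if seen.len() == before {
            break; // nothing new was found, stop early
        }
    }
    seen.into_iter().min_by_key(cost).expect("seen always holds the input")
}

/// The best plan of each stage becomes the input of the next one.
fn optimize(input: Expr, stages: &[Stage]) -> Expr {
    stages.iter().fold(input, run_stage)
}

fn main() {
    // (x * 1) + (2 + 3)
    let plan = Add(
        Box::new(Mul(Box::new(Var("x")), Box::new(Num(1)))),
        Box::new(Add(Box::new(Num(2)), Box::new(Num(3)))),
    );
    let stages = [
        Stage { rules: vec![mul_one as Rule], iters: 2 },
        Stage { rules: vec![fold_add as Rule], iters: 2 },
    ];
    let best = optimize(plan, &stages);
    println!("{:?} (cost={})", best, cost(&best)); // Add(Var("x"), Num(5)) (cost=3)
}
```

Because each stage only ever sees one extracted plan and a few rules, the candidate set stays small. The trade-off is that a rewrite discarded at the end of one stage is no longer available to later stages.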

With these stages, the planner is able to find the best plan for the existing TPC-H queries:

Q1:

Order: [$7.8 asc, $7.9 asc] (cost=15414)
  Projection: [$7.8, $7.9, sum($7.4), sum($7.5), sum(($7.5 * (1 - $7.6))), sum(((1 - $7.6) * ($7.5 + ($7.7 * $7.5)))), avg($7.4), avg($7.5), avg($7.6), rowcount] (cost=15403)
    Aggregate: [rowcount, avg($7.6), avg($7.5), avg($7.4), sum(((1 - $7.6) * ($7.5 + ($7.7 * $7.5)))), sum(($7.5 * (1 - $7.6))), sum($7.5), sum($7.4)], groupby=[$7.8, $7.9] (cost=15390)
      Filter: (1998-09-21 >= $7.10) (cost=12900)
        Scan: [$7.4, $7.5, $7.6, $7.7, $7.8, $7.9, $7.10] (cost=7000)

Q3:

TopN: limit=10, offset=null, orderby=[sum(($7.5 * (1 - $7.6))) desc, $6.4 asc] (cost=54478.105)
  Projection: [$7.0, sum(($7.5 * (1 - $7.6))), $6.4, $6.7] (cost=54473.105)
    Aggregate: [sum(($7.5 * (1 - $7.6)))], groupby=[$7.0, $6.4, $6.7] (cost=54468.105)
      HashJoin: cross, lkey=[$6.0], rkey=[$7.0] (cost=53584.105)
        HashJoin: cross, lkey=[$5.0], rkey=[$6.1] (cost=22651.05)
          Filter: ($5.6 = 'BUILDING') (cost=2700)
            Scan: [$5.0, $5.6] (cost=2000)
          Filter: (1995-03-15 > $6.4) (cost=7500)
            Scan: [$6.0, $6.1, $6.4, $6.7] (cost=4000)
        Filter: ($7.10 > 1995-03-15) (cost=7500)
          Scan: [$7.0, $7.5, $7.6, $7.10] (cost=4000)

Q5:

Order: [sum(($7.5 * (1 - $7.6))) desc] (cost=150338.08)
  Projection: [$0.1, sum(($7.5 * (1 - $7.6)))] (cost=150335.08)
    Aggregate: [sum(($7.5 * (1 - $7.6)))], groupby=[$0.1] (cost=150332.28)
      HashJoin: cross, lkey=[$7.2, $5.3], rkey=[$3.0, $3.3] (cost=149430.28)
        HashJoin: cross, lkey=[$6.0], rkey=[$7.0] (cost=61900.703)
          HashJoin: cross, lkey=[$5.0], rkey=[$6.1] (cost=28966.25)
            Scan: [$5.0, $5.3] (cost=2000)
            Filter: (($6.4 >= 1994-01-01) and (1995-01-01 > $6.4)) (cost=5620)
              Scan: [$6.0, $6.1, $6.4] (cost=3000)
          Scan: [$7.0, $7.2, $7.5, $7.6] (cost=4000)
        HashJoin: cross, lkey=[$3.3], rkey=[$0.0] (cost=51595.125)
          Scan: [$3.0, $3.3] (cost=2000)
          HashJoin: cross, lkey=[$0.2], rkey=[$1.0] (cost=22660.672)
            Scan: [$0.0, $0.1, $0.2] (cost=3000)
            Filter: ($1.1 = 'AFRICA') (cost=2700)
              Scan: [$1.0, $1.1] (cost=2000)

Q6:

Projection: [sum(($7.6 * $7.5))] (cost=7409.828)
  Aggregate: [sum(($7.6 * $7.5))], groupby=[] (cost=7408.328)
    Filter: ((24 > $7.4) and (($7.6 >= 0.07) and ((1995-01-01 > $7.10) and ((0.09 >= $7.6) and ($7.10 >= 1994-01-01))))) (cost=7210.72)
      Scan: [$7.4, $7.5, $7.6, $7.10] (cost=4000)

Q10:

TopN: limit=20, offset=null, orderby=[sum(($7.5 * (1 - $7.6))) desc] (cost=99839.414)
  Projection: [$5.0, $5.1, sum(($7.5 * (1 - $7.6))), $5.5, $0.1, $5.2, $5.4, $5.7] (cost=99830.414)
    Aggregate: [sum(($7.5 * (1 - $7.6)))], groupby=[$5.0, $5.1, $5.5, $5.4, $0.1, $5.2, $5.7] (cost=99821.016)
      HashJoin: cross, lkey=[$5.3], rkey=[$0.0] (cost=98313.016)
        HashJoin: cross, lkey=[$5.0], rkey=[$6.1] (cost=60378.563)
          Scan: [$5.0, $5.1, $5.2, $5.3, $5.4, $5.5, $5.7] (cost=7000)
          HashJoin: cross, lkey=[$6.0], rkey=[$7.0] (cost=23032.313)
            Filter: (($6.4 >= 1993-10-01) and (1994-01-01 > $6.4)) (cost=5620)
              Scan: [$6.0, $6.1, $6.4] (cost=3000)
            Filter: ($7.8 = 'R') (cost=5100)
              Scan: [$7.0, $7.5, $7.6, $7.8] (cost=4000)
        Scan: [$0.0, $0.1] (cost=2000)
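
Two things can be read off the plans above: the cost printed on each operator includes the cost of its inputs, and every Scan costs 1000 per output column. The sketch below shows a bottom-up cost function of that shape. It is only an illustration: `Plan`, `FILTER_OVERHEAD` and `JOIN_OVERHEAD` are hypothetical, and the two overhead constants are made-up placeholders. The actual Filter and HashJoin formulas are not stated in this description, so the sketch will not reproduce the numbers above for those operators.

```rust
/// Toy physical plan (hypothetical; not the planner's real node types).
enum Plan {
    Scan { columns: usize },
    Filter { input: Box<Plan> },
    HashJoin { left: Box<Plan>, right: Box<Plan> },
}

/// Matches the Scan lines in the plans above: 1000 per output column.
const SCAN_COST_PER_COLUMN: f64 = 1000.0;
/// Placeholder values invented for this sketch, NOT taken from the PR.
const FILTER_OVERHEAD: f64 = 500.0;
const JOIN_OVERHEAD: f64 = 10_000.0;

/// Cumulative cost: an operator's own cost plus the cost of all its inputs.
fn cost(plan: &Plan) -> f64 {
    match plan {
        Plan::Scan { columns } => SCAN_COST_PER_COLUMN * *columns as f64,
        Plan::Filter { input } => cost(input) + FILTER_OVERHEAD,
        Plan::HashJoin { left, right } => cost(left) + cost(right) + JOIN_OVERHEAD,
    }
}

fn main() {
    // Same shape as the Q1 scan: 7 columns.
    let scan = Plan::Scan { columns: 7 };
    println!("scan cost = {}", cost(&scan));

    // A join over a filtered 2-column scan and a 4-column scan.
    let join = Plan::HashJoin {
        left: Box::new(Plan::Filter { input: Box::new(Plan::Scan { columns: 2 }) }),
        right: Box::new(Plan::Scan { columns: 4 }),
    };
    println!("join cost = {}", cost(&join));
}
```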

Signed-off-by: Runji Wang <wangrunji0408@163.com>
@wangrunji0408 changed the title from "feat(planner_v2): add cost function" to "feat(planner_v2): cost function and multi-stage optimization" on Nov 3, 2022
@wangrunji0408 marked this pull request as ready for review on November 3, 2022 07:03
@wangrunji0408 merged commit c1834f5 into main on Nov 3, 2022
@wangrunji0408 deleted the wrj/cost branch on November 3, 2022 16:17