feat: introduce next-gen planner & optimizer #702

Merged
merged 26 commits into from
Oct 6, 2022

Conversation

@wangrunji0408 wangrunji0408 commented Oct 5, 2022

This PR introduces a brand-new optimizer based on egg, a powerful crate for building cascades-style optimizers in Rust.

With egg, we can define a complex rewrite rule using pattern matching in a few lines:

// predefine join node: (join type condition left right)
rewrite!("join-reorder";
    "(join ?type ?cond2 (join ?type ?cond1 ?left ?mid) ?right)" =>
    "(join ?type ?cond1 ?left (join ?type ?cond2 ?mid ?right))"
    if columns_is_disjoint("?cond2", "?left")
    // ^ a custom function that tests whether the column sets of two nodes are disjoint
),
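
To make the effect of this rule concrete, here is a minimal hand-written sketch of the same rewrite in plain Rust, without egg. The `Plan` type, `scan`, `columns`, `columns_are_disjoint`, and `reorder_join` are hypothetical names invented for illustration, not code from this PR, and the sketch ignores the join type for brevity.

```rust
use std::collections::HashSet;

/// Toy plan tree (hypothetical); only what the join-reorder rule needs.
#[derive(Debug, Clone, PartialEq)]
enum Plan {
    /// A base table exposing the named columns.
    Scan(Vec<&'static str>),
    /// A join, with the columns referenced by its condition.
    Join {
        cond: Vec<&'static str>,
        left: Box<Plan>,
        right: Box<Plan>,
    },
}

/// Convenience constructor for a boxed scan.
fn scan(cols: &[&'static str]) -> Box<Plan> {
    Box::new(Plan::Scan(cols.to_vec()))
}

/// All columns produced by a plan node.
fn columns(plan: &Plan) -> HashSet<&'static str> {
    match plan {
        Plan::Scan(cols) => cols.iter().copied().collect(),
        Plan::Join { left, right, .. } => {
            let mut set = columns(left);
            set.extend(columns(right));
            set
        }
    }
}

/// Assumed stand-in for the rule's side condition:
/// true when the condition references no column produced by `plan`.
fn columns_are_disjoint(cond: &[&'static str], plan: &Plan) -> bool {
    let produced = columns(plan);
    cond.iter().all(|c| !produced.contains(c))
}

/// (join cond2 (join cond1 left mid) right)
///   => (join cond1 left (join cond2 mid right))
/// only if cond2 does not touch any column of `left`.
fn reorder_join(plan: &Plan) -> Option<Plan> {
    if let Plan::Join { cond: cond2, left: inner, right } = plan {
        if let Plan::Join { cond: cond1, left, right: mid } = &**inner {
            if columns_are_disjoint(cond2, left) {
                return Some(Plan::Join {
                    cond: cond1.clone(),
                    left: left.clone(),
                    right: Box::new(Plan::Join {
                        cond: cond2.clone(),
                        left: mid.clone(),
                        right: right.clone(),
                    }),
                });
            }
        }
    }
    None
}
```

Compared with the three-line `rewrite!` version, the hand-written match has to spell out the destructuring and reconstruction explicitly, which is the boilerplate the pattern syntax removes.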

As a result, the new optimizer is super small but complete.

In about 600 lines of code (+ 200 lines for test), it implements the following optimizations:

  • expression simplification
  • constant folding
  • predicate pushdown
  • nested-loop-join -> hash-join
  • join reordering
  • column pruning
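
As a rough illustration of the first two items, the sketch below does expression simplification and constant folding as a single bottom-up pass over a toy expression tree. It is not the PR's implementation (which expresses these as egg rewrite rules); the `Expr` type and the `add`, `mul`, and `simplify` helpers are assumptions made up for this example.

```rust
/// Toy integer expression (hypothetical; not the PR's expression type).
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Const(i64),
    Column(&'static str),
    Add(Box<Expr>, Box<Expr>),
    Mul(Box<Expr>, Box<Expr>),
}

/// Helper constructors that hide the boxing.
fn add(l: Expr, r: Expr) -> Expr {
    Expr::Add(Box::new(l), Box::new(r))
}

fn mul(l: Expr, r: Expr) -> Expr {
    Expr::Mul(Box::new(l), Box::new(r))
}

/// Simplify children first, then apply local rules at this node.
fn simplify(expr: &Expr) -> Expr {
    match expr {
        Expr::Add(l, r) => match (simplify(l), simplify(r)) {
            // constant folding: c1 + c2 => c3
            (Expr::Const(a), Expr::Const(b)) => Expr::Const(a.wrapping_add(b)),
            // simplification: 0 + x => x, x + 0 => x
            (Expr::Const(0), e) | (e, Expr::Const(0)) => e,
            (l, r) => add(l, r),
        },
        Expr::Mul(l, r) => match (simplify(l), simplify(r)) {
            // constant folding: c1 * c2 => c3
            (Expr::Const(a), Expr::Const(b)) => Expr::Const(a.wrapping_mul(b)),
            // simplification: 1 * x => x, x * 1 => x
            (Expr::Const(1), e) | (e, Expr::Const(1)) => e,
            (l, r) => mul(l, r),
        },
        // constants and columns are already in simplest form
        other => other.clone(),
    }
}
```

For example, `simplify` turns `(2 * 3) + (a + 0)` into `6 + a`. A single pass like this applies rules in a fixed order, whereas an e-graph keeps all equivalent forms and picks the cheapest one afterwards.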

and resolves two annoying problems in the old planner:

  • extracting aggregations from select list
    SELECT sum(a) + b FROM t GROUP BY b;
    => 
    SELECT s + b FROM          # projection
      (SELECT b, sum(a) AS s   # aggregation
       FROM t GROUP BY b);
  • converting column references to physical indices (InputRef)
    and maintaining them during optimization
    =>
    SELECT $1 + $0 FROM 
      (SELECT b, sum(a) FROM t GROUP BY b);
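
The second conversion can be sketched as a lookup of each column name in the child's output schema. This is only the initial resolution step under assumed types; it does not show how the indices are kept valid while the plan is rewritten. `Expr`, `resolve`, and the string-based schema are hypothetical names for illustration.

```rust
use std::fmt;

/// Toy expression with both named columns and physical indices (hypothetical).
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String),
    InputRef(usize),
    Add(Box<Expr>, Box<Expr>),
}

impl fmt::Display for Expr {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Expr::Column(name) => write!(f, "{}", name),
            Expr::InputRef(index) => write!(f, "${}", index),
            Expr::Add(l, r) => write!(f, "{} + {}", l, r),
        }
    }
}

/// Replace every named column with its position in the child's output schema.
/// Returns an error message if a column is not produced by the child.
fn resolve(expr: &Expr, schema: &[&str]) -> Result<Expr, String> {
    match expr {
        Expr::Column(name) => schema
            .iter()
            .position(|col| *col == name.as_str())
            .map(Expr::InputRef)
            .ok_or_else(|| format!("column not found: {}", name)),
        Expr::InputRef(index) => Ok(Expr::InputRef(*index)),
        Expr::Add(l, r) => Ok(Expr::Add(
            Box::new(resolve(l, schema)?),
            Box::new(resolve(r, schema)?),
        )),
    }
}
```

With a child that outputs `[b, sum(a)]`, resolving `sum(a) + b` yields `$1 + $0`, matching the query above.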

For comparison, the current planner + optimizer has nearly 4100 lines of code (+ 600 lines for test) with a similar feature set.

In my opinion, the egg framework is definitely a game changer for building database optimizers.
If it proves to work well in RisingLight, we can consider migrating it to RisingWave in the future.

Signed-off-by: Runji Wang <wangrunji0408@163.com>
- fix filter-true/false rule
- remove incorrect pushdown for proj-filter

skyzh commented Oct 5, 2022

Looks great to me! Will take a look soon.

@skyzh skyzh left a comment

Generally LGTM, thanks for your great work! It looks like the planner has not been fully integrated with the current query processing code path. Is there a tracking issue / TODO list for this?

impl std::fmt::Display for ColumnRefId {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        // TODO: now ignore database and schema
        write!(f, "${}.{}", self.table_id, self.column_id)
    }
}

We should make ColumnRefId more generic (e.g. Vec<String>) so that it can represent a column in a subquery, e.g. subquery.table1.x. I'll probably do this when refactoring the binder.
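
A minimal sketch of what such a path-based reference could look like, assuming a newtype over `Vec<String>`. The name `ColumnPath` and its methods are hypothetical and not part of this PR.

```rust
use std::fmt;

/// Hypothetical path-based column reference,
/// e.g. ["subquery", "table1", "x"].
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
struct ColumnPath(Vec<String>);

impl ColumnPath {
    /// Build a path from any sequence of string-like parts.
    fn new<I, S>(parts: I) -> Self
    where
        I: IntoIterator<Item = S>,
        S: Into<String>,
    {
        ColumnPath(parts.into_iter().map(Into::into).collect())
    }

    /// The unqualified column name (last path component), if any.
    fn column_name(&self) -> Option<&str> {
        self.0.last().map(String::as_str)
    }

    /// True when this column lives under the given qualifier,
    /// such as a subquery alias followed by a table name.
    fn is_under(&self, qualifier: &[&str]) -> bool {
        qualifier.len() < self.0.len()
            && qualifier
                .iter()
                .zip(self.0.iter())
                .all(|(q, part)| *q == part.as_str())
    }
}

impl fmt::Display for ColumnPath {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Join components with dots: subquery.table1.x
        write!(f, "{}", self.0.join("."))
    }
}
```

Unlike the numeric `table_id`/`column_id` pair, a path like this has no fixed depth, which is what would let it name columns that come out of a subquery.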

Member Author

I gave up on taking subqueries into account. It's a little complicated. 🤯

@wangrunji0408
Member Author

> It looks like the planner has not been fully integrated with the current query processing code path. Is there a tracking issue / TODO list for this?

Not yet. I'll create a tracking issue soon.

A rough plan is to:

  1. fork the current binder, convert the sqlparser AST into the new egg AST.
  2. rewrite the type checking as an egg analysis. (optional)
  3. construct executors from the egg AST.
  4. remove the old binder, planner and optimizer

skyzh commented Oct 5, 2022

Sounds great. I'll start working on the new binder soon. Constructing executors from the egg AST also doesn't look hard.

skyzh commented Oct 6, 2022

ready to merge?

@wangrunji0408 wangrunji0408 enabled auto-merge (squash) October 6, 2022 04:51
@wangrunji0408 wangrunji0408 merged commit b4198e9 into main Oct 6, 2022
@wangrunji0408 wangrunji0408 deleted the wrj/egg branch October 6, 2022 04:58
fuyufjh commented Oct 6, 2022

This really blows my mind!

fuyufjh commented Oct 6, 2022

Well, to be honest, I still have some concerns about the capability of a DSL for a production database system.

https://twitter.com/fuyufjh/status/1509182041828790274?s=20&t=LBlWsQKaNTiU2QBXbDtGBg

wangrunji0408 commented Oct 8, 2022

I think these kinds of DSLs are well suited to describing the high-level structure of a rule. The remaining detailed and complex logic is usually left to the host language. In fact, this combination is a common pattern in the programming languages field (e.g. lex, yacc...), because functional, declarative languages are good at language processing, while low-level procedural languages are more efficient. In this respect, although Rust has pattern matching, we found it's not expressive enough for optimizer rules. So in my opinion, some kind of metaprogramming is still needed, whether it is macros, codegen, or a DSL.

PS. When I was writing the rules of this PR, I felt like I was writing LISP, not Rust. Then I began to realize why LISP is so powerful. Greenspun's tenth rule 😇

CAJan93 commented Oct 12, 2022

Thank you very much for your work! Looks amazing!
