The header of group by is incorrect #538

xxchan · 2022-03-01T13:00:15Z

The header is incorrect

> SELECT c0+1, c0+1+count(*) FROM t1 GROUP BY c0+1; 
+----+----------+
| c0 | ?column? |
+----+----------+
| 4  | 5        |
| 2  | 3        |
+----+----------+
in 0.004s

Originally posted by @xxchan in #535 (comment)

skyzh · 2022-04-01T05:59:16Z

Now it will panic with:

SELECT c0+1, c0+1+count(*) FROM t1 GROUP BY c0+1;
thread 'tokio-runtime-worker' panicked at 'Column #ColumnRefId { database_id: 0, schema_id: 0, table_id: 11, column_id: 0 } should not be evaluated in `eval_array`', src/executor/evaluator.rs:55:18
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
thread 'main' panicked at 'failed to join query thread: JoinError::Panic(...)', src/main.rs:126:23

🤪

skyzh · 2022-04-01T06:16:27Z

Despite the above issue, let's fix an easier one.

Case 1

create table t1 (c1 int);
insert into t1 values (1), (2), (3);
select c1 + 1 from t1 group by c1 + 1;

which will produce:

+----+
| c1 |
+----+
| 3  |
| 4  |
| 2  |
+----+

c1 should be c1 + 1 instead.

Case 2

create table t2 (c1 int, c2 int);
insert into t2 values (0, 1), (1, 2);
select c1 + c2 from t2 group by c1 + c2;

which yields:

+----+
| c1 |
+----+
| 1  |
| 3  |
+----+

(should be c1 + c2 instead)

Basically, this is a problem with schema. The derived schema is incorrect for logical aggregate.

risinglight/src/optimizer/plan_nodes/logical_aggregate.rs

Line 78 in a23b9b1

.map(|expr| child_schema[expr.index].clone())

L78 directly used child schema, which is c1. But here we should have c1 + 1 / c1 + c2 as column desc instead. Also, in case of c1 + c2, the schema is totally incorrect. The derived schema for select c1 + c2 is:

[ColumnDesc { datatype: Int(None) (null), name: "c1", is_primary: false }, ColumnDesc { datatype: Int(None) (null), name: "c2", is_primary: false }]

while this executor will only output one column.

So we will need to rewrite the whole scheme derivation process. It should be as simple as:

        self.group_keys
            .iter().map(|expr| ColumnDesc { /* how to fill the inner struct? */ }).chain(/* agg call schema */)

Where we need to derive a schema for arbitrary expression. We can simply use format!("{:?}", expr) as column name, expr.return_type as data type, and set is_primary = false. We now have these three properties, and we can easily compose a ColumnDesc.

skyzh · 2022-04-01T06:16:48Z

cc @shmiwy would you please have a try?

shmiwy · 2022-04-13T03:24:52Z

It seems that we can't compose a ColumnDesc simply use format!("{:?}", expr) as column name, expr.return_type as data type, and set is_primary = false, because we can use multi filed when we group by.

create table t(c1 int, c2 int, c3 int, c4 int);
insert into t values (0,1,2,3), (1,2,3,4), (2,3,4,5);
select c1 + c2, c3 + c4 from t group by c1 + c2, c3 + c4;

the logical_plan created by this line

risinglight/src/db.rs

Line 190 in bcbe568

let logical_plan = input_ref_resolver.rewrite(logical_plan);

shows like this

project_expressions: [
        InputRef #0,
        InputRef #1,
    ],
    child: LogicalAggregate {
        agg_calls: [],
        group_keys: [
            Plus(InputRef #0, InputRef #1),
            Plus(InputRef #2, InputRef #3),
        ],
       ........................

when we get column_names, column_names[0] should be c1 + c2, column_names[1] == c3 + c4。
somaybe we should change this line

risinglight/src/optimizer/plan_nodes/logical_aggregate.rs

Line 78 in bcbe568

.map(|expr| child_schema[expr.index].clone())

to corresponding form(maybe we can get form from self.group_keys)

skyzh · 2022-04-13T03:35:17Z

to corresponding form(maybe we can get form from self.group_keys)

Yes, exactly! You can find out how to generate c1 + c2 from the expression input ref #0 + input ref #1.

skyzh · 2022-04-13T03:37:14Z

For the first step, we can keep it simple -- if the group_keys is not a simple InputRef, we simply use Plus(InputRef #0, InputRef #1) as the column name. In the future, we can add a function like format_with_column_desc(&self, column_desc: Vec<ColumnDesc>) on BoundExpr, so that we can resolve the full name.

* executor: use DataChunkbuilder to refactor ValuesExecutor (#623) * use DataChunkbuilder to refactor ValuesExecutor Signed-off-by: Jayice <1185430411@qq.com> * code format Signed-off-by: Jayice <1185430411@qq.com> * code style Signed-off-by: Jayice <1185430411@qq.com> Signed-off-by: Shmiwy <wyf000219@126.com> * fix: make header correct when using group by(#538) Signed-off-by: Shmiwy <wyf000219@126.com> * fix: map group_keys to ColumnDesc Signed-off-by: Shmiwy wyf000219@gamil.com Signed-off-by: Shmiwy <wyf000219@126.com> * fix: use expr's return type instead of child schema's Signed-off-by: Shmiwy wyf000219@gamil.com Signed-off-by: Shmiwy <wyf000219@126.com> * build: bump toolchain and deps, remove unused dep (#629) Signed-off-by: TennyZhuang <zty0826@gmail.com> Signed-off-by: Shmiwy <wyf000219@126.com> Co-authored-by: JAYICE <49588871+JayiceZ@users.noreply.github.com> Co-authored-by: TennyZhuang <zty0826@gmail.com>

shmiwy mentioned this issue Apr 13, 2022

fix: make header correct when using group by #624

Merged

skyzh closed this as completed in #624 Apr 17, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

The header of group by is incorrect #538

The header of group by is incorrect #538

xxchan commented Mar 1, 2022

skyzh commented Apr 1, 2022

skyzh commented Apr 1, 2022

skyzh commented Apr 1, 2022

shmiwy commented Apr 13, 2022 •

edited

Loading

skyzh commented Apr 13, 2022

skyzh commented Apr 13, 2022 •

edited

Loading

The header of group by is incorrect #538

The header of group by is incorrect #538

Comments

xxchan commented Mar 1, 2022

skyzh commented Apr 1, 2022

skyzh commented Apr 1, 2022

Case 1

Case 2

skyzh commented Apr 1, 2022

shmiwy commented Apr 13, 2022 • edited Loading

skyzh commented Apr 13, 2022

skyzh commented Apr 13, 2022 • edited Loading

shmiwy commented Apr 13, 2022 •

edited

Loading

skyzh commented Apr 13, 2022 •

edited

Loading