Add benchmark for Presto IN expression #516
CC: @laithsakka
@mbasmanova has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
9b929f8 to ebd0bda
8274996 to 764ddd3
Summary: The benchmark shows that an IN expression with a large number of values is very slow. This is because IN is implemented as a function with a variable number of arguments. IN with 1K values is represented as an expression tree with one node for the IN, one for the column, and 1K nodes for the values. Each value is wrapped in a ConstantExpr, which gets evaluated for each batch of rows, making new shared pointers to constant vectors.

A follow-up PR will optimize the IN expression to take its values as a single array.

```
============================================================================
velox/functions/prestosql/benchmarks/InBenchmark.cpp  relative  time/iter  iters/s
============================================================================
fastIn                                                           4.38ms   228.06
in                                                     34.65%   12.65ms    79.02
fastIn1K                                                         4.02ms   248.83
in1K                                                    1.84%  218.08ms     4.59
============================================================================
```

Pull Request resolved: facebookincubator#516

Differential Revision: D31997776

Pulled By: mbasmanova

fbshipit-source-id: fb733bba97b0416680357acc0e41b6fcab1b642d
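For readers unfamiliar with the table layout, here is a small sketch of how the numbers relate to each other. It assumes the usual convention for this style of benchmark output: `relative` is the baseline time divided by the candidate time, as a percentage, and `iters/s` is the reciprocal of `time/iter`. The helper names `relativePercent` and `itersPerSecond` are made up for illustration and are not part of the benchmark.

```cpp
// Hypothetical helpers for reading the benchmark table; not part of Velox.

// Speed of a candidate relative to its baseline, as a percentage.
// Both arguments must use the same time unit (milliseconds here).
double relativePercent(double baselineMs, double candidateMs) {
  return baselineMs / candidateMs * 100.0;
}

// Iterations per second for a given time per iteration in milliseconds.
double itersPerSecond(double timeMs) {
  return 1000.0 / timeMs;
}
```

With the 1K rows, `relativePercent(4.02, 218.08)` comes out at roughly 1.8, i.e. `in1K` runs at under 2% of the speed of `fastIn1K`, or about 54 times slower. Small differences from the printed values are expected because the printed times are rounded.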
764ddd3 to 1ca13de
This pull request was exported from Phabricator. Differential Revision: D31997776
1ca13de to 67a5a04
67a5a04 to e06b819
Summary: The benchmark shows that an IN expression with a large number of values is incredibly slow. This is because IN is implemented as a function with a variable number of arguments. IN with 10K values is represented as an expression tree with one node for the IN, one for the column, and 10K nodes for the values. Each value is wrapped in a ConstantExpr, which gets evaluated for each batch of rows, making new shared pointers to constant vectors.

A follow-up PR will optimize the IN expression to take its values as a single array.

```
prestosql/benchmarks/InBenchmark.cpp          relative  time/iter  iters/s
============================================================================
fastIn                                                     3.49ms   286.17
in                                              55.08%     6.34ms   157.62
fastIn10K                                                  3.12ms   320.43
in10K                                            0.07%      4.72s  211.75m
============================================================================
```

Pull Request resolved: facebookincubator#516

Differential Revision: D31997776

Pulled By: mbasmanova

fbshipit-source-id: 55d20a05beda182be9d4e5de6d426a55884861ad
e06b819 to 32d08e1
32d08e1 to 1bea574
@mbasmanova merged this pull request in 1764e89.
Summary: The benchmark shows that an IN expression with a large number of values is very slow. This is because IN is implemented as a function with a variable number of arguments. IN with 1K values is represented as an expression tree with one node for the IN, one for the column, and 1K nodes for the values. Each value is wrapped in a ConstantExpr, which gets evaluated for each batch of rows, making new shared pointers to constant vectors.

A follow-up PR will optimize the IN expression to take its values as a single array.

```
============================================================================
velox/functions/prestosql/benchmarks/InBenchmark.cpp  relative  time/iter  iters/s
============================================================================
fastIn                                                           4.38ms   228.06
in                                                     34.65%   12.65ms    79.02
fastIn1K                                                         4.02ms   248.83
in1K                                                    1.84%  218.08ms     4.59
============================================================================
```

Pull Request resolved: facebookincubator#516

Reviewed By: funrollloops

Differential Revision: D31997776

Pulled By: mbasmanova

fbshipit-source-id: 0ed9a8b559016b84b1fee070c4536f28cd4776d8
Summary:
X-link: facebook/fbthrift#516
X-link: facebook/watchman#1050
X-link: facebook/proxygen#426
X-link: facebook/folly#1842
X-link: facebook/fboss#117

Sadly, even though Ubuntu 18.04 is still in LTS, GitHub is deprecating its runner image. Migrate the generated GitHub Actions to 20.04.

actions/runner-images#6002
https://github.blog/changelog/2022-08-09-github-actions-the-ubuntu-18-04-actions-runner-image-is-being-deprecated-and-will-be-removed-by-12-1-22/

Reviewed By: fanzeyi

Differential Revision: D38877286

fbshipit-source-id: 85f3324d6666eacb190a43985585b438de69d545
The benchmark shows that an IN expression with a large number of values is
incredibly slow. This is because IN is implemented as a function with a variable
number of arguments. IN with 10K values is represented as an expression tree
with one node for the IN, one for the column, and 10K nodes for the values. Each
value is wrapped in a ConstantExpr, which gets evaluated for each batch of rows,
making new shared pointers to constant vectors.
A follow-up PR will optimize the IN expression to take its values as a single array.
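
To make the cost difference concrete, here is a minimal standalone sketch of the two shapes described above. It is not Velox code: `ConstantVector`, `ConstantNode`, `VariadicIn`, and `ArrayIn` are hypothetical stand-ins, the linear scan over arguments is an assumption about how a variadic IN might compare values, and backing the single-array form with a hash set is likewise an assumption rather than what the follow-up PR necessarily does. NULL handling is ignored.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for the constant vector a constant expression yields.
struct ConstantVector {
  int64_t value;
};

// Hypothetical stand-in for a ConstantExpr node: every evaluation
// allocates and returns a new shared pointer.
struct ConstantNode {
  int64_t value;
  std::shared_ptr<ConstantVector> eval() const {
    return std::make_shared<ConstantVector>(ConstantVector{value});
  }
};

// Variadic shape: one node per IN-list value, all re-evaluated per batch.
struct VariadicIn {
  std::vector<ConstantNode> values;

  std::vector<bool> apply(const std::vector<int64_t>& batch) const {
    // Per-batch overhead: N evaluations and N shared-pointer allocations.
    std::vector<std::shared_ptr<ConstantVector>> args;
    args.reserve(values.size());
    for (const auto& node : values) {
      args.push_back(node.eval());
    }
    std::vector<bool> result(batch.size(), false);
    for (std::size_t row = 0; row < batch.size(); ++row) {
      for (const auto& arg : args) {
        if (arg->value == batch[row]) {
          result[row] = true;
          break;
        }
      }
    }
    return result;
  }
};

// Single-array shape: the value list is turned into a set once, up front.
struct ArrayIn {
  std::unordered_set<int64_t> values;

  explicit ArrayIn(const std::vector<int64_t>& list)
      : values(list.begin(), list.end()) {}

  std::vector<bool> apply(const std::vector<int64_t>& batch) const {
    std::vector<bool> result(batch.size(), false);
    for (std::size_t row = 0; row < batch.size(); ++row) {
      result[row] = values.count(batch[row]) > 0;
    }
    return result;
  }
};
```

Both types return the same answers for the same inputs. The difference is that `VariadicIn::apply` pays a cost proportional to the number of IN-list values on every batch before it looks at a single row, while `ArrayIn` pays that cost once at construction.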