From 2967f798bd670ed367baee8930101ebeb397f3c6 Mon Sep 17 00:00:00 2001 From: Mats Rydberg Date: Mon, 6 Mar 2017 15:21:30 +0100 Subject: [PATCH 1/6] Add the LIMIT CIP Fixes #194 --- .../CIP2017-03-01-SKIP-and-LIMIT.adoc | 101 ++++++++++++++++++ 1 file changed, 101 insertions(+) create mode 100644 cip/1.accepted/CIP2017-03-01-SKIP-and-LIMIT.adoc diff --git a/cip/1.accepted/CIP2017-03-01-SKIP-and-LIMIT.adoc b/cip/1.accepted/CIP2017-03-01-SKIP-and-LIMIT.adoc new file mode 100644 index 0000000000..647e802ab6 --- /dev/null +++ b/cip/1.accepted/CIP2017-03-01-SKIP-and-LIMIT.adoc @@ -0,0 +1,101 @@ += CIP2017-03-01 - LIMIT subclause +:numbered: +:toc: +:toc-placement: macro +:source-highlighter: codemirror + +*Author:* Mats Rydberg + +toc::[] + +== Background + +This CIP is a proposal in answer to link:https://github.com/opencypher/openCypher/issues/194[CIR-2017-194]. + +== Proposal + +The `LIMIT` subclause is used to constrain the cardinality of its parent clause by providing an upper limit. +This can be useful for data exploration, or verifying partial results of expensive queries. + +=== Syntax + +.Syntax overview: +[source, ebnf] +---- +clause-with-limit = read-only-clause, [ limit ] ; +read-only-clause = match + | with + | unwind + | return + ; +limit = "LIMIT", expr ; +---- + +=== Semantics + +The `LIMIT` subclause prevents records passing through its parent clause after the specified amount of rows, as determined by the limit expression, has been processed. +For these semantics to be well defined, the limit expression must be constant over the query lifetime, such as parameters or literals. + +==== Updating queries + +The use of `LIMIT` opens the possibility for certain performance optimisations. +Clauses that come early in the query do not have to be evaluated over the full dataset, just enough to reach the subsequent limit. +These optimisations are however not always applicable in combination with updating clauses. 
+Semantics between clauses is defined such that _all_ of a previous clause is processed (logically) before _any_ of a subsequent clause is processed. +This means that _all_ side effects must happen before a `LIMIT` is allowed to halt the processing of records in preceding clauses. + +Consider the below query: + +.Create a producer for each item, return first 100 product ids. +[source, cypher] +---- +MATCH (i:Item) +CREATE (i)-[:PRODUCED_BY]->(:Producer) +RETURN i.productId +LIMIT 100 +---- + +This query must execute its `CREATE` clause once for every `:Item` node, even though only 100 records are to be returned. + +If the user intention is to only do a partial update of the graph, the query must be rewritten: + +.Create a producer for the 100 first items, return their product ids. +[source, cypher] +---- +MATCH (i:Item) +LIMIT 100 +CREATE (i)-[:PRODUCED_BY]->(:Producer) +RETURN i.productId +---- + +=== Examples + +.Limiting a pattern match: +[source, cypher] +---- +MATCH (a:Person) +WHERE a.name STARTS WITH 'And' +LIMIT $limit +RETURN a.age, a.name +---- + +.Limiting between query parts: +[source, cypher] +---- +MATCH (a:Person) +WHERE a.age < 18 +SET a.child = true +WITH a +LIMIT 100 +MATCH (a)<-[:PARENT_OF]-(p) +RETURN p.age, p.name +---- + +.Limiting the query result: +[source, cypher] +---- +MATCH (a:Person) +WHERE a.age > 18 +RETURN p.age, p.name +LIMIT 100 +---- From 3a3f1981e3216d92e3ac091c267e09f30304c724 Mon Sep 17 00:00:00 2001 From: Mats Rydberg Date: Wed, 8 Mar 2017 09:57:58 +0100 Subject: [PATCH 2/6] Add example for how only constant expressions are allowed - Rename file to correct name for the CIP - Improve language --- ...doc => CIP2017-03-01-LIMIT-subclause.adoc} | 35 ++++++++++++++----- 1 file changed, 27 insertions(+), 8 deletions(-) rename cip/1.accepted/{CIP2017-03-01-SKIP-and-LIMIT.adoc => CIP2017-03-01-LIMIT-subclause.adoc} (62%) diff --git a/cip/1.accepted/CIP2017-03-01-SKIP-and-LIMIT.adoc b/cip/1.accepted/CIP2017-03-01-LIMIT-subclause.adoc 
similarity index 62% rename from cip/1.accepted/CIP2017-03-01-SKIP-and-LIMIT.adoc rename to cip/1.accepted/CIP2017-03-01-LIMIT-subclause.adoc index 647e802ab6..286269276e 100644 --- a/cip/1.accepted/CIP2017-03-01-SKIP-and-LIMIT.adoc +++ b/cip/1.accepted/CIP2017-03-01-LIMIT-subclause.adoc @@ -10,7 +10,7 @@ toc::[] == Background -This CIP is a proposal in answer to link:https://github.com/opencypher/openCypher/issues/194[CIR-2017-194]. +This CIP is a proposal in response to link:https://github.com/opencypher/openCypher/issues/194[CIR-2017-194]. == Proposal @@ -33,20 +33,39 @@ limit = "LIMIT", expr ; === Semantics -The `LIMIT` subclause prevents records passing through its parent clause after the specified amount of rows, as determined by the limit expression, has been processed. +The `LIMIT` subclause prevents records passing through its parent clause after the specified amount of rows, as determined by the expression used as the argument to `LIMIT`, has been processed. For these semantics to be well defined, the limit expression must be constant over the query lifetime, such as parameters or literals. +In other words, arbitrary expressions or variables are not valid, as the values of these may change across records, and it would not be clear which value should be used by the `LIMIT`. + +.Parameters and literals are global constants, and may be used as arguments to `LIMIT`: +[source, cypher] +---- +MATCH (a:Label) +LIMIT $matchLimit +RETURN a.prop +LIMIT 100 +---- + +.Variables and expressions involving variables are (in general) not constant, and may not be used as arguments to `LIMIT`: +[source, cypher] +---- +MATCH (a:Label) +LIMIT a.prop // not guaranteed to be constant -- error! +RETURN a.prop +LIMIT size((a)-->()) // not guaranteed to be constant -- error! +---- ==== Updating queries -The use of `LIMIT` opens the possibility for certain performance optimisations. 
-Clauses that come early in the query do not have to be evaluated over the full dataset, just enough to reach the subsequent limit. +Clauses appearing earlier in the query only need to be evaluated until the limit is reached, as opposed to evaluating the entire dataset. These optimisations are however not always applicable in combination with updating clauses. Semantics between clauses is defined such that _all_ of a previous clause is processed (logically) before _any_ of a subsequent clause is processed. -This means that _all_ side effects must happen before a `LIMIT` is allowed to halt the processing of records in preceding clauses. +This means that _all_ side effects must happen before a `LIMIT` is allowed to terminate the processing of records in preceding clauses. -Consider the below query: +Consider the following query: -.Create a producer for each item, return first 100 product ids. +.Create a producer for each item, returning the first 100 product ids. [source, cypher] ---- MATCH (i:Item) @@ -59,7 +78,7 @@ This query must execute its `CREATE` clause once for every `:Item` node, even th If the user intention is to only do a partial update of the graph, the query must be rewritten: -.Create a producer for the 100 first items, return their product ids. +.Create a producer for the first 100 items, and return their product ids.
[source, cypher] ---- MATCH (i:Item) From aa6a008c858bbb8b1902a4bb2fc52505b8ecd1ee Mon Sep 17 00:00:00 2001 From: Mats Rydberg Date: Fri, 31 Mar 2017 09:57:26 +0200 Subject: [PATCH 3/6] Change title to `Subclauses` --- ...03-01-LIMIT-subclause.adoc => CIP2017-03-01-Subclauses.adoc} | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) rename cip/1.accepted/{CIP2017-03-01-LIMIT-subclause.adoc => CIP2017-03-01-Subclauses.adoc} (99%) diff --git a/cip/1.accepted/CIP2017-03-01-LIMIT-subclause.adoc b/cip/1.accepted/CIP2017-03-01-Subclauses.adoc similarity index 99% rename from cip/1.accepted/CIP2017-03-01-LIMIT-subclause.adoc rename to cip/1.accepted/CIP2017-03-01-Subclauses.adoc index 286269276e..45aa36d4a9 100644 --- a/cip/1.accepted/CIP2017-03-01-LIMIT-subclause.adoc +++ b/cip/1.accepted/CIP2017-03-01-Subclauses.adoc @@ -1,4 +1,4 @@ -= CIP2017-03-01 - LIMIT subclause += CIP2017-03-01 - Subclauses :numbered: :toc: :toc-placement: macro From 2343256ebb1fedc5e2ab94a95b34e52383daa86b Mon Sep 17 00:00:00 2001 From: Mats Rydberg Date: Fri, 31 Mar 2017 11:27:54 +0200 Subject: [PATCH 4/6] Add SKIP, WHERE and ORDER BY --- cip/1.accepted/CIP2017-03-01-Subclauses.adoc | 99 +++++++++++++++++--- 1 file changed, 87 insertions(+), 12 deletions(-) diff --git a/cip/1.accepted/CIP2017-03-01-Subclauses.adoc b/cip/1.accepted/CIP2017-03-01-Subclauses.adoc index 45aa36d4a9..140d25655d 100644 --- a/cip/1.accepted/CIP2017-03-01-Subclauses.adoc +++ b/cip/1.accepted/CIP2017-03-01-Subclauses.adoc @@ -14,28 +14,61 @@ This CIP is a proposal in response to link:https://github.com/opencypher/openCyp == Proposal +Cypher features several _subclause_ constructs. +These are elements of the language able to amend the semantics of a (parent) clause with specific behaviour, but that do not make sense in and of themselves. 
+The complete list of subclauses is: + +- `LIMIT` +- `ORDER BY` +- `SKIP` +- `WHERE` + The `LIMIT` subclause is used to constrain the cardinality of its parent clause by providing an upper limit. -This can be useful for data exploration, or verifying partial results of expensive queries. + +The `ORDER BY` subclause is used to order records, in an order determined by the expression provided to the subclause. + +The `SKIP` subclause is used to reduce the cardinality of its parent clause by a constant amount. + +The `WHERE` subclause is used to reduce the cardinality of its parent clause by passing records through a predicate. +Only records for which the predicate evaluates to `true` (under Cypher's three-value logic) are kept for further processing. === Syntax .Syntax overview: [source, ebnf] ---- -clause-with-limit = read-only-clause, [ limit ] ; -read-only-clause = match - | with - | unwind - | return - ; -limit = "LIMIT", expr ; +read-only-clause = read-only-parent, subclause ; +read-only-parent = match + | with + | unwind + | return + ; +subclause = limit + | order-by + | skip + | where + ; +limit = "LIMIT", expr ; +order-by = "ORDER BY", expr ; +skip = "SKIP", expr ; +where = "WHERE", expr ; ---- === Semantics -The `LIMIT` subclause prevents records passing through its parent clause after the specified amount of rows, as determined by the expression used as the argument to `LIMIT`, has been processed. -For these semantics to be well defined, the limit expression must be constant over the query lifetime, such as parameters or literals. -In other words, arbitrary expressions or variables are not valid, as the values of these may change across records, and it would not be clear which value should be used by the `LIMIT`. +Detailed semantics of `ORDER BY` and ordering is out of scope of this CIP. 
+Instead, refer to the https://github.com/opencypher/openCypher/blob/master/cip/1.accepted/CIP2016-06-14-Define-comparability-and-equality-as-well-as-orderability-and-equivalence.adoc[comparability CIP]. + +All subclauses require an expression as their argument, called the _subclause expression_, which determines the exact behaviour of the subclause. +For `SKIP` and `LIMIT`, the subclause expression is restricted to expressions that are constant over the lifetime of the query, such as parameters, literals, and expressions derived from them. +In other words, arbitrary expressions or variables are not valid, as the values of these may change across records, and it would not be clear which value should be used by the subclause. + +This restriction does not apply to `WHERE` or `ORDER BY`. + +==== LIMIT + +The `LIMIT` subclause prevents records passing through its parent clause after the specified amount of records, as determined by the subclause expression, has been processed. +Exactly which records are kept is generally undefined, but can be controlled through the use of `ORDER BY`. .Parameters and literals are global constants, and may be used as arguments to `LIMIT`: [source, cypher] @@ -55,7 +88,7 @@ RETURN a.prop LIMIT size((a)-->()) // not guaranteed to be constant -- error! ---- -==== Updating queries +===== Updating queries The use of `LIMIT` opens up the possibility for certain performance optimisations. Clauses appearing earlier in the query only need to be evaluated until the limit is reached, as opposed to evaluating the entire dataset. @@ -87,6 +120,37 @@ CREATE (i)-[:PRODUCED_BY]->(:Producer) RETURN i.productId ---- +==== WHERE + +The `WHERE` subclause is used for filtering using an arbitrary (boolean) expression, or _predicate_.
+Since Cypher expressions work under three-valued logic, this means that records for which the predicate evaluates to `null` as well as `false` are discarded. +Further details on the semantics of expression evaluation is out of scope of this CIP. + +==== SKIP + +The `SKIP` subclause acts as a drop-all filter with an upper bound. +The first `n` records, as determined by the subclause expression, are discarded from further processing. +Which exact records that are discarded is generally undefined, but may be controlled through the use of `ORDER BY`. + +.Parameters and literals are global constants, and may be used as arguments to `SKIP`: +[source, cypher] +---- +MATCH (a:Label) +SKIP $skipAmount +RETURN a.prop +SKIP 100 +---- + +.Variables and expressions involving variables are (in general) not constant, and may not be used as arguments to `SKIP`: +[source, cypher] +---- +MATCH (a:Label) +SKIP a.prop // not guaranteed to be constant -- error! +RETURN a.prop +SKIP size((a)-->()) // not guaranteed to be constant -- error! +---- + === Examples .Limiting a pattern match: @@ -118,3 +182,14 @@ WHERE a.age > 18 RETURN p.age, p.name LIMIT 100 ---- + +.Combining `SKIP`, `LIMIT` and `ORDER BY`: +[source, cypher] +---- +MATCH (a:Person) +WHERE a.age > 18 +RETURN p.age, p.name +ORDER BY p.age +SKIP 10 +LIMIT 100 +---- From 5339768762212a9e49bd937184947e33dea1f011 Mon Sep 17 00:00:00 2001 From: Mats Rydberg Date: Fri, 31 Mar 2017 13:19:17 +0200 Subject: [PATCH 5/6] Reorder subclauses to be consistent with grammar order --- cip/1.accepted/CIP2017-03-01-Subclauses.adoc | 84 ++++++++++---------- 1 file changed, 40 insertions(+), 44 deletions(-) diff --git a/cip/1.accepted/CIP2017-03-01-Subclauses.adoc b/cip/1.accepted/CIP2017-03-01-Subclauses.adoc index 140d25655d..4adfdbe5e7 100644 --- a/cip/1.accepted/CIP2017-03-01-Subclauses.adoc +++ b/cip/1.accepted/CIP2017-03-01-Subclauses.adoc @@ -18,40 +18,36 @@ Cypher features several _subclause_ constructs. 
These are elements of the language able to amend the semantics of a (parent) clause with specific behaviour, but that do not make sense in and of themselves. The complete list of subclauses is: -- `LIMIT` +- `WHERE` - `ORDER BY` - `SKIP` -- `WHERE` +- `LIMIT` -The `LIMIT` subclause is used to constrain the cardinality of its parent clause by providing an upper limit. +The `WHERE` subclause is used to reduce the cardinality of its parent clause by passing records through a predicate. +Only records for which the predicate evaluates to `true` (under Cypher's three-value logic) are kept for further processing. The `ORDER BY` subclause is used to order records, in an order determined by the expression provided to the subclause. The `SKIP` subclause is used to reduce the cardinality of its parent clause by a constant amount. -The `WHERE` subclause is used to reduce the cardinality of its parent clause by passing records through a predicate. -Only records for which the predicate evaluates to `true` (under Cypher's three-value logic) are kept for further processing. +The `LIMIT` subclause is used to constrain the cardinality of its parent clause by providing an upper limit. === Syntax .Syntax overview: [source, ebnf] ---- -read-only-clause = read-only-parent, subclause ; +read-only-clause = read-only-parent, subclauses ; read-only-parent = match | with | unwind | return ; -subclause = limit - | order-by - | skip - | where - ; -limit = "LIMIT", expr ; +subclauses = [ where ], [ order-by ], [ skip ], [ limit ] ; +where = "WHERE", expr ; order-by = "ORDER BY", expr ; skip = "SKIP", expr ; -where = "WHERE", expr ; +limit = "LIMIT", expr ; ---- === Semantics @@ -65,6 +61,37 @@ In other words, arbitrary expressions or variables are not valid, as the values This restriction does not apply to `WHERE` or `ORDER BY`. +==== WHERE + +The `WHERE` subclause is used for filtering using an arbitrary (boolean) expression, or _predicate_. 
+The predicate is evaluated for each incoming record, and the record is kept if and only if the predicate evaluated to `true`. +Since Cypher expressions work under three-valued logic, this means that records for which the predicate evaluates to `null` as well as `false` are discarded. +Further details on the semantics of expression evaluation is out of scope of this CIP. + +==== SKIP + +The `SKIP` subclause acts as a drop-all filter with an upper bound. +The first `n` records, as determined by the subclause expression, are discarded from further processing. +Which exact records that are discarded is generally undefined, but may be controlled through the use of `ORDER BY`. + +.Parameters and literals are global constants, and may be used as arguments to `SKIP`: +[source, cypher] +---- +MATCH (a:Label) +SKIP $skipAmount +RETURN a.prop +SKIP 100 +---- + +.Variables and expressions involving variables are (in general) not constant, and may not be used as arguments to `SKIP`: +[source, cypher] +---- +MATCH (a:Label) +SKIP a.prop // not guaranteed to be constant -- error! +RETURN a.prop +SKIP size((a)-->()) // not guaranteed to be constant -- error! +---- + ==== LIMIT The `LIMIT` subclause prevents records passing through its parent clause after the specified amount of records, as determined by the subclause expression, has been processed. @@ -120,37 +147,6 @@ CREATE (i)-[:PRODUCED_BY]->(:Producer) RETURN i.productId ---- -==== WHERE - -The `WHERE` subclause is used for filtering using an arbitrary (boolean) expression, or _predicate_. -The predicate is evaluated for each incoming record, and the record is kept if and only if the predicate evaluated to `true`. -Since Cypher expressions work under three-valued logic, this means that records for which the predicate evaluates to `null` as well as `false` are discarded. -Further details on the semantics of expression evaluation is out of scope of this CIP. 
- -==== SKIP - -The `SKIP` subclause acts as a drop-all filter with an upper bound. -The first `n` records, as determined by the subclause expression, are discarded from further processing. -Which exact records that are discarded is generally undefined, but may be controlled through the use of `ORDER BY`. - -.Parameters and literals are global constants, and may be used as arguments to `SKIP`: -[source, cypher] ----- -MATCH (a:Label) -SKIP $skipAmount -RETURN a.prop -SKIP 100 ----- - -.Variables and expressions involving variables are (in general) not constant, and may not be used as arguments to `SKIP`: -[source, cypher] ----- -MATCH (a:Label) -SKIP a.prop // not guaranteed to be constant -- error! -RETURN a.prop -SKIP size((a)-->()) // not guaranteed to be constant -- error! ----- - === Examples .Limiting a pattern match: From 3369038fb300419ee9e88965d636bd476d0f06a2 Mon Sep 17 00:00:00 2001 From: Petra Selmer Date: Wed, 17 Jan 2018 16:25:52 +0000 Subject: [PATCH 6/6] Reformatted title --- cip/1.accepted/CIP2017-03-01-Subclauses.adoc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/cip/1.accepted/CIP2017-03-01-Subclauses.adoc b/cip/1.accepted/CIP2017-03-01-Subclauses.adoc index 4adfdbe5e7..b8d854cb62 100644 --- a/cip/1.accepted/CIP2017-03-01-Subclauses.adoc +++ b/cip/1.accepted/CIP2017-03-01-Subclauses.adoc @@ -1,4 +1,4 @@ -= CIP2017-03-01 - Subclauses += CIP2017-03-01 Subclauses :numbered: :toc: :toc-placement: macro
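The CIP's subclause semantics can be modelled as a small record pipeline: filter with `WHERE`, sort with `ORDER BY`, then drop and cap with `SKIP` and `LIMIT`. The Python sketch below is a non-normative illustration under stated assumptions. It is not part of the CIP or of any Cypher implementation. The function names `apply_subclauses` and `create_producers_then_limit` are hypothetical, and records are plain dicts. Python `None` stands in for Cypher `null`. `ORDER BY` uses Python's native ordering rather than the rules of the comparability CIP, so `null` sort keys are not handled.

```python
from itertools import islice
from typing import Any, Callable, Dict, Iterable, List, Optional, Tuple

# Hypothetical model: a record maps variable names to values; None stands in for null.
Record = Dict[str, Any]


def apply_subclauses(
    records: Iterable[Record],
    where: Optional[Callable[[Record], Optional[bool]]] = None,
    order_by: Optional[Callable[[Record], Any]] = None,
    skip: int = 0,
    limit: Optional[int] = None,
) -> List[Record]:
    """Apply WHERE, ORDER BY, SKIP and LIMIT in the grammar order of the CIP."""
    stream: Iterable[Record] = records
    if where is not None:
        # Keep a record only if the predicate is exactly True; False and
        # None (null) both discard it, as under three-valued logic.
        stream = (r for r in stream if where(r) is True)
    # Without ORDER BY this model keeps input order; the CIP leaves the
    # choice of kept or skipped records undefined in that case.
    if order_by is not None:
        stream = sorted(stream, key=order_by)
    # skip and limit are plain ints computed once by the caller, mirroring the
    # requirement that their subclause expressions be constant for the query.
    stop = None if limit is None else skip + limit
    return list(islice(stream, skip, stop))


def create_producers_then_limit(item_ids: Iterable[int], limit: int) -> Tuple[List[int], List[int]]:
    """Model `MATCH (i:Item) CREATE ... RETURN i.productId LIMIT n`."""
    created: List[int] = []  # side-effect log standing in for new :Producer nodes

    def create_producer(item_id: int) -> int:
        created.append(item_id)
        return item_id

    # A list (not a generator) forces the whole CREATE clause to run before
    # LIMIT sees any record, so every item gets a producer.
    after_create = [create_producer(i) for i in item_ids]
    return after_create[:limit], created


if __name__ == "__main__":
    people = [
        {"name": "Anna", "age": 30},
        {"name": "Bob", "age": None},  # predicate yields null: discarded
        {"name": "Cid", "age": 17},
        {"name": "Dee", "age": 45},
        {"name": "Eve", "age": 22},
    ]

    def adult(r: Record) -> Optional[bool]:
        # Null-propagating comparison: null > 18 is null, not false.
        return None if r["age"] is None else r["age"] > 18

    print(apply_subclauses(people, where=adult, order_by=lambda r: r["age"], skip=1, limit=1))
    # prints [{'name': 'Anna', 'age': 30}]
    rows, created = create_producers_then_limit(range(250), 100)
    print(len(rows), len(created))
    # prints 100 250
```

Materialising the `CREATE` step as a list keeps the side effects complete. Replacing it with a generator consumed through `islice` would create only `limit` producers. That is the partial-update behaviour, which the CIP says requires moving `LIMIT` before `CREATE`.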