Author: Mats Rydberg <mats@neotechnology.com>
This CIP is a proposal in response to CIR-2017-194.
Cypher features several subclause constructs. These are language elements that amend the semantics of a (parent) clause with specific behaviour, but that have no meaning on their own. The complete list of subclauses is:
- WHERE
- ORDER BY
- SKIP
- LIMIT
The WHERE subclause is used to reduce the cardinality of its parent clause by passing records through a predicate.
Only records for which the predicate evaluates to true (under Cypher’s three-valued logic) are kept for further processing.
The ORDER BY subclause is used to order records, in an order determined by the expression provided to the subclause.
The SKIP subclause is used to reduce the cardinality of its parent clause by a constant amount.
The LIMIT subclause is used to constrain the cardinality of its parent clause by providing an upper limit.
read-only-clause = read-only-parent, subclauses ;
read-only-parent = match
| with
| unwind
| return
;
subclauses = [ where ], [ order-by ], [ skip ], [ limit ] ;
where = "WHERE", expr ;
order-by = "ORDER BY", expr ;
skip = "SKIP", expr ;
limit = "LIMIT", expr ;
The detailed semantics of ORDER BY and of ordering are out of scope of this CIP.
Instead, refer to the comparability CIP.
All subclauses require an expression as argument, the subclause expression, which determines the exact behaviour of the subclause.
For SKIP and LIMIT, the subclause expression is constrained to expressions which are constant over the query lifetime, such as (expressions derived from) parameters and literals.
In other words, arbitrary expressions or variables are not valid, as the values of these may change across records, and it would not be clear which value should be used by the subclause.
This restriction does not apply to WHERE or ORDER BY.
The WHERE subclause is used for filtering using an arbitrary (boolean) expression, or predicate.
The predicate is evaluated for each incoming record, and the record is kept if and only if the predicate evaluates to true.
Since Cypher expressions work under three-valued logic, this means that records for which the predicate evaluates to null as well as false are discarded.
Further details on the semantics of expression evaluation are out of scope of this CIP.
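The filtering rule described above can be modelled outside of Cypher. The following is a minimal Python sketch, not part of the proposal: it assumes records are plain dictionaries, uses None to stand in for null, and the names `where` and `age_over_18` are hypothetical.

```python
from typing import Callable, Iterable, Iterator, Optional

Record = dict  # hypothetical record representation: variable name -> value


def where(records: Iterable[Record],
          predicate: Callable[[Record], Optional[bool]]) -> Iterator[Record]:
    """Keep a record only if the predicate evaluates to exactly True.

    None stands in for Cypher's null, so both False and None discard the record.
    """
    for record in records:
        if predicate(record) is True:
            yield record


def age_over_18(record: Record) -> Optional[bool]:
    # A comparison involving a null value yields null, modelled here as None.
    age = record.get("age")
    return None if age is None else age > 18


people = [
    {"name": "Ann", "age": 30},
    {"name": "Bob", "age": 12},    # predicate is False: discarded
    {"name": "Cy", "age": None},   # predicate is null: discarded
]
kept = list(where(people, age_over_18))
```

Only the record for which the predicate is true survives; the null case is dropped in the same way as the false case.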
The SKIP subclause acts as a drop-all filter with an upper bound.
The first n records, where n is determined by the subclause expression, are discarded from further processing.
Exactly which records are discarded is generally undefined, but may be controlled through the use of ORDER BY.
Valid use of SKIP:
MATCH (a:Label)
SKIP $skipAmount
RETURN a.prop
SKIP 100
Invalid use of SKIP:
MATCH (a:Label)
SKIP a.prop // not guaranteed to be constant -- error!
RETURN a.prop
SKIP size((a)-->()) // not guaranteed to be constant -- error!
The LIMIT subclause stops records from passing through its parent clause once the specified number of records, as determined by the subclause expression, has been processed.
Exactly which records are kept is generally undefined, but may be controlled through the use of ORDER BY.
Valid use of LIMIT:
MATCH (a:Label)
LIMIT $matchLimit
RETURN a.prop
LIMIT 100
Invalid use of LIMIT:
MATCH (a:Label)
LIMIT a.prop // not guaranteed to be constant -- error!
RETURN a.prop
LIMIT size((a)-->()) // not guaranteed to be constant -- error!
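The behaviour of SKIP and LIMIT, together with the constant-expression restriction, can be sketched in Python as a rough model rather than a reference implementation. The helper names, the dictionary record format and the parameter values are hypothetical, and rejecting negative counts is an assumption of this sketch that the text above does not state.

```python
from itertools import islice
from typing import Any, Iterable, Iterator

Record = dict  # hypothetical record representation: variable name -> value


def constant_count(name: str, value: Any) -> int:
    """Accept only a value that is already fixed for the whole query."""
    # A callable would need a record to produce a value, so it models a
    # per-record expression such as a.prop and is rejected.
    # Rejecting negative numbers is an assumption of this sketch.
    if isinstance(value, bool) or not isinstance(value, int) or value < 0:
        raise ValueError(f"{name} requires a constant, non-negative integer")
    return value


def skip(records: Iterable[Record], n: Any) -> Iterator[Record]:
    # Discard the first n records and pass the rest through.
    return islice(records, constant_count("SKIP", n), None)


def limit(records: Iterable[Record], n: Any) -> Iterator[Record]:
    # Pass through at most n records, then stop pulling from upstream.
    return islice(records, constant_count("LIMIT", n))


params = {"skipAmount": 2, "matchLimit": 3}  # hypothetical parameter values
rows = [{"prop": i} for i in range(10)]

skipped = list(skip(rows, params["skipAmount"]))
limited = list(limit(rows, params["matchLimit"]))

try:
    skip(rows, lambda record: record["prop"])  # models SKIP a.prop
    rejected = False
except ValueError:
    rejected = True
```

Because the input list here happens to have a fixed order, the result is predictable; for an unordered record stream, which records are skipped or kept would be undefined, as stated above.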
The use of LIMIT opens up the possibility for certain performance optimisations.
Clauses appearing earlier in the query only need to be evaluated until the limit is reached, as opposed to evaluating the entire dataset.
These optimisations are however not always applicable in combination with updating clauses.
Semantics between clauses is defined such that all of a previous clause is processed (logically) before any of a subsequent clause is processed.
This means that all side effects must happen before a LIMIT is allowed to terminate the processing of records in preceding clauses.
Consider the following query:
MATCH (i:Item)
CREATE (i)-[:PRODUCED_BY]->(:Producer)
RETURN i.productId
LIMIT 100
This query must execute its CREATE clause once for every :Item node, even though only 100 records are to be returned.
If the user intention is to only do a partial update of the graph, the query must be rewritten:
MATCH (i:Item)
LIMIT 100
CREATE (i)-[:PRODUCED_BY]->(:Producer)
RETURN i.productId
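The difference between the two queries above can be illustrated with a small Python model. This is a sketch under stated assumptions only: items are plain integers, the `created` list stands in for relationships written to the graph, and both function names are hypothetical.

```python
from itertools import islice
from typing import List, Sequence, Tuple


def create_then_limit(items: Sequence[int], n: int) -> Tuple[List[int], List[int]]:
    """Models MATCH ... CREATE ... RETURN ... LIMIT n."""
    created = []  # stands in for side effects on the graph
    updated = []
    # The CREATE clause is (logically) processed in full before RETURN starts,
    # so the side effect happens for every matched item.
    for item in items:
        created.append(item)
        updated.append(item)
    return list(islice(updated, n)), created


def limit_then_create(items: Sequence[int], n: int) -> Tuple[List[int], List[int]]:
    """Models MATCH ... LIMIT n CREATE ... RETURN ..."""
    created = []
    returned = []
    # The LIMIT belongs to MATCH, so only n records ever reach CREATE.
    for item in islice(items, n):
        created.append(item)
        returned.append(item)
    return returned, created


items = list(range(1000))  # hypothetical: 1000 :Item nodes
returned_a, created_a = create_then_limit(items, 100)
returned_b, created_b = limit_then_create(items, 100)
```

Both variants return 100 records, but the first performs 1000 creations while the second performs 100.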
MATCH (a:Person)
WHERE a.name STARTS WITH 'And'
LIMIT $limit
RETURN a.age, a.name
MATCH (a:Person)
WHERE a.age < 18
SET a.child = true
WITH a
LIMIT 100
MATCH (a)<-[:PARENT_OF]-(p)
RETURN p.age, p.name
MATCH (a:Person)
WHERE a.age > 18
RETURN a.age, a.name
LIMIT 100
SKIP, LIMIT and ORDER BY:
MATCH (a:Person)
WHERE a.age > 18
RETURN a.age, a.name
ORDER BY a.age
SKIP 10
LIMIT 100
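The combined effect of the subclauses in the last query can be approximated in Python as filter, sort, skip, then limit. This is an illustrative sketch only: the function name and sample data are hypothetical, and it assumes no null ages, since the ordering of null values is left to the comparability CIP.

```python
from itertools import islice
from typing import Dict, List


def where_order_skip_limit(people: List[Dict], min_age: int,
                           skip: int, limit: int) -> List[Dict]:
    # WHERE: keep only records whose predicate is true.
    adults = [p for p in people if p["age"] > min_age]
    # ORDER BY: fix the order so that SKIP and LIMIT become deterministic.
    ordered = sorted(adults, key=lambda p: p["age"])
    # SKIP then LIMIT: drop the first `skip` records, keep at most `limit`.
    return list(islice(ordered, skip, skip + limit))


people = [
    {"name": "Dee", "age": 40},
    {"name": "Eve", "age": 19},
    {"name": "Flo", "age": 25},
    {"name": "Gus", "age": 17},
    {"name": "Hal", "age": 33},
]

page = where_order_skip_limit(people, min_age=18, skip=1, limit=2)
# Reversing the input does not change the result, because ORDER BY fixes the order.
page_reversed = where_order_skip_limit(list(reversed(people)), min_age=18, skip=1, limit=2)
```

Without the sort step, the two calls could return different records, which is the sense in which the result of SKIP and LIMIT is undefined unless ORDER BY is used.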