Open
Changes from 1 commit
Commits
27 commits
5c6f9d7
Added the nested subqueries CIP
Jun 22, 2016
d7d0a83
Sundry content edits to the subquery CIP
Jun 23, 2016
bf71712
Clarified the syntax wrt `OPTIONAL`
Jun 23, 2016
f6245fd
Added the notion of write subqueries, with `UNWIND` + `DO {...}` repl…
Jul 20, 2016
ede1334
Clarified the way in which variable bindings work (based on comments …
Jul 20, 2016
a1c6442
Addressed some feedback
boggle Sep 26, 2016
4caeb54
Addressing comments; making clarifications
Nov 17, 2016
5b5b9cc
Sketched out additional forms of nested subqueries.
boggle Mar 27, 2017
fe21475
Homogeneous syntax for OPTIONAL, MANDATORY, MATCH, DO WHEN
boggle Mar 30, 2017
80a1ce4
Address feedback and introduce new syntactic short forms
boggle Apr 13, 2017
b8f49d6
Add chained subqueries with `THEN` and overhaul document
boggle Apr 19, 2017
bf53252
Reflect discussion; add new conditional form of DO and WHERE shorthand
boggle Apr 20, 2017
70a91cd
Textual improvements
Apr 21, 2017
1f02e2b
Refer to Query Combinator CIP
Apr 21, 2017
cc176e8
Wording
boggle May 1, 2017
2921112
Rework CIP
boggle Oct 16, 2017
3ed1ca9
Clarify precedence rules
boggle Oct 16, 2017
2d2435f
Add ammending nested subqueries and fix query combinator precedence
boggle Oct 16, 2017
0156bc3
Fix definition of chained queries and move to right directory
boggle Oct 16, 2017
acfac59
Textual edits
Oct 17, 2017
7554cc9
Clarified query combinator semantics
boggle Oct 17, 2017
10fa182
Fixed erroneous queries
Oct 19, 2017
1ca70bf
Reformatted title
Jan 17, 2018
cfc2a43
Reworking/incorporating alternative CIP
boggle May 6, 2018
6199430
Fused with nested subqueries CIP from multigraph work
boggle May 6, 2018
5b6a333
Added stand-alone nested calls and some clarifications/fix-ups
boggle May 7, 2018
077fb18
Grammar fix
boggle May 7, 2018
Added the nested subqueries CIP
Petra Selmer authored and boggle committed Oct 16, 2017
commit 5c6f9d73444020c7edee5a083dce6e2ced6879a8
257 changes: 257 additions & 0 deletions cip/CIP2016-06-22-nested-subqueries.adoc
@@ -0,0 +1,257 @@
= CIP2016-06-22 - Nested subqueries
:numbered:
:toc:
:toc-placement: macro
:source-highlighter: codemirror

*Authors:* Petra Selmer, Stefan Plantikow

[abstract]
.Abstract
--
This CIP proposes the incorporation of nested subqueries into Cypher.
--

toc::[]


== Motivation

Subqueries - i.e. queries within queries - are a powerful and expressive feature allowing for:

* Increased query expressivity
* Better query construction and readability
* Easier query composition and reuse
* Post-processing the results of multiple queries as a single unit

== Background

This CIP may be viewed in conjunction with the EXISTS CIP and the Pattern Comprehension CIP, both of which propose variants of subqueries.


== Proposal

We propose the addition of new syntax to the `MATCH` clause for expressing nested subqueries.

Nested subqueries are self-contained, read-only Cypher queries.

A nested subquery is denoted using the following syntax: `MATCH { <subquery> }`.

Nested subqueries may be correlated - i.e. the subquery has a dependency on the outer query - or uncorrelated.

As this proposal extends the `MATCH` clause, nested subqueries can be contained within other nested subqueries at arbitrary depth.

=== Syntax

We extend the https://github.com/opencypher/openCypher/blob/master/grammar/cypher.xml[grammar] by adding a new clause.

The new clause wraps a regular query in curly braces after the `MATCH` keyword:
[source, ebnf]
----
nested-subquery-clause = "MATCH", "{", RegularQuery, "}" ;
----

=== Semantic clarification

Conceptually, a nested subquery is evaluated for each incoming record and may produce an arbitrary number of result records.

All incoming variables remain in scope.

Any new variable bindings produced by evaluating the subquery will augment the variable bindings of the initial record; i.e. nested subqueries behave in the same way as `UNWIND` and `CALL` with regard to the introduction of new variable bindings.

Subqueries interact with write clauses in the same manner as `MATCH`.

It is an error for a nested subquery to try to rebind (shadow) a pre-existing outer variable binding.
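
The rules above can be modelled in a few lines of Python. This is only a minimal sketch under stated assumptions, not how any Cypher implementation works: records are plain dictionaries, a subquery is a function from one incoming record to the records it returns, and the names `Record`, `Subquery`, `apply_nested_subquery` and the sample data are all hypothetical.

```python
from typing import Callable, Dict, Iterable, Iterator

Record = Dict[str, object]
# Hypothetical model: a subquery maps one incoming record to zero or more
# records that hold only the bindings it returns.
Subquery = Callable[[Record], Iterable[Record]]


def apply_nested_subquery(incoming: Iterable[Record], subquery: Subquery) -> Iterator[Record]:
    """Evaluate `subquery` once per incoming record and augment its bindings."""
    for record in incoming:
        for produced in subquery(record):
            clashes = set(produced) & set(record)
            if clashes:
                # Rebinding (shadowing) an outer variable is an error in the CIP.
                raise ValueError(f"subquery rebinds outer variables: {sorted(clashes)}")
            # All incoming variables stay in scope; new bindings are added.
            yield {**record, **produced}


# Made-up sample data for illustration.
farms = [{"f": "farm-1", "country": "se"}, {"f": "farm-2", "country": "no"}]
products = [("mower-a", "se"), ("mower-b", "se"), ("tractor-c", "de")]


def products_in_country(record: Record) -> Iterable[Record]:
    # Correlated subquery: it reads the outer `country` binding.
    return [{"code": code} for code, country in products if country == record["country"]]


rows = list(apply_nested_subquery(farms, products_in_country))
```

With this sample data, `farm-1` yields two result records and `farm-2` yields none, which illustrates that a subquery may produce an arbitrary number of records per incoming record.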

=== Examples

Post-UNION processing:
[source, cypher]
----
MATCH {
  // authored tweets
  MATCH (me:User {name: 'Alice'})-[:FOLLOWS]->(user:User),
        (user)<-[:AUTHORED]-(tweet:Tweet)
  RETURN tweet, tweet.time AS time, user.country AS country
  UNION
  // favorited tweets
  MATCH (me:User {name: 'Alice'})-[:FOLLOWS]->(user:User),
        (user)<-[:HAS_FAVOURITE]-(favorite:Favorite)-[:TARGETS]->(tweet:Tweet)
  RETURN tweet, favorite.time AS time, user.country AS country
}
WHERE country = "se"
RETURN DISTINCT tweet
ORDER BY time DESC
LIMIT 10
----
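
The shape of this computation (combine two result sets, then post-process them as one) can be sketched in Python. This is a rough model only: the row layout, the sample data and the helper names `union` and `post_process` are hypothetical, and the order in which the sketch sorts and deduplicates is an assumption rather than something the CIP specifies.

```python
from typing import List, Tuple

# Illustrative row shape: (tweet, time, country). All data here is made up.
Row = Tuple[str, int, str]


def union(left: List[Row], right: List[Row]) -> List[Row]:
    """Model UNION: concatenate both branches and drop duplicate rows."""
    seen = set()
    combined = []
    for row in left + right:
        if row not in seen:
            seen.add(row)
            combined.append(row)
    return combined


def post_process(rows: List[Row], country: str, limit: int) -> List[str]:
    """Filter by country, sort by time (newest first), keep distinct tweets."""
    matching = [row for row in rows if row[2] == country]
    newest_first = sorted(matching, key=lambda row: row[1], reverse=True)
    tweets: List[str] = []
    for tweet, _time, _country in newest_first:
        if tweet not in tweets:
            tweets.append(tweet)
    return tweets[:limit]


authored = [("t1", 5, "se"), ("t2", 3, "us")]
favorited = [("t3", 9, "se"), ("t1", 5, "se")]

combined = union(authored, favorited)
top_tweets = post_process(combined, country="se", limit=10)
```

The point of the sketch is that filtering, ordering and limiting apply to the combined rows, not to each branch separately.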

Uncorrelated nested subquery:
[source, cypher]
----
MATCH (f:Farm {id: {farmId}})
MATCH {
  MATCH (u:User {id: {userId}})-[:LIKES]->(b:Brand),
        (b)-[:PRODUCES]->(p:Lawnmower)
  RETURN b.name AS name, p.code AS code
  UNION
  MATCH (u:User {id: {userId}})-[:LIKES]->(b:Brand),
        (b)-[:PRODUCES]->(v:Vehicle),
        (v)<-[:IS_A]-(:Category {name: 'Tractor'})
  RETURN b.name AS name, v.code AS code
}
RETURN f, name, code
----

Correlated nested subquery:
[source, cypher]
----
MATCH (f:Farm {id: {farmId}})-[:IS_IN]->(country:Country)
MATCH {
  MATCH (u:User {id: {userId}})-[:LIKES]->(b:Brand),
        (b)-[:PRODUCES]->(p:Lawnmower)
  RETURN b.name AS name, p.code AS code
  UNION
  MATCH (u:User {id: {userId}})-[:LIKES]->(b:Brand),
        (b)-[:PRODUCES]->(v:Vehicle),
        (v)<-[:IS_A]-(:Category {name: 'Tractor'})
  WHERE v.leftHandDrive = country.leftHandDrive
  RETURN b.name AS name, v.code AS code
}
RETURN f, name, code
----

Filtered and correlated nested subquery:
[source, cypher]
----
MATCH (f:Farm)-[:IS_IN]->(country:Country)
WHERE country.name IN {countryNames}
MATCH {
  MATCH (u:User {id: {userId}})-[:LIKES]->(b:Brand),
        (b)-[:PRODUCES]->(p:Lawnmower)
  RETURN b AS brand, p.code AS code
  UNION
  MATCH (u:User {id: {userId}})-[:LIKES]->(b:Brand),
        (b)-[:PRODUCES]->(v:Vehicle),
        (v)<-[:IS_A]-(:Category {name: 'Tractor'})
  WHERE v.leftHandDrive = country.leftHandDrive
  RETURN b AS brand, v.code AS code
}
WHERE f.type = 'organic'
  AND brand.certified
RETURN f, brand.name AS name, code
----

Doubly-nested subquery:
[source, cypher]
----
MATCH (f:Farm {id: {farmId}})
MATCH {
  MATCH (c:Customer)-[:BUYS_FOOD_AT]->(f)
  MATCH {
    MATCH (c)-[:RETWEETS]->(t:Tweet)<-[:TWEETED_BY]-(f)
    RETURN c, count(*) AS count
    UNION
    MATCH (c)-[:LIKES]->(p:Posting)<-[:POSTED_BY]-(f)
    RETURN c, count(*) AS count
  }
  RETURN c AS endorser, "customer" AS type, sum(count) AS endorsement
  UNION
  MATCH (s:Shop)-[:BUYS_FOOD_AT]->(f)
  MATCH (s)-[:PLACES]->(a:Advertisement)-[:ABOUT]->(f)
  RETURN s AS endorser, "shop" AS type, count(a) * 100 AS endorsement
}
RETURN f.name AS name, type, sum(endorsement) AS endorsement
----

=== Interaction with existing features

Nested subqueries do not interact directly with any existing features.

=== Alternatives

The following alternative syntaxes were considered while drafting this document:

* Using parentheses; i.e. `MATCH (...)`
* Using alternative keywords:

** `SUBQUERY`
** `QUERY`

== What others do

=== SQL

The following types of subqueries are supported in SQL:

Scalar:
[source, sql]
----
SELECT orderID
FROM Orders
WHERE orderID =
(SELECT max(orderID) FROM Orders)
----

Multi-valued:
[source, sql]
----
SELECT customerID
FROM Customers
WHERE customerID IN
(SELECT customerID FROM Orders)
----

Correlated:
[source, sql]
----
SELECT orderID, customerID
FROM Orders AS O1
WHERE orderID =
(SELECT max(O2.orderID) FROM Orders AS O2
WHERE O2.customerID = O1.customerID)
----

Table-valued/table expression:
[source, sql]
----
SELECT orderYear
FROM
(SELECT YEAR(orderDate) AS orderYear
FROM Orders) AS D
----

Both scalar and table expression subqueries are out of scope for the purposes of this CIP. They will be addressed in forthcoming CIPs.

=== SPARQL

https://www.w3.org/TR/2013/REC-sparql11-query-20130321/#subqueries[SPARQL] only supports uncorrelated subqueries, exemplified by:

[source, sparql]
----
SELECT ?y ?minName
WHERE {
:alice :knows ?y .
{
SELECT ?y (MIN(?name) AS ?minName)
WHERE {
?y :name ?name .
} GROUP BY ?y
}
}
----

Owing to the bottom-up nature of SPARQL query evaluation, the subqueries are evaluated logically first, and the results are projected up to the outer query.
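
The bottom-up evaluation order described above can be illustrated with a small Python model of the example query. It is a sketch under assumptions: the triple data and the function names `inner_subquery` and `outer_query` are hypothetical, and solutions are modelled as dictionaries.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Made-up data standing in for `?s :knows ?y` and `?y :name ?name` triples.
knows: List[Tuple[str, str]] = [("alice", "bob"), ("alice", "carol")]
names: List[Tuple[str, str]] = [
    ("bob", "Bob"), ("bob", "Bobby"), ("carol", "Carol"), ("dave", "Dave"),
]


def inner_subquery(name_triples: List[Tuple[str, str]]) -> List[Dict[str, str]]:
    """Evaluate the subquery first, with no access to outer bindings."""
    grouped = defaultdict(list)
    for y, name in name_triples:
        grouped[y].append(name)
    # Only ?y and ?minName are projected out of the subquery.
    return [{"y": y, "minName": min(group)} for y, group in grouped.items()]


def outer_query(know_triples: List[Tuple[str, str]],
                inner_rows: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Join the outer pattern with the already-computed subquery results on ?y."""
    outer_rows = [{"y": y} for s, y in know_triples if s == "alice"]
    return [{**o, **i} for o in outer_rows for i in inner_rows if o["y"] == i["y"]]


inner_rows = inner_subquery(names)
result = outer_query(knows, inner_rows)
```

Note that the inner step also computes a row for `dave`, whom Alice does not know; that row is discarded only when it is joined with the outer pattern, which is what makes the subquery uncorrelated.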

Only variables projected out of the subquery will be visible, or in scope, to the outer query.

I would tend to say that this is not a disadvantage, but rather a (good) feature.



== Benefits to this proposal

* Increasing the expressivity of the language.
* Allowing unified post-processing on results from multiple (sub)queries; this is exemplified by the https://github.com/neo4j/neo4j/issues/2725[request for post-UNION processing].
* Facilitating query readability, construction and maintainability.
* Providing a feature familiar to users of SQL.

== Caveats to this proposal

At the current time, we are not aware of any caveats.