diff --git a/cip/1.accepted/CIP2016-12-14-Constraint-syntax.adoc b/cip/1.accepted/CIP2016-12-14-Constraint-syntax.adoc new file mode 100644 index 0000000000..c8954dcf7a --- /dev/null +++ b/cip/1.accepted/CIP2016-12-14-Constraint-syntax.adoc @@ -0,0 +1,375 @@ += CIP2016-12-16 Constraints syntax +:numbered: +:toc: +:toc-placement: macro +:source-highlighter: codemirror + +*Author:* Mats Rydberg + +[abstract] +.Abstract +-- +This CIP describes syntax and semantics for Cypher constraints. +These are language constructs that impose restrictions on the shape of the data graph, and how statements are allowed to change it. +-- + +toc::[] + +== Background + +Cypher has a loose notion of a schema, in which nodes and relationships may take very heterogeneous forms, both in terms of properties and in graph patterns. +Constraints allow us to mould the heterogeneous nature of the property graph into a more regular form. + +== Proposal + +This CIP specifies the general syntax for constraint definition (and constraint removal), and provides several examples of possible use cases for constraints. +However, the specification does not otherwise specify or limit the space of expressible constraints that the syntax and semantics allow. + +This specification also covers the return structure of constraint commands, see <<return-record>>. + +=== Syntax + +The constraint syntax is defined as follows: + +.Grammar definition for constraint syntax. 
+[source, ebnf] +---- +constraint-command = create-constraint | drop-constraint ; +create-constraint = "CREATE", "CONSTRAINT", [ constraint-name ], "FOR", pattern, "REQUIRE", constraint-predicate, { "REQUIRE", constraint-predicate } ; +constraint-name = symbolic-name ; +constraint-predicate = expression | unique | node-key ; +unique = "UNIQUE", property-expression ; +node-key = "NODE", "KEY", property-expression, { ",", property-expression } ; +drop-constraint = "DROP", "CONSTRAINT", constraint-name ; +---- + +The `REQUIRE` clause works exactly like the `WHERE` clause in a standard Cypher query, and additionally supports the special constraint operators `UNIQUE` and `NODE KEY`. +This allows for very complex concrete constraint definitions (using custom predicates) within the specified syntax. + +For details on `UNIQUE` and `NODE KEY`, see the dedicated sections below: <<uniqueness>>, <<node-key>>. + +==== Constraint names + +The user may optionally specify a nonempty _name_ for any constraint at creation time. +This name is subsequently the handle with which a user may refer to the constraint, for example when dropping it. +In the case where a name is not provided, the system will generate a unique name. + +==== Removing constraints + +A constraint is removed by referring to its name. + +.Example of dropping a constraint with name `foo`: +[source, cypher] +---- +DROP CONSTRAINT foo +---- + +=== Semantics + +The semantics for constraints follow these general rules: + +1. The constraint pattern defines the constraint _domain_, where all entities that would be returned by a `MATCH` clause with the same pattern constitute the domain, with one notable exception (see <<domain-exception>>). + +2. The constraint expressions defined in the `REQUIRE` clauses of the constraint definition must all evaluate to `true`, at all times. + +3. 
[[domain-exception]]Entities for which a constraint expression evaluates to `null` under Cypher's ternary logic are _excluded_ from the constraint domain, even if they fit within the constraint pattern. + +==== Errors + +The following list describes the situations in which an error will be raised: + +* Attempting to add a constraint on a graph where the data does not comply with a constraint predicate. +* Attempting to add a constraint with a name that already exists. +* Attempting to add a constraint that the underlying engine does not support enforcing. +* Attempting to drop a constraint referencing a non-existent name. +* Attempting to modify the graph in such a way that it would violate a constraint. + +==== Mutability + +Once a constraint has been added, it may not be amended. +Should a user wish to change a constraint definition, the constraint has to be dropped and added anew with an updated structure. + +[[uniqueness]] +==== Uniqueness + +The new operator `UNIQUE` is only valid as part of a constraint predicate. +It takes as argument a single property expression, and asserts that this property is unique across the domain of the constraint. +Following rule <<domain-exception>> above, entities for which the property is not defined (is `null`) are not part of the constraint domain. + +.Example of a constraint definition using `UNIQUE`, over the domain of nodes labeled with `:Person`: +[source, cypher] +---- +CREATE CONSTRAINT only_one_person_per_name +FOR (p:Person) +REQUIRE UNIQUE p.name +---- + +[[node-key]] +==== Node key + +The new operator `NODE KEY` is only valid as part of a constraint predicate. +It takes as argument one or more property expressions, and asserts that the combination of the evaluated values of the expressions (forming a tuple) is unique across the constraint domain. +It further asserts that the property expressions all exist on the entities of the domain, and thus avoids applicability of rule <<domain-exception>> above. 
+The domain of a node key constraint is thus exactly defined as all entities which fit the constraint pattern. + +.Example of a constraint definition using `NODE KEY`, over the domain of nodes labeled with `:Person`: +[source, cypher] +---- +CREATE CONSTRAINT person_details +FOR (p:Person) +REQUIRE NODE KEY p.name, p.email, p.address +---- + +In the context of a single property, a semantically equivalent constraint is achieved by composing the use of the `UNIQUE` operator with `exists()` predicates, as exemplified by: + +.Example of a constraint definition equivalent to a `NODE KEY` on a single property `name`: +[source, cypher] +---- +CREATE CONSTRAINT person_details +FOR (p:Person) +REQUIRE UNIQUE p.name +REQUIRE exists(p.name) +---- + +==== Compositionality + +It is possible to define multiple `REQUIRE` clauses within the scope of the same constraint. +The semantics between these is that of a conjunction (under standard 2-valued boolean logic) between the constraint predicates of the clauses, such that the constraint is upheld if and only if for all `REQUIRE` clauses, the joint predicate evaluates to `true`. + +[[return-record]] +==== Return record + +Since constraints always are named, but user-defined names are optional, the system must sometimes generate a constraint name. +In order for a user to be able to drop such a constraint, the system-generated name is therefore returned in a standard Cypher result record. +The result record has a fixed structure, with three string fields: `name`, `definition`, and `details`. + +A constraint command will always return exactly one record, if successful. +Note that also `DROP CONSTRAINT` will return a record. + +===== Name + +This field contains the name of the constraint, either user- or system-defined. + +===== Definition + +This field contains the constraint definition, which is the contents of the constraint creation command following (and including) the `FOR` clause. 
+ +===== Details + +The contents of this field are left unspecified, to be used for implementation-specific messages and/or details. + +.Example: consider the following constraint: +[source, cypher] +---- +CREATE CONSTRAINT myConstraint +FOR (n:Node) +REQUIRE NODE KEY n.prop1, n.prop2 +---- + +A correct result record for it could be: + +---- +name | definition | details +----------------------------------------------------------------------- +myConstraint | FOR (n:Node) | n/a + | REQUIRE NODE KEY n.prop1, n.prop2 | +---- + +=== Examples + +In this section we provide several examples of constraints that can be expressed in the specified syntax. + +[NOTE] +The specification in this CIP is limited to the general syntax of constraints, and the following are simply examples of possible uses of the language defined by that syntax. None of the examples provided are to be viewed as mandatory for any Cypher implementation. + +Consider the graph created by the statement below. +The graph contains nodes labeled with `:Color`. +Each color is represented as an integer-type RGB value in a property `rgb`. +Users may look up nodes labeled with `:Color` to extract their RGB values for application processing. +Users may also add new `:Color`-labeled nodes to the graph. + +[source, cypher] +---- +CREATE (:Color {name: 'white', rgb: 0xffffff}) +CREATE (:Color {name: 'black', rgb: 0x000000}) +CREATE (:Color {name: 'very, very dark grey', rgb: 0x000000}) // rounding error! 
+---- + +Owing to the duplicated `rgb` value (`0x000000`), the following attempt at adding a constraint will fail: + +[source, cypher] +---- +CREATE CONSTRAINT only_one_color_per_rgb +FOR (c:Color) +REQUIRE UNIQUE c.rgb +---- + +Now, consider the following query which retrieves the RGB value of a color with a given `name`: + +[source, cypher] +---- +MATCH (c:Color {name: $name}) +WHERE exists(c.rgb) +RETURN c.rgb +---- + +The `WHERE` clause is here used to prevent an application from retrieving `null` values for user-defined colors where the RGB values have not been specified correctly. +It may, however, be eliminated by the introduction of a constraint asserting the existence of that property: + +[source, cypher] +---- +CREATE CONSTRAINT colors_must_have_rgb +FOR (c:Color) +REQUIRE exists(c.rgb) +---- + +Any updating statement that would create a `:Color` node without specifying an `rgb` property for it would now fail. + +If we instead want to make the _combination_ of the properties `name` and `rgb` unique, while simultaneously mandating their existence, we could use a `NODE KEY` operator to capture all these requirements in a single constraint: + +[source, cypher] +---- +CREATE CONSTRAINT color_schema +FOR (c:Color) +REQUIRE NODE KEY c.rgb, c.name +---- + +This constraint ensures that all `:Color` nodes have a value for their `rgb` and `name` properties, and that the combination is unique across all the nodes. +This would allow several `:Color` nodes named `'grey'`, as long as their `rgb` values are distinct. 
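+
+As a sketch of how this plays out, assume `color_schema` has been created on a graph that already complies with it; the color names and RGB values below are illustrative only and not part of the proposal.
+With each line run as a separate statement, in order, the first two would be accepted, while the last two would be expected to fail:
+
+[source, cypher]
+----
+CREATE (:Color {name: 'grey', rgb: 0x808080}) // accepted
+CREATE (:Color {name: 'grey', rgb: 0xa9a9a9}) // accepted: same name, different rgb
+CREATE (:Color {name: 'grey', rgb: 0x808080}) // rejected: duplicates the key (0x808080, 'grey')
+CREATE (:Color {name: 'grey'})                // rejected: rgb is missing
+----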
+ +More complex constraint definitions are considered below: + +.Multiple property existence using conjunction +[source, cypher] +---- +CREATE CONSTRAINT person_properties +FOR (p:Person) +REQUIRE exists(p.name) AND exists(p.email) +---- + +.Using larger pattern +[source, cypher] +---- +CREATE CONSTRAINT not_rating_own_posts +FOR (u1:User)-[:RATED]->(:Post)<-[:POSTED_BY]-(u2:User) +REQUIRE u1.name <> u2.name +---- + +.Property value limitations +[source, cypher] +---- +CREATE CONSTRAINT road_width +FOR ()-[r:ROAD]-() +REQUIRE 5 < r.width < 50 +---- + +.Cardinality +[source, cypher] +---- +CREATE CONSTRAINT spread_the_love +FOR (p:Person) +REQUIRE size((p)-[:LOVES]->()) > 3 +---- + +.Endpoint requirements +[source, cypher] +---- +CREATE CONSTRAINT can_only_own_things +FOR ()-[:OWNS]->(t) +REQUIRE (t:Vehicle) OR (t:Building) OR (t:Object) +---- + +.Label coexistence +[source, cypher] +---- +CREATE CONSTRAINT programmers_are_people_too +FOR (p:Programmer) +REQUIRE p:Person +---- + +Assuming a function `acyclic()` that takes a path as argument and returns `true` if and only if no node appears more than once in the path, we may express: + +.Constraint example from CIR-2017-172 +[source, cypher] +---- +CREATE CONSTRAINT enforce_dag_acyclic_for_R_links +FOR p = ()-[:R*]-() +REQUIRE acyclic(p) +---- + +=== Interaction with existing features + +The main interaction between the constraints and the rest of the language occurs during updating statements. +Existing constraints will cause some updating statements to fail, thereby fulfilling the main purpose of this feature. 
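+
+For instance, assuming the `colors_must_have_rgb` constraint from the examples above is in place, updates to existing nodes would presumably be checked in the same way as node creation.
+A minimal sketch, with each line run as a separate statement (the new RGB value is illustrative only):
+
+[source, cypher]
+----
+MATCH (c:Color {name: 'white'}) SET c.rgb = 0xfffffe // accepted: rgb still exists
+MATCH (c:Color {name: 'white'}) REMOVE c.rgb         // rejected: violates exists(c.rgb)
+MATCH (c:Color {name: 'white'}) SET c.rgb = null     // rejected: setting null removes the property
+----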
+ +=== Alternatives + +Alternative syntaxes have been discussed: + +* `GIVEN`, `CONSTRAIN`, `ASSERT` instead of `FOR` +* `ASSERT`, `ENFORCE`, `IMPLIES` instead of `REQUIRE` +* `ADD` instead of `CREATE` +** It is desirable for verb pairs for modifying operations to be consistent in the language, and recent discussions are (so far informally) suggesting `INSERT`/`DELETE` to be used for data modification, thus making `CREATE` and `DROP` available for schema modification such as constraints. + +The use of an existing expression to express uniqueness -- instead of using the operator `UNIQUE` -- becomes unwieldy for multiple properties, as exemplified by the following: +---- +FOR (p:Person), (q:Person) +REQUIRE p.email <> q.email AND p <> q +---- + +== What others do + +In SQL, the following constraints exist (inspired by http://www.w3schools.com/sql/sql_constraints.asp): + +* `NOT NULL` - Indicates that a column cannot store a null value. +* `UNIQUE` - Ensures that each row for a column must have a unique value. +* `PRIMARY KEY` - A combination of a `NOT NULL` and `UNIQUE`. Ensures that a column (or a combination of two or more columns) has a unique identity, reducing the resources required to locate a specific record in a table. +* `FOREIGN KEY` - Ensures the referential integrity of the data in one table matches values in another table. +* `CHECK` - Ensures that the value in a column meets a specific condition. +* `DEFAULT` - Specifies a default value for a column. + +The `NOT NULL` SQL constraint is expressible using an `exists()` constraint predicate. +The `UNIQUE` SQL constraint corresponds exactly to Cypher's `UNIQUE` constraint predicate. +The `PRIMARY KEY` SQL constraint corresponds exactly to Cypher's `NODE KEY` constraint predicate. 
+ +SQL constraints may be introduced at table creation time in a `CREATE TABLE` statement, or in an `ALTER TABLE` statement: + +.Creating a `Person` table in SQL Server / Oracle / MS Access: +[source, sql] +---- +CREATE TABLE Person +( + P_Id int NOT NULL UNIQUE, + LastName varchar(255) NOT NULL, + FirstName varchar(255)) +---- + +.Creating a `Person` table in MySQL: +[source, sql] +---- +CREATE TABLE Person +( + P_Id int NOT NULL, + LastName varchar(255) NOT NULL, + FirstName varchar(255), + UNIQUE (P_Id) +) +---- + +.Adding a named composite `UNIQUE` constraint in MySQL / SQL Server / Oracle / MS Access: +[source, sql] +---- +ALTER TABLE Person +ADD CONSTRAINT uc_PersonID UNIQUE (P_Id,LastName) +---- + +== Benefits to this proposal + +Constraints make Cypher's notion of schema more well-defined, allowing users to maintain graphs in a more regular, easier-to-manage form. + +Additionally, this specification deliberately defines a constraint _language_ in which a great many concrete constraints can be expressed. +This allows different implementers of Cypher to independently choose how to limit the scope of supported constraint expressions to fit their model and targeted use cases, while retaining a common and consistent semantic and syntactic model. + +== Caveats to this proposal + +Some constraints may prove challenging to enforce in a system seeking to implement the contents of this CIP, as these generally require scanning through large parts of the graph to locate conflicting entities. 
diff --git a/cip/vendor-extensions/neo4j/CIP2016-12-14-Neo4j-indexes.adoc b/cip/vendor-extensions/neo4j/CIP2016-12-14-Neo4j-indexes.adoc new file mode 100644 index 0000000000..105a2f65bd --- /dev/null +++ b/cip/vendor-extensions/neo4j/CIP2016-12-14-Neo4j-indexes.adoc @@ -0,0 +1,242 @@ += CIP2016-12-16 Neo4j Indexes +:numbered: +:toc: +:toc-placement: macro +:source-highlighter: codemirror + +*Author:* Mats Rydberg + +[abstract] +.Abstract +-- +This CIP details Neo4j's indexing extension to Cypher, which is based on the standardised constraints syntax. +-- + +toc::[] + +== Background + +In Neo4j, indexes are formed using label and property combinations. +This enables queries that reference these label/property combinations to use the index for faster lookup with reduced cardinality overhead. + +== Proposal + +Indexes in Neo4j are able to index _labeled nodes_ only. +These nodes are kept in a separate, persisted data structure which allows lookups based on providing values for the specified indexed properties. + +While not going into exact detail on every aspect, this proposal is intended to comply with all rules stated in the Constraint Syntax CIP, where applicable. + +=== Syntax + +The index syntax is based on the constraint syntax (see the Constraint Syntax CIP), and is detailed below: + +.Grammar definition for Neo4j index syntax. +[source, ebnf] +---- +index-command = create-index | drop-index ; +create-index = "CREATE", "INDEX", [ index-name ], "FOR", index-pattern, "ON", index-key ; +index-pattern = node-pattern ; +index-name = symbolic-name ; +index-key = property-expression, { ",", property-expression } ; +drop-index = "DROP", "INDEX", index-name ; +---- + +The `index-key` expression defines the key for the index, and consists of one or more property expressions, which refer to the entity defined in the pattern. + +==== Index names + +Just like constraints, indexes have names. +If the user does not provide a name, the system will generate one. 
+ +==== Removing indexes + +An index is removed by referring to its name. + +.Example of dropping an index with name `index_1`: +[source, cypher] +---- +DROP INDEX index_1 +---- + +=== Semantics + +Indexes do not impose any semantics on the graph, or on queries. +They exist solely for performance reasons. +In other words, any query on any graph should behave identically with or without indexes. + +==== Domain + +For a node to be considered part of an index domain, it is required that it + +A. has the label referenced in the index pattern +B. [[B]]has a value different from `null` for all properties referenced in the index key + +A consequence of <<B>> is that an index will only partially support queries that project the indexed properties. +However, queries that pose predicates on the indexed properties will still enjoy full support in many cases. +See <<domain-example>> for more details on this difference. + +==== Errors + +The following list describes the situations in which an error will be raised: + +* Attempting to create an index with a name that already exists. +* Attempting to create an index that the underlying engine does not support. +* Attempting to drop an index referencing a non-existent name. + +==== Mutability + +Once an index has been created, its definition may not be amended. +Should a user wish to change the definition of an index, the index will have to be dropped and recreated with the amended definition. + +[[return-record]] +==== Return record + +Similar to the Constraint Syntax CIP, index commands will yield a single return record. +The result record has a fixed structure, with three string fields: `name`, `definition`, and `details`. + +An index command will always return exactly one record, if successful. +Note that `DROP INDEX` also returns a record. + +===== Name + +This field contains the name of the index, either user- or system-defined. 
+ +===== Definition + +This field contains the index definition, which is the contents of the index creation command following (and including) the `FOR` clause. + +===== Details + +The contents of this field are left unspecified, to be used for implementation-specific messages and/or details. + +.Example: consider the following index: +[source, cypher] +---- +CREATE INDEX myIndex +FOR (n:Node) +ON n.prop1, n.prop2 +---- + +A correct result record for it could be: + +---- +name | definition | details +--------------------------------------- +myIndex | FOR (n:Node) | n/a + | ON n.prop1, n.prop2 | +---- + +=== Examples + +Creating indexes is straightforward following the specified syntax. + +.An index with multiple properties +[source, cypher] +---- +CREATE INDEX addresses +FOR (a:Address) +ON a.street, a.city, a.country +---- + +.An index with a single property +[source, cypher] +---- +CREATE INDEX person_names +FOR (p:Person) +ON p.name +---- + +[[domain-example]] +==== Domain example + +Consider a graph of `:Person` nodes with `name`, `email`, and `age` properties. +Not all nodes in this graph have all properties. +On this graph we declare the following index on all the properties: + +[source, cypher] +---- +CREATE INDEX person_properties +FOR (p:Person) +ON p.name, p.email, p.age +---- + +Queries that _project_ these properties will be unable to find all nodes for their results in the index domain. +The projection query is required to return all nodes regardless of whether the projected properties contain non-null values or not, and nodes with `null` for any of the referenced properties will not be found in the index domain. + +.Projection query: +[source, cypher] +---- +MATCH (p:Person) +RETURN p.name, p.age, p.email +---- + +Queries that pose _conjunctive predicates_ on the properties will however be able to find all required nodes in the index domain. 
+The predicate query is only required to return all nodes that pass the predicate, and predicates on non-existing properties will discard the tuple. +This applies even when the predicate does not reference all indexed properties. + +.Conjunctive predicate query: +[source, cypher] +---- +MATCH (p:Person) +WHERE p.email ENDS WITH '@opencypher.org' + AND p.age > 25 +RETURN p.name, p.age, p.email +---- + +[NOTE] +While this example is generally applicable, some predicate constructs behave differently for `null` values and need to be taken into special consideration. + +.Predicate with special `null` semantics: +[source, cypher] +---- +MATCH (p:Person) +WHERE p.email IS NULL + AND p.age > 25 +RETURN p.name, p.age, p.email +---- + +In this query the index domain does not contain all nodes required for the result. +Similar reasoning must be applied to disjunctive predicates which reference expressions other than indexed properties (e.g. `WHERE p.age > 25 OR p.country = 'SWE'`). + +==== Combination with Neo4j constraints + +In Neo4j, constraints are generally upheld through the use of indexes. +Neo4j supports three types of constraints: property uniqueness, property existence, and node key. +These are expressed as exemplified below. + +.A Neo4j property uniqueness constraint +[source, cypher] +---- +CREATE CONSTRAINT one_address_per_street +FOR (a:Address) +REQUIRE UNIQUE a.street +---- + +.A Neo4j node property existence constraint +[source, cypher] +---- +CREATE CONSTRAINT streets_on_all_addresses +FOR (a:Address) +REQUIRE exists(a.street) +---- + +.A Neo4j node key constraint +[source, cypher] +---- +CREATE CONSTRAINT address_key +FOR (a:Address) +REQUIRE NODE KEY a.street, a.city, a.country +---- + +Creating a constraint as outlined above will also create a matching index. +It will not be possible to drop that index without also dropping the constraint. + +An exception to this rule is the relationship existence constraint, which is not upheld by the use of an index. 
+ +.A Neo4j relationship property existence constraint +[source, cypher] +---- +CREATE CONSTRAINT owning_must_have_start_time +FOR ()-[o:OWNS]->() +REQUIRE exists(o.since) +---- diff --git a/docs/standardisation-scope.adoc b/docs/standardisation-scope.adoc index 3663597885..5e6ead0a9b 100644 --- a/docs/standardisation-scope.adoc +++ b/docs/standardisation-scope.adoc @@ -44,6 +44,10 @@ It is the goal of this project to create a good and feature-rich standard langua * `allShortestPaths()` * `shortestPath()` +=== Commands + +* `CREATE CONSTRAINT` + === Operators ==== General @@ -210,7 +214,6 @@ It is the goal of this project to create a good and feature-rich standard langua === Commands -* `CREATE CONSTRAINT` * `CREATE INDEX` === Operators diff --git a/grammar/basic-grammar.xml b/grammar/basic-grammar.xml index 64a78a90a9..9efef247de 100644 --- a/grammar/basic-grammar.xml +++ b/grammar/basic-grammar.xml @@ -146,7 +146,7 @@ - + : &WS; @@ -716,6 +716,8 @@ ANY NONE SINGLE + NODE + KEY diff --git a/grammar/commands.xml b/grammar/commands.xml index 3466bee5b6..01bc9bf947 100644 --- a/grammar/commands.xml +++ b/grammar/commands.xml @@ -44,7 +44,7 @@ xmlns:rr="http://opencypher.org/railroad" xmlns:oc="http://opencypher.org/opencypher"> - + @@ -57,10 +57,52 @@ + + + + + + + + + + CREATE &SP; CONSTRAINT &SP; &SP; + FOR &SP; &SP; + REQUIRE &SP; + + + + DROP &SP; CONSTRAINT &SP; + + + + + + + + + ( &var; &label; ) + ( ) - [ &var; ] - ( ) + + + + + + + + - + + UNIQUE &SP; + + + + NODE &SP; KEY &SP; &WS; , &WS; + + + CREATE &SP; diff --git a/tools/grammar/src/test/resources/cypher.txt b/tools/grammar/src/test/resources/cypher.txt index 786eeadb69..18d25f2560 100644 --- a/tools/grammar/src/test/resources/cypher.txt +++ b/tools/grammar/src/test/resources/cypher.txt @@ -312,3 +312,23 @@ CALL db.labels() YIELD * WHERE label CONTAINS 'User' AND foo + bar = foo RETURN count(label) AS numLabels§ CALL db.labels() YIELD x WHERE label CONTAINS 'User' AND foo + bar = foo RETURN count(label) 
AS numLabels§ +CREATE CONSTRAINT foo +FOR (p:Person) +REQUIRE UNIQUE p.name§ +CREATE CONSTRAINT baz +FOR (p:Person) +REQUIRE exists(p.name)§ +CREATE CONSTRAINT cru +FOR ()-[r:REL]-() +REQUIRE exists(r.property)§ +DROP CONSTRAINT foo_bar_baz§ +CREATE CONSTRAINT nodeKey +FOR (n:Node) +REQUIRE NODE KEY n.prop§ +CREATE CONSTRAINT nodeKey +FOR (n:Node) +REQUIRE NODE KEY n.p1, n.p2, n.p3§ +CREATE CONSTRAINT nodeKey +FOR (n:Node) +REQUIRE NODE KEY n.p1 ,n.p2, n.p3§ +DROP CONSTRAINT foo§