Some typo fixes + rebase
alancai98 committed Dec 5, 2023
commit cea4b7ac99eb8b15b905c53e3b6035a1d68a632c
10 changes: 5 additions & 5 deletions RFCs/0051-exclude-operator.adoc
Member

A discussion from the original issue revolves around replacing items rather than just excluding them. A major use-case of PartiQL is using PartiQL as a means of performing transformations on semi-structured, open-schema data. Mentioned in the issue are also customers who have 1000+ columns in their source tables.

From how I've been reading this RFC, we might be able to provide a useful workaround, at least for top-level values. We can take advantage of the fact that LET evaluates before EXCLUDE. See below:

SELECT t.*, someItemThatHasBeenReplaced
EXCLUDE t.b
FROM t
LET t.b + 1 AS someItemThatHasBeenReplaced
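
To make the evaluation order concrete, here is a minimal Python sketch of the workaround's intended semantics. It assumes, as this comment reads the RFC, that LET bindings are computed before EXCLUDE removes `t.b`. The function name `select_with_replacement` and the sample rows are hypothetical; the sketch models the idea on plain dicts and is not how any PartiQL implementation evaluates the query.

```python
def select_with_replacement(rows):
    """Model: SELECT t.*, someItemThatHasBeenReplaced EXCLUDE t.b FROM t LET t.b + 1 AS ..."""
    result = []
    for t in rows:
        # LET: computed first, while t.b is still visible (assumed ordering).
        replaced = t["b"] + 1
        # EXCLUDE t.b: drop attribute `b` from the binding of t.
        t_excluded = {k: v for k, v in t.items() if k != "b"}
        # SELECT t.*, someItemThatHasBeenReplaced
        result.append({**t_excluded, "someItemThatHasBeenReplaced": replaced})
    return result


# Hypothetical sample data.
rows = [{"a": 1, "b": 10}, {"a": 2, "b": 20}]
out = select_with_replacement(rows)
```

Note that in this model the new value comes back under the LET alias rather than under the original attribute name `b`.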

For nested attributes, however, I couldn't immediately find an intuitive solution.

With this RFC, do you expect any future necessary RFCs to add support for REPLACE? If so, in your opinion, does this RFC impede or allow for the addition of REPLACE?

Contributor Author

> With this RFC, do you expect any future necessary RFCs to add support for REPLACE?

That was my assumption, and my intent was to leave REPLACE out of scope for this PR. REPLACE is included in the "Future possibilities" section of the RFC.

> If so, in your opinion, does this RFC impede or allow for the addition of REPLACE?

I need to think more about the relationship between EXCLUDE and REPLACE. I think the syntactic rewrite included in the RFC could be adapted to support REPLACE, so I don't believe this RFC impedes adding REPLACE. After I get back from the Thanksgiving holiday, I'll look further into whether the syntactic rewrite approach could be applied to nested attributes of REPLACE.

Contributor Author

Playing around a bit with the rewrite rules from the RFC, we could do something similar in the nested case branches for REPLACE of nested attributes. For example, using the query from example-tuple-attribute-as-final-step, if we had added the REPLACE clause: REPLACE t.b.field_x AS t.b.field_x * 42, the rewrite could add a WHEN branch like

WHEN LOWER(attr_1) = LOWER('b') THEN
    CASE
        WHEN v_1 IS STRUCT THEN (
            PIVOT (
                CASE
                    WHEN LOWER(attr_2) = LOWER('field_x') THEN v_2 * 42
                    ELSE v_2
                END
            ) AT attr_2
            FROM UNPIVOT v_1 AS v_2 AT attr_2
        )
        ELSE v_1
    END
ELSE v_1

Contributor Author

The full query could look something like:

-- EXCLUDE t.a.field_x
-- REPLACE t.b.field_x AS t.b.field_x * 42
SELECT t.*
FROM (
    SELECT VALUE {
        't':
            CASE
                WHEN t IS STRUCT THEN (
                    PIVOT (
                        CASE
                            WHEN LOWER(attr_1) = LOWER('a') THEN
                                CASE
                                    WHEN v_1 IS STRUCT THEN (
                                        PIVOT v_2 AT attr_2
                                        FROM UNPIVOT v_1 AS v_2 AT attr_2
                                        WHERE LOWER(attr_2) NOT IN [LOWER('field_x')]
                                    )
                                    ELSE v_1
                                END
                            WHEN LOWER(attr_1) = LOWER('b') THEN
                                CASE 
                                    WHEN v_1 IS STRUCT THEN (
                                        PIVOT (
                                            CASE 
                                                WHEN LOWER(attr_2) = LOWER('field_x') THEN v_2 * 42
                                                ELSE v_2
                                            END
                                        ) AT attr_2
                                        FROM UNPIVOT v_1 AS v_2 AT attr_2
                                    )
                                    ELSE v_1
                                END
                            ELSE v_1
                        END
                    ) AT attr_1 FROM UNPIVOT t AS v_1 AT attr_1
                )
                ELSE t
            END
    }
    FROM <<
    {
        'a': { 'field_x': 0, 'field_y': 'zero' },  -- `field_x` excluded
        'b': { 'field_x': 1, 'field_y': 'one' },   -- `field_x` replaced with `field_x` * 42
        'c': { 'field_x': 2, 'field_y': 'two' }
    }
    >> AS t
)

The Kotlin implementation outputs this as:

<<
  {
    'a': {
      'field_y': 'zero'
    },
    'b': {
      'field_x': 42,
      'field_y': 'one'
    },
    'c': {
      'field_x': 2,
      'field_y': 'two'
    }
  }
>>
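
As a cross-check of the expected result, the rewrite's effect can be modeled in Python on plain dicts. This is only a sketch: `rewrite_t` is a hypothetical helper that mirrors the CASE/PIVOT/UNPIVOT structure of the query above, treats a dict as a struct, and compares attribute names case-insensitively. It is not how the Kotlin implementation evaluates the query.

```python
def rewrite_t(t):
    """Mirror the rewrite for EXCLUDE t.a.field_x and REPLACE t.b.field_x AS t.b.field_x * 42."""
    if not isinstance(t, dict):              # outer `ELSE t`
        return t
    out = {}
    for attr_1, v_1 in t.items():            # UNPIVOT t AS v_1 AT attr_1
        if attr_1.lower() == "a" and isinstance(v_1, dict):
            # Inner PIVOT with WHERE LOWER(attr_2) NOT IN [LOWER('field_x')]
            out[attr_1] = {k: v for k, v in v_1.items() if k.lower() != "field_x"}
        elif attr_1.lower() == "b" and isinstance(v_1, dict):
            # Inner PIVOT whose CASE replaces field_x with field_x * 42
            out[attr_1] = {
                k: (v * 42 if k.lower() == "field_x" else v) for k, v in v_1.items()
            }
        else:
            out[attr_1] = v_1                # `ELSE v_1`
    return out


# Same input as the query's FROM clause.
source = {
    "a": {"field_x": 0, "field_y": "zero"},
    "b": {"field_x": 1, "field_y": "one"},
    "c": {"field_x": 2, "field_y": "two"},
}
result = rewrite_t(source)
```

Under these assumptions, `result` has the same shape as the output shown above: `a` loses `field_x`, `b.field_x` becomes 42, and `c` is unchanged.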

Contributor

Top-level comment: let's wrap these lines at 80 or 120 characters.

@@ -88,12 +88,12 @@ e.g. tableFoo.a[1].*[*].b['c']
* We restrict tuple attribute exclude steps to use string literals and collection index exclude steps to use int literals. Thus `<exclude paths>` are statically known. We can decide whether to add other exclude paths (e.g. expressions) if a use case arises.
* If sufficient schema is present and the path can be resolved, we assume the root of an `EXCLUDE` path can be omitted. The variable resolution rules follow what is already included in the PartiQL specification.
Contributor

I think we might want to have an example of an attribute as a variable.

* We require that every fully-qualified `<exclude path>` contain a root and at least one step. If a use case arises to exclude a binding tuple variable, then this functionality can be added.
Contributor

What is the rationale for this limitation? We should put that here.

- * S-expressions are part of the Ion type system. footnote:[https://amazon-ion.github.io/ion-docs/docs/spec.html#sexp].
+ * S-expressions are part of the Ion type system.footnote:[https://amazon-ion.github.io/ion-docs/docs/spec.html#sexp]
PartiQL should support s-expression types and values since PartiQL's type system is a superset over the Ion types. Because the current PartiQL specification does not formally define s-expressions operations, we consider the definition of collection index and wildcard steps on s-expressions as out-of-scope for this RFC.
Contributor

Perhaps the statement can be less assertive; I know this is one of those hotly debated topics. The spec says:

> PartiQL’s data model extends SQL to Ion’s type system to cover schema-less and nested data. Such values can be directly quoted with `quotes.

So the text can simply convey that the semantics of s-expressions as a collection type are not fully defined yet and are therefore out of scope.

Agreed. This statement makes more assertions about the PartiQL value system than does the spec.


=== Rewrite Procedure
==== Step 1: subsumption of `EXCLUDE` paths
- We perform the following step to ensure that there are no redundant `EXCLUDE` paths. That is, there is no path such that all of its excluded binding tuple values are excluded by another exclude path. footnote:[This subsumption step is included to make the subsequent rewrite steps easier to reason about. In a query without redundant exclude paths, this step is not necessary.]
+ We perform the following step to ensure that there are no redundant `EXCLUDE` paths. That is, there is no path such that all of its excluded binding tuple values are excluded by another exclude path.footnote:[This subsumption step is included to make the subsequent rewrite steps easier to reason about. In a query without redundant exclude paths, this step is not necessary.]

For each `<exclude path>` `p=root~p~s~1~...s~x~`, we compare it with all other ``<exclude path>``s. `<exclude path>` `p` is said to be subsumed by another path `q=root~q~t~1~...t~y~` and not included in the rewritten `EXCLUDE` clause if any of the following rules apply:

@@ -103,7 +103,7 @@ NOTE: The following rules assume `root~p~=root~q~`.
[[anchor-1a]] Rule 1.a::
If `y = 0` (i.e. `q` has no steps), `q` subsumes `p`.
[[anchor-1b]] Rule 1.b::
- If `y<=x` and `s~1~...s~x~=t~1~...t~x~`, `q` subsumes `p`. Put another way if `p` has at least as many steps as `q` and the steps up to ``q``'s length are equivalent, `q` subsumes `p`.
+ If `x>=y` and `s~1~...s~x~=t~1~...t~x~`, `q` subsumes `p`. Put another way if `p` has at least as many steps as `q` and the steps up to ``q``'s length are equivalent, `q` subsumes `p`.

Otherwise, there must be some step at which `p` and `q` diverge. Let's call this step's index `i`.
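
Rules 1.a and 1.b amount to a prefix check on the step lists, which can be sketched in Python. The names `prefix_subsumes` and `divergence_index` are hypothetical, steps are modeled as plain tuple elements, and both paths are assumed to share the same root, as in the NOTE above. The check follows the prose reading of Rule 1.b (the first `y` steps of `p` equal the steps of `q`). The rules that apply after the divergence step are not covered here.

```python
def prefix_subsumes(p_steps, q_steps):
    """Return True if q subsumes p under Rule 1.a or Rule 1.b (same root assumed)."""
    x, y = len(p_steps), len(q_steps)
    if y == 0:
        # Rule 1.a: q has no steps, so it excludes everything under the root.
        return True
    # Rule 1.b: p has at least as many steps as q, and p's first y steps match q.
    return x >= y and tuple(p_steps[:y]) == tuple(q_steps)


def divergence_index(p_steps, q_steps):
    """Return the 1-based index i of the first step where p and q differ, or None."""
    for i, (s, t) in enumerate(zip(p_steps, q_steps), start=1):
        if s != t:
            return i
    return None


# Hypothetical paths under the same root: p = root.a.b, q = root.a
q_subsumes_p = prefix_subsumes(("a", "b"), ("a",))
first_diff = divergence_index(("a", "b", "c"), ("a", "x", "c"))
```

Here `q_subsumes_p` is true because `q` is a prefix of `p`, and `first_diff` is 2 because the two paths first differ at their second step.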

@@ -180,7 +180,7 @@ SELECT VALUE {
'r':
CASE
WHEN ... -- branch(es) dependent on ``s~1~``'s rewrite rule
- ... -- nested `CASE` expressions for `s~2~...s~n~`
+ ... -- nested `CASE` expressions for `s~2~...s~n-1~`
CASE
WHEN ... -- branch(es) dependent on ``s~n~``'s rewrite rule
ELSE <v~n-1~>
@@ -340,14 +340,14 @@ For multiple `EXCLUDE` paths, we employ a similar idea as the rewrite for a sing
[source,partiql,subs="+{markup-in-source}"]
----
-- Let `M` represent the number of `EXCLUDE` paths
- -- Let `R` represent the number of unique `EXCLUDE` path roots

-- Original query:
<select clause>
EXCLUDE p~1~,...,p~M~
<from clause>
<other clauses>

+ -- Let `R` represent the number of unique `EXCLUDE` path roots
-- Rewritten to:
<select clause>
FROM (