Conversation

@liuyongvs
Contributor

ARRAY_CONTAINS - Returns true if the array contains the value

For more details
https://spark.apache.org/docs/latest/sql-ref-functions-builtin.html

@liuyongvs liuyongvs changed the title [CALCITE-5707] Add ARRAY_CONTAINS function (enabled in Spark library). [CALCITE-5707] Add ARRAY_CONTAINS function (enabled in Spark library) May 18, 2023

/** Support the ARRAY_CONTAINS function. */
public static boolean contains(List list, Object element) {
  final Set set = new HashSet(list);
  return set.contains(element);
}
Contributor

Is this for faster search?

Contributor Author

Yes, it is no different from a for loop in correctness, but it may cost some extra space. If you think it is needed, I will change it to a for loop.

Contributor

In fact, the set is not reused, and I don't think it can improve speed, because constructing the set may take longer than a linear search of the list.
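The trade-off being discussed can be illustrated with a standalone sketch (plain Java, no Calcite dependencies; the class and method names are invented for illustration): building a `HashSet` is itself a full O(n) pass over the list, so for a single membership test it cannot beat a plain `List.contains` scan.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;

public class ArrayContainsSketch {
  /** Single lookup via List.contains: one O(n) scan, no extra space. */
  public static boolean containsViaList(List<Integer> list, Integer element) {
    return list.contains(element);
  }

  /** Single lookup via HashSet: an O(n) build followed by an O(1) probe.
   * The build cost dominates, so this cannot be faster for one lookup. */
  public static boolean containsViaSet(List<Integer> list, Integer element) {
    return new HashSet<>(list).contains(element);
  }

  public static void main(String[] args) {
    List<Integer> list = Arrays.asList(1, 2, 3);
    System.out.println(containsViaList(list, 2)); // true
    System.out.println(containsViaSet(list, 4));  // false
  }
}
```

The set-based variant only pays off when many lookups are made against the same list, which is not the case for a scalar function invoked once per row.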

Contributor Author

Yes, fixed. @JiajunBernoulli

Contributor

I don't think you need this method at all. The code generator can just call java.util.List.contains. You will need to add it to BuiltInMethod.

Contributor Author

@julianhyde I have considered this, but it can't be done directly, because array_contains takes 2 arguments while java.util.List.contains takes 1.
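As a side note on the arity question, the mismatch is only apparent: the first SQL operand can become the method receiver, leaving exactly one argument for java.util.List.contains. A minimal reflective sketch of that receiver-plus-argument mapping (the class name is invented for illustration; Calcite's code generator emits the call directly rather than via reflection):

```java
import java.lang.reflect.Method;
import java.util.Arrays;
import java.util.List;

public class ReceiverCallSketch {
  /** Calls List.contains reflectively: the first SQL operand becomes the
   * receiver, the second becomes the single method argument. */
  public static boolean call(List<?> array, Object element) {
    try {
      Method contains = List.class.getMethod("contains", Object.class);
      return (Boolean) contains.invoke(array, element);
    } catch (ReflectiveOperationException e) {
      throw new RuntimeException(e);
    }
  }

  public static void main(String[] args) {
    // ARRAY_CONTAINS(array[1, 2, 3], 2)  ->  array.contains(2)
    System.out.println(call(Arrays.asList(1, 2, 3), 2)); // true
  }
}
```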

/**
 * Parameter type-checking strategy where the operand types must be an array and the array's element type.
*/
public class ArrayElementOperandTypeChecker implements SqlOperandTypeChecker {
Contributor

Much of this code is the same as MultisetOperandTypeChecker. Can we extract the common parts?

Contributor Author

Yes, I referred to it, but I did not find a good way to abstract the common code:
MultisetOperandTypeChecker checks two Multiset operands;
ArrayElementOperandTypeChecker checks an Array operand and an element type.

@liuyongvs
Contributor Author

Hi @JiajunBernoulli, I have fixed the conflict and addressed all of your review comments. Thanks very much for the review.

@liuyongvs liuyongvs requested a review from julianhyde May 22, 2023 04:56
@liuyongvs
Contributor Author

Hi @tanclary, will you also help review this? Other functions depend on it (the ARRAY_ELEMENT_ARG type), such as array_position/array_remove/array_append/array_prepend and so on; I will support them in a follow-up PR.

"No match found for function signature "
+ "ARRAY_CONTAINS\\(<INTEGER ARRAY>, <NUMERIC>\\)", false);

final SqlOperatorFixture f = f0.withLibrary(SqlLibrary.SPARK);
Contributor

Again, curious what the behavior is/should be if you search an array of type X for a value of type Y. Obviously it would return false, but should it be allowed in the first place?

Contributor

Good point, @tanclary. The validator should give an error if you - say - search for a BOOLEAN in a DATE ARRAY. We should add a test case to this test method.

Contributor Author

Hi @julianhyde @tanclary, there is a unit test for this at the end:
f.checkFails("^array_contains(array[1, 2], true)^",
"INTEGER is not comparable to BOOLEAN", false);

@liuyongvs liuyongvs requested a review from tanclary June 7, 2023 01:53
@liuyongvs
Contributor Author

Hi @tanclary @JiajunBernoulli @julianhyde @MasseGuillaume, I have addressed all of your review comments. Do you have time to take another look?

Comment on lines 5378 to 5379
f.checkScalar("array_contains(array[1, null], cast(null as integer))", true,
"BOOLEAN NOT NULL");
Contributor

Do we want to type check exactly as Apache Spark does?

spark.sql("select array_contains(array(1, null), null)").show()
org.apache.spark.sql.AnalysisException: cannot resolve 'array_contains(array(1, CAST(NULL AS INT)), NULL)' due to data type mismatch: Null typed values cannot be used as arguments; line 1 pos 7;

Contributor Author

@liuyongvs liuyongvs Jun 7, 2023

You should use array_contains(array[1, null], cast(null as integer)).
For my part, I think Spark's behavior is not good: when the second argument is null, it also returns null.
So I followed the Flink way: in Flink, array_contains(array[1, null], cast(null as integer)) returns true, while in Spark it returns null.

public static final BuiltInFunctionDefinition ARRAY_CONTAINS =
            BuiltInFunctionDefinition.newBuilder()
                    .name("ARRAY_CONTAINS")
                    .kind(SCALAR)
                    .inputTypeStrategy(
                            sequence(
                                    Arrays.asList("haystack", "needle"),
                                    Arrays.asList(
                                            logical(LogicalTypeRoot.ARRAY), ARRAY_ELEMENT_ARG)))
                    .outputTypeStrategy(
                            nullableIfArgs(
                                    ConstantArgumentCount.of(0), explicit(DataTypes.BOOLEAN())))
                    .runtimeClass(
                            "org.apache.flink.table.runtime.functions.scalar.ArrayContainsFunction")
                    .build();
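The behavioral difference being discussed can be modeled in plain Java (a standalone sketch with invented names, not Calcite or Flink code): java.util.List.contains uses null-safe equality, which matches the Flink result, while the Spark result for a NULL needle can be modeled as a nullable Boolean.

```java
import java.util.Arrays;
import java.util.List;

public class NullNeedleSketch {
  /** Flink-style: java.util.List.contains uses null-safe equality,
   * so searching for null succeeds if the array holds a null. */
  public static Boolean flinkStyle(List<Integer> array, Integer element) {
    return array.contains(element);
  }

  /** Spark-style for the NULL-needle case discussed above: a NULL
   * needle makes the whole expression UNKNOWN (a null Boolean here). */
  public static Boolean sparkStyle(List<Integer> array, Integer element) {
    if (element == null) {
      return null;
    }
    return array.contains(element);
  }

  public static void main(String[] args) {
    List<Integer> array = Arrays.asList(1, null);
    System.out.println(flinkStyle(array, null)); // true
    System.out.println(sparkStyle(array, null)); // null
  }
}
```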

Contributor

Indeed, spark.sql("""select array_contains(array(1), cast(null as integer))""").show() works.

Comment on lines +5384 to +5385
f.checkScalar("array_contains(array[map[1, 'a'], map[2, 'b']], map[1, 'a'])", true,
"BOOLEAN NOT NULL");
Contributor

spark.sql("""select array_contains(array(map(1, "1"), map(2, "2")), map(2, "2"))""").show()
org.apache.spark.sql.AnalysisException: cannot resolve 'array_contains(array(map(1, '1'), map(2, '2')), map(2, '2'))' due to data type mismatch: function array_contains does not support ordering on type map<int,string>; line 1 pos 7;

Contributor Author

Due to an implementation limitation, Spark currently can't compare or do equality checks between map types. As a result, map values can't appear in EQUAL or comparison expressions, can't be grouping keys, etc.
Calcite's MAP runtime implementation uses a Java collection Map, which supports equality checks, while Spark's does not:

/**
 * This is an internal data representation for map type in Spark SQL. This should not implement
 * `equals` and `hashCode` because the type cannot be used as join keys, grouping keys, or
 * in equality tests. See SPARK-9415 and PR#13847 for the discussions.
 */
abstract class MapData extends Serializable {

apache/spark#23045
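The JVM side of this asymmetry is easy to demonstrate (a standalone sketch with invented names): java.util.Map implementations define equals() and hashCode() over their entries, so a structural lookup of a map inside a list of maps works directly.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MapEqualitySketch {
  /** Structural lookup of a map inside a list of maps. This works on the
   * JVM because java.util.Map defines equals()/hashCode() over its
   * entries; Spark's MapData deliberately does not, which is why Spark
   * rejects the equivalent query. */
  public static boolean containsMap(List<Map<Integer, String>> haystack,
      Map<Integer, String> needle) {
    return haystack.contains(needle);
  }

  public static void main(String[] args) {
    Map<Integer, String> m1 = new HashMap<>();
    m1.put(1, "a");
    Map<Integer, String> m2 = new HashMap<>();
    m2.put(2, "b");
    Map<Integer, String> probe = new HashMap<>();
    probe.put(1, "a");
    System.out.println(containsMap(Arrays.asList(m1, m2), probe)); // true
  }
}
```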

@liuyongvs liuyongvs requested a review from MasseGuillaume June 7, 2023 10:05
@JiajunBernoulli
Contributor

@liuyongvs Please resolve conflict files.

@liuyongvs
Contributor Author

Hi @JiajunBernoulli @julianhyde, I have fixed the conflicts and aligned the behavior with Spark instead of Flink (whose behavior I find more reasonable).

@sonarqubecloud

sonarqubecloud bot commented Jun 8, 2023

Kudos, SonarCloud Quality Gate passed!

0 Bugs
0 Vulnerabilities
0 Security Hotspots
3 Code Smells

79.3% Coverage
0.0% Duplication

@julianhyde julianhyde force-pushed the main branch 2 times, most recently from 8a5cf83 to cf7f71b Compare June 8, 2023 21:21
Contributor

@julianhyde left a comment

@liuyongvs @tanclary @JiajunBernoulli @MasseGuillaume Thanks to all who reviewed. I think this is in good shape. Do you all agree? If so I'll merge.

I would change only two things:

  • Remove SqlFunctions.arrayContains and use java.util.List.contains directly.
  • Add some comments to the test about Spark vs Flink behavior.

julianhyde added a commit to julianhyde/calcite that referenced this pull request Jun 10, 2023
Replace SqlFunctions.arrayContains with List.contains

Tweak Util.distinctList, and add note that SqlFunctions.distinct could
use a similar algorithm.

Close apache#3207
@asfgit asfgit closed this in 3dfefd1 Jun 10, 2023
jhugomoore pushed a commit to jhugomoore/calcite-jhugomoore that referenced this pull request Jun 21, 2023
Flink has a similar function, but has slightly different
behavior from Spark.
  array_contains(array[1, null], cast(null as integer))
returns TRUE in Flink, UNKNOWN in Spark. This change
implements the Spark behavior.

Replace SqlFunctions.arrayContains with List.contains (Julian Hyde).

Tweak Util.distinctList, and add note that SqlFunctions.distinct could
use a similar algorithm (Julian Hyde).

Close apache#3207
jhugomoore pushed a commit to jhugomoore/calcite-jhugomoore that referenced this pull request Jun 22, 2023