Skip to content

Conversation

@davies
Copy link
Contributor

@davies davies commented Aug 28, 2015

After this PR, In/InSet/ArrayContain will return null if value is null, instead of false. They also will return null even if there is a null in the set/array.

@davies
Copy link
Contributor Author

davies commented Aug 28, 2015

cc @yhuai

@SparkQA
Copy link

SparkQA commented Aug 28, 2015

Test build #41712 has finished for PR 8492 at commit c3c65f8.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@yhuai
Copy link
Contributor

yhuai commented Aug 28, 2015

Here is the output of some sample tests using hive 1.2.1

hive> select cast(null as string) in ('1', cast(null as string));
OK
NULL
Time taken: 0.042 seconds, Fetched: 1 row(s)
hive> select array_contains(array('1', cast(null as string)), cast(null as string));
OK
false
Time taken: 0.042 seconds, Fetched: 1 row(s)
hive> select array_contains(array('1', cast(null as string)), '2');
OK
false
Time taken: 0.048 seconds, Fetched: 1 row(s)

@yhuai
Copy link
Contributor

yhuai commented Aug 28, 2015

postgresql's output regarding in...

yhuai=# select cast(null as char(10)) in ('1', cast(null as char(10)));          ?column? 
----------

(1 row)

yhuai=# select cast(null as char(10)) in ('1', cast(null as char(10))) is null;
 ?column? 
----------
 t
(1 row)

yhuai=# select '1' in ('1', cast(null as char(10)));
 ?column? 
----------
 t
(1 row)

yhuai=# select '1' in ('2', '3');
 ?column? 
----------
 f
(1 row)

yhuai=# select '1' in ('2', '3');
 ?column? 
----------
 f
(1 row)

yhuai=# select '1' in ('2', cast(null as char(10)));
 ?column? 
----------

(1 row)

yhuai=# select '1' in ('2', cast(null as char(10))) is null;
 ?column? 
----------
 t
(1 row)

http://www.postgresql.org/docs/devel/static/functions-comparisons.html (looks like ANY/SOME (array) part is for array_contains.)

@yhuai
Copy link
Contributor

yhuai commented Aug 28, 2015

OK. I guess the main question at here is if we want to have a different semantic with hive on array_contains regarding null.

@davies
Copy link
Contributor Author

davies commented Aug 28, 2015

From PostgresSQL:

If the array expression yields a null array, the result of ANY will be null. If the left-hand expression yields null, the result of ANY is ordinarily null (though a non-strict comparison operator could possibly yield a different result). Also, if the right-hand array contains any null elements and no true comparison result is obtained, the result of ANY will be null, not false (again, assuming a strict comparison operator). This is in accordance with SQL's normal rules for Boolean combinations of null values.

It's more consistent in PostgresSQL, I'd like to follow it.

cc @rxin @marmbrus

@rxin
Copy link
Contributor

rxin commented Aug 28, 2015

I'd follow postgres here.

@SparkQA
Copy link

SparkQA commented Aug 28, 2015

Test build #1701 has finished for PR 8492 at commit 4328e46.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Aug 28, 2015

Test build #41727 has finished for PR 8492 at commit 3be8d05.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Aug 28, 2015

Test build #41733 has finished for PR 8492 at commit 5ca7823.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like we need to swap left and right?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to call {defineCodeGen}?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nullSafeCodeGen is similar to defineCodeGen

@yhuai
Copy link
Contributor

yhuai commented Aug 28, 2015

+1 to follow postgres.

@SparkQA
Copy link

SparkQA commented Aug 28, 2015

Test build #41751 has finished for PR 8492 at commit 6887185.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Aug 28, 2015

Test build #41753 has finished for PR 8492 at commit 32ddc9f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@yhuai
Copy link
Contributor

yhuai commented Aug 28, 2015

LGTM

@davies
Copy link
Contributor Author

davies commented Aug 28, 2015

Merged into master and 1.5 branch.

@asfgit asfgit closed this in bb7f352 Aug 28, 2015
asfgit pushed a commit that referenced this pull request Aug 28, 2015
After this PR, In/InSet/ArrayContain will return null if value is null, instead of false. They also will return null even if there is a null in the set/array.

Author: Davies Liu <[email protected]>

Closes #8492 from davies/fix_in.

(cherry picked from commit bb7f352)
Signed-off-by: Davies Liu <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants