[SPARK-18871][SQL] New test cases for IN/NOT IN subquery #16337
Changes from 48 commits
@@ -0,0 +1,117 @@

```sql
-- A test suite for GROUP BY in parent side, subquery, and both predicate subquery
-- It includes correlated cases.

-- tables and data types

CREATE DATABASE indb;
CREATE TABLE t1(t1a String, t1b Short, t1c Int, t1d Long, t1e float, t1f double, t1g DECIMAL, t1h TIMESTAMP, t1i Date)
using parquet;
CREATE TABLE t2(t2a String, t2b Short, t2c Int, t2d Long, t2e float, t2f double, t2g DECIMAL, t2h TIMESTAMP, t2i Date)
using parquet;
CREATE TABLE t3(t3a String, t3b Short, t3c Int, t3d Long, t3e float, t3f double, t3g DECIMAL, t3h TIMESTAMP, t3i Date)
using parquet;
```
```sql
-- insert into tables
INSERT INTO t1 VALUES
('t1a', 6, 8, 10, 15, 20, 20.00, timestamp(date("2014-04-04")), date("2014-04-04")),
('t1b', 8, 16, 19, 17, 25, 26.00, timestamp(date("2014-05-04")), date("2014-05-04")),
('t1a', 16, 12, 21, 15, 20, 20.00, timestamp(date("2014-06-04")), date("2014-06-04")),
('t1a', 16, 12, 10, 15, 20, 20.00, timestamp(date("2014-07-04")), date("2014-07-04")),
('t1c', 8, 16, 19, 17, 25, 26.00, timestamp(date("2014-05-04")), date("2014-05-05")),
('t1d', null, 16, 22, 17, 25, 26.00, timestamp(date("2014-06-04")), null),
('t1d', null, 16, 19, 17, 25, 26.00, timestamp(date("2014-07-04")), null),
('t1e', 10, null, 25, 17, 25, 26.00, timestamp(date("2014-08-04")), date("2014-08-04")),
('t1e', 10, null, 19, 17, 25, 26.00, timestamp(date("2014-09-04")), date("2014-09-04")),
('t1d', 10, null, 12, 17, 25, 26.00, timestamp(date("2015-05-04")), date("2015-05-04")),
('t1a', 6, 8, 10, 15, 20, 20.00, timestamp(date("2014-04-04")), date("2014-04-04")),
('t1e', 10, null, 19, 17, 25, 26.00, timestamp(date("2014-05-04")), date("2014-05-04"));

INSERT INTO t2 VALUES
('t2a', 6, 12, 14, 15, 20, 20.00, timestamp(date("2014-04-04")), date("2014-04-04")),
('t1b', 10, 12, 19, 17, 25, 26.00, timestamp(date("2014-05-04")), date("2014-05-04")),
('t1b', 8, 16, 119, 17, 25, 26.00, timestamp(date("2015-05-04")), date("2015-05-04")),
('t1c', 12, 16, 219, 17, 25, 26.00, timestamp(date("2016-05-04")), date("2016-05-04")),
('t1b', null, 16, 319, 17, 25, 26.00, timestamp(date("2017-05-04")), null),
('t2e', 8, null, 419, 17, 25, 26.00, timestamp(date("2014-06-04")), date("2014-06-04")),
('t1f', 19, null, 519, 17, 25, 26.00, timestamp(date("2014-05-04")), date("2014-05-04")),
('t1b', 10, 12, 19, 17, 25, 26.00, timestamp(date("2014-06-04")), date("2014-06-04")),
('t1b', 8, 16, 19, 17, 25, 26.00, timestamp(date("2014-07-04")), date("2014-07-04")),
('t1c', 12, 16, 19, 17, 25, 26.00, timestamp(date("2014-08-04")), date("2014-08-05")),
('t1e', 8, null, 19, 17, 25, 26.00, timestamp(date("2014-09-04")), date("2014-09-04")),
('t1f', 19, null, 19, 17, 25, 26.00, timestamp(date("2014-10-04")), date("2014-10-04")),
('t1b', null, 16, 19, 17, 25, 26.00, timestamp(date("2014-05-04")), null);

INSERT INTO t3 VALUES
('t3a', 6, 12, 110, 15, 20, 20.00, timestamp(date("2014-04-04")), date("2014-04-04")),
('t3a', 6, 12, 10, 15, 20, 20.00, timestamp(date("2014-05-04")), date("2014-05-04")),
('t1b', 10, 12, 219, 17, 25, 26.00, timestamp(date("2014-05-04")), date("2014-05-04")),
('t1b', 10, 12, 19, 17, 25, 26.00, timestamp(date("2014-05-04")), date("2014-05-04")),
('t1b', 8, 16, 319, 17, 25, 26.00, timestamp(date("2014-06-04")), date("2014-06-04")),
('t1b', 8, 16, 19, 17, 25, 26.00, timestamp(date("2014-07-04")), date("2014-07-04")),
('t3c', 17, 16, 519, 17, 25, 26.00, timestamp(date("2014-08-04")), date("2014-08-04")),
('t3c', 17, 16, 19, 17, 25, 26.00, timestamp(date("2014-09-04")), date("2014-09-05")),
('t1b', null, 16, 419, 17, 25, 26.00, timestamp(date("2014-10-04")), null),
('t1b', null, 16, 19, 17, 25, 26.00, timestamp(date("2014-11-04")), null),
('t3b', 8, null, 719, 17, 25, 26.00, timestamp(date("2014-05-04")), date("2014-05-04")),
('t3b', 8, null, 19, 17, 25, 26.00, timestamp(date("2015-05-04")), date("2015-05-04"));
```
```sql
-- correlated IN subquery
-- GROUP BY in parent side
-- TC 01.01
select t1a, avg(t1b) from t1 where t1a in (select t2a from t2) group by t1a;
-- TC 01.02
select t1a, max(t1b) from t1 where t1b in (select t2b from t2 where t1a = t2a) group by t1a, t1d;
-- TC 01.03
select t1a, t1b from t1 where t1c in (select t2c from t2 where t1a = t2a) group by t1a, t1b;
-- TC 01.04
select t1a, sum(distinct(t1b)) from t1 where t1c in (select t2c from t2 where t1a = t2a) or
t1c in (select t3c from t3 where t1a = t3a) group by t1a, t1c;
-- TC 01.05
select t1a, sum(distinct(t1b)) from t1 where t1c in (select t2c from t2 where t1a = t2a) and
t1c in (select t3c from t3 where t1a = t3a) group by t1a, t1c;
-- TC 01.06
select t1a, count(distinct(t1b)) from t1 where t1c in (select t2c from t2 where t1a = t2a)
group by t1a, t1c having t1a = "t1b";
```
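The TC 01.02 pattern (a correlated IN in WHERE, with GROUP BY on the parent side) can be illustrated outside Spark. The snippet below is a minimal sketch, assuming SQLite through Python's `sqlite3` as a stand-in engine and a hypothetical, trimmed subset of the `t1`/`t2` rows; it is not part of this PR. It shows that the IN filter runs before grouping and that, in a WHERE clause, it keeps the same rows as a semi-join written with EXISTS.

```python
import sqlite3

# Hypothetical, trimmed subset of the t1/t2 test rows (columns a, b, d only).
T1 = [("t1a", 6, 10), ("t1b", 8, 19), ("t1c", 8, 19), ("t1e", 10, 25), ("t1e", 10, 19)]
T2 = [("t2a", 6), ("t1b", 10), ("t1b", 8), ("t1c", 12), ("t1b", None), ("t1e", 8)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1(t1a TEXT, t1b INTEGER, t1d INTEGER)")
conn.execute("CREATE TABLE t2(t2a TEXT, t2b INTEGER)")
conn.executemany("INSERT INTO t1 VALUES (?, ?, ?)", T1)
conn.executemany("INSERT INTO t2 VALUES (?, ?)", T2)

# Shape of TC 01.02: correlated IN in WHERE, then GROUP BY on the parent side.
in_rows = conn.execute(
    """SELECT t1a, max(t1b) FROM t1
       WHERE t1b IN (SELECT t2b FROM t2 WHERE t1a = t2a)
       GROUP BY t1a, t1d ORDER BY t1a"""
).fetchall()

# Same filter as a semi-join via EXISTS. In WHERE, a NULL outcome and FALSE
# both drop the row, so the two forms are equivalent in this position.
exists_rows = conn.execute(
    """SELECT t1a, max(t1b) FROM t1
       WHERE EXISTS (SELECT 1 FROM t2 WHERE t1a = t2a AND t2b = t1b)
       GROUP BY t1a, t1d ORDER BY t1a"""
).fetchall()

print(in_rows)  # [('t1b', 8)]
```

Only the `t1b` row survives: its `t1b = 8` appears among the `t2b` values of the matching `t2a` group, while the `t1c` and `t1e` rows find no equal value.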
```sql
-- GROUP BY in subquery
-- TC 01.07
select * from t1 where t1b in (select max(t2b) from t2 group by t2a);
-- TC 01.08
select * from (select t2a, t2b from t2 where t2a in (select t1a from t1 where t1b = t2b) group by t2a, t2b) t2;
-- TC 01.09
select count(distinct(*)) from t1 where t1b in (select min(t2b) from t2 where t1a = t2a and t1c = t2c group by t2a);
-- TC 01.10
select t1a, t1b from t1 where t1c in (select max(t2c) from t2 where t1a = t2a group by t2a, t2c having t2c > 8);
-- TC 01.11
select t1a, t1b from t1 where t1c in (select t2c from t2 where t2a in
(select min(t3a) from t3 where t3a = t2a group by t3b) group by t2c);
-- TC 01.12, commented out pending SPARK-18863
--select * from t1 where t1a in
--(select min(t2a) from t2 where t2a = t2a and t2c >= 1 group by t2c having t2c in
--(select t3c from t3 group by t3c, t3b having t2b > 6 and t3b > t2b ));
-- TC 01.13, commented out pending SPARK-18863
--select * from (select * from t2 where t2a in (select t1a from t1 where t1b = t2b)) t2 where t2a in
--(select t2a from t2 where t2a = t2a and t2c > 1 group by t2a having t2c > 8);
```
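In TC 01.07 the uncorrelated subquery `select max(t2b) from t2 group by t2a` yields one value per `t2a` group, and only those per-group maxima take part in the IN test. A minimal sketch, again assuming SQLite as a stand-in engine and a hypothetical, trimmed subset of the data, cross-checks the SQL result against a plain-Python evaluation of the same rule.

```python
import sqlite3

# Hypothetical, trimmed subset of the (a, b) columns of the t1/t2 test rows.
T1 = [("t1a", 6), ("t1b", 8), ("t1a", 16), ("t1d", None), ("t1e", 10)]
T2 = [("t2a", 6), ("t1b", 10), ("t1b", 8), ("t1b", None), ("t1c", 12), ("t1f", 19)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1(t1a TEXT, t1b INTEGER)")
conn.execute("CREATE TABLE t2(t2a TEXT, t2b INTEGER)")
conn.executemany("INSERT INTO t1 VALUES (?, ?)", T1)
conn.executemany("INSERT INTO t2 VALUES (?, ?)", T2)

# Shape of TC 01.07: the IN list is the set of per-group maxima.
sql_rows = conn.execute(
    "SELECT * FROM t1 WHERE t1b IN (SELECT max(t2b) FROM t2 GROUP BY t2a)"
).fetchall()

# Plain-Python reference: MAX ignores NULLs, and a NULL t1b never passes IN.
# (An all-NULL group would add a NULL to the IN list, which can never make
# a row's result TRUE, so skipping such groups does not change the filter.)
groups = {}
for a, b in T2:
    if b is not None:
        groups.setdefault(a, []).append(b)
maxima = {max(values) for values in groups.values()}
expected = [(a, b) for a, b in T1 if b is not None and b in maxima]

print(sorted(sql_rows))  # [('t1a', 6), ('t1e', 10)]
```

The `('t1b', 8)` row is excluded even though `t2` contains an 8: that value is not the maximum of its `t2a` group, so it never reaches the IN list.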
```sql
-- GROUP BY in both
-- TC 01.14
select t1a, min(t1b) from t1 where t1c in (select min(t2c) from t2 where t2b = t1b group by t2a) group by t1a;
-- TC 01.15
select t1a, min(t1b) from t1 where t1c in (select min(t2c) from t2 where t2b in (select min(t3b) from t3
where t2a = t3a group by t3a) group by t2c) group by t1a, t1d;
-- TC 01.16
select t1a, min(t1b) from t1 where t1c in (select min(t2c) from t2 where t2b = t1b group by t2a) and
t1d in (select t3d from t3 where t1c = t3c group by t3d) group by t1a;
-- TC 01.17
select t1a, min(t1b) from t1 where t1c in (select min(t2c) from t2 where t2b = t1b group by t2a) or
t1d in (select t3d from t3 where t1c = t3c group by t3d) group by t1a;
-- TC 01.18
select t1a, min(t1b) from t1 where t1c in (select min(t2c) from t2 where t2b = t1b group by t2a having t2a > t1a) or
t1d in (select t3d from t3 where t1c = t3c group by t3d having t3d = t1d) group by t1a having min(t1b) is NOT NULL;
```
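TC 01.14 correlates on `t2b = t1b` inside a subquery that has its own GROUP BY. One textbook way to reason about such a query is to add the correlation column to the subquery's grouping keys and turn the correlation into a join-style predicate. The sketch below assumes SQLite as a stand-in engine and uses the `a`, `b`, `c` columns of the `t1`/`t2` rows inserted above; the rewritten query is a hypothetical illustration of decorrelation, not a claim about how Spark's optimizer plans TC 01.14.

```python
import sqlite3

# (a, b, c) columns of the t1/t2 rows inserted by this test file.
T1 = [("t1a", 6, 8), ("t1b", 8, 16), ("t1a", 16, 12), ("t1a", 16, 12),
      ("t1c", 8, 16), ("t1d", None, 16), ("t1d", None, 16), ("t1e", 10, None),
      ("t1e", 10, None), ("t1d", 10, None), ("t1a", 6, 8), ("t1e", 10, None)]
T2 = [("t2a", 6, 12), ("t1b", 10, 12), ("t1b", 8, 16), ("t1c", 12, 16),
      ("t1b", None, 16), ("t2e", 8, None), ("t1f", 19, None), ("t1b", 10, 12),
      ("t1b", 8, 16), ("t1c", 12, 16), ("t1e", 8, None), ("t1f", 19, None),
      ("t1b", None, 16)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1(t1a TEXT, t1b INTEGER, t1c INTEGER)")
conn.execute("CREATE TABLE t2(t2a TEXT, t2b INTEGER, t2c INTEGER)")
conn.executemany("INSERT INTO t1 VALUES (?, ?, ?)", T1)
conn.executemany("INSERT INTO t2 VALUES (?, ?, ?)", T2)

# TC 01.14: correlated subquery (t2b = t1b) with its own GROUP BY.
original = conn.execute(
    """SELECT t1a, min(t1b) FROM t1
       WHERE t1c IN (SELECT min(t2c) FROM t2 WHERE t2b = t1b GROUP BY t2a)
       GROUP BY t1a ORDER BY t1a"""
).fetchall()

# Hypothetical decorrelated form: t2b joins the grouping keys, and the
# correlation becomes an equality against the pre-aggregated rows.
rewritten = conn.execute(
    """SELECT t1a, min(t1b) FROM t1
       WHERE EXISTS (
         SELECT 1 FROM (SELECT t2b, min(t2c) AS m FROM t2 GROUP BY t2b, t2a) agg
         WHERE agg.t2b = t1.t1b AND agg.m = t1.t1c)
       GROUP BY t1a ORDER BY t1a"""
).fetchall()

print(original)  # [('t1b', 8), ('t1c', 8)]
```

For the rows with `t1b = 8`, the subquery returns `{16, NULL, NULL}` (the `t2e` and `t1e` groups have only NULL `t2c`), yet `16 IN (...)` is still TRUE, so both rows with `t1c = 16` survive.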
```sql
-- Clean Up
DROP TABLE t1;
DROP TABLE t2;
DROP TABLE t3;
USE default;
DROP DATABASE indb;
```
@@ -0,0 +1,92 @@

```sql
-- A test suite for simple IN predicate subquery
-- It includes correlated cases.

-- tables and data types

CREATE DATABASE indb;
CREATE TABLE t1(t1a String, t1b Short, t1c Int, t1d Long, t1e float, t1f double, t1g DECIMAL, t1h TIMESTAMP, t1i Date)
using parquet;
CREATE TABLE t2(t2a String, t2b Short, t2c Int, t2d Long, t2e float, t2f double, t2g DECIMAL, t2h TIMESTAMP, t2i Date)
using parquet;
CREATE TABLE t3(t3a String, t3b Short, t3c Int, t3d Long, t3e float, t3f double, t3g DECIMAL, t3h TIMESTAMP, t3i Date)
using parquet;
```
```sql
-- insert into tables
INSERT INTO t1 VALUES
('t1a', 6, 8, 10, 15, 20, 20.00, timestamp(date("2014-04-04")), date("2014-04-04")),
('t1b', 8, 16, 19, 17, 25, 26.00, timestamp(date("2014-05-04")), date("2014-05-04")),
('t1a', 16, 12, 21, 15, 20, 20.00, timestamp(date("2014-06-04")), date("2014-06-04")),
('t1a', 16, 12, 10, 15, 20, 20.00, timestamp(date("2014-07-04")), date("2014-07-04")),
('t1c', 8, 16, 19, 17, 25, 26.00, timestamp(date("2014-05-04")), date("2014-05-05")),
('t1d', null, 16, 22, 17, 25, 26.00, timestamp(date("2014-06-04")), null),
('t1d', null, 16, 19, 17, 25, 26.00, timestamp(date("2014-07-04")), null),
('t1e', 10, null, 25, 17, 25, 26.00, timestamp(date("2014-08-04")), date("2014-08-04")),
('t1e', 10, null, 19, 17, 25, 26.00, timestamp(date("2014-09-04")), date("2014-09-04")),
('t1d', 10, null, 12, 17, 25, 26.00, timestamp(date("2015-05-04")), date("2015-05-04")),
('t1a', 6, 8, 10, 15, 20, 20.00, timestamp(date("2014-04-04")), date("2014-04-04")),
('t1e', 10, null, 19, 17, 25, 26.00, timestamp(date("2014-05-04")), date("2014-05-04"));

INSERT INTO t2 VALUES
('t2a', 6, 12, 14, 15, 20, 20.00, timestamp(date("2014-04-04")), date("2014-04-04")),
('t1b', 10, 12, 19, 17, 25, 26.00, timestamp(date("2014-05-04")), date("2014-05-04")),
('t1b', 8, 16, 119, 17, 25, 26.00, timestamp(date("2015-05-04")), date("2015-05-04")),
('t1c', 12, 16, 219, 17, 25, 26.00, timestamp(date("2016-05-04")), date("2016-05-04")),
('t1b', null, 16, 319, 17, 25, 26.00, timestamp(date("2017-05-04")), null),
('t2e', 8, null, 419, 17, 25, 26.00, timestamp(date("2014-06-04")), date("2014-06-04")),
('t1f', 19, null, 519, 17, 25, 26.00, timestamp(date("2014-05-04")), date("2014-05-04")),
('t1b', 10, 12, 19, 17, 25, 26.00, timestamp(date("2014-06-04")), date("2014-06-04")),
('t1b', 8, 16, 19, 17, 25, 26.00, timestamp(date("2014-07-04")), date("2014-07-04")),
('t1c', 12, 16, 19, 17, 25, 26.00, timestamp(date("2014-08-04")), date("2014-08-05")),
('t1e', 8, null, 19, 17, 25, 26.00, timestamp(date("2014-09-04")), date("2014-09-04")),
('t1f', 19, null, 19, 17, 25, 26.00, timestamp(date("2014-10-04")), date("2014-10-04")),
('t1b', null, 16, 19, 17, 25, 26.00, timestamp(date("2014-05-04")), null);

INSERT INTO t3 VALUES
('t3a', 6, 12, 110, 15, 20, 20.00, timestamp(date("2014-04-04")), date("2014-04-04")),
('t3a', 6, 12, 10, 15, 20, 20.00, timestamp(date("2014-05-04")), date("2014-05-04")),
('t1b', 10, 12, 219, 17, 25, 26.00, timestamp(date("2014-05-04")), date("2014-05-04")),
('t1b', 10, 12, 19, 17, 25, 26.00, timestamp(date("2014-05-04")), date("2014-05-04")),
('t1b', 8, 16, 319, 17, 25, 26.00, timestamp(date("2014-06-04")), date("2014-06-04")),
('t1b', 8, 16, 19, 17, 25, 26.00, timestamp(date("2014-07-04")), date("2014-07-04")),
('t3c', 17, 16, 519, 17, 25, 26.00, timestamp(date("2014-08-04")), date("2014-08-04")),
('t3c', 17, 16, 19, 17, 25, 26.00, timestamp(date("2014-09-04")), date("2014-09-05")),
('t1b', null, 16, 419, 17, 25, 26.00, timestamp(date("2014-10-04")), null),
('t1b', null, 16, 19, 17, 25, 26.00, timestamp(date("2014-11-04")), null),
('t3b', 8, null, 719, 17, 25, 26.00, timestamp(date("2014-05-04")), date("2014-05-04")),
('t3b', 8, null, 19, 17, 25, 26.00, timestamp(date("2015-05-04")), date("2015-05-04"));
```
```sql
-- correlated IN subquery
-- simple select
-- TC 01.01
select * from t1 where t1a in (select t2a from t2);
-- TC 01.02
select * from t1 where t1b in (select t2b from t2 where t1a = t2a);
-- TC 01.03
select t1a, t1b from t1 where t1c in (select t2b from t2 where t1a != t2a);
-- TC 01.04
select t1a, t1b from t1 where t1c in (select t2b from t2 where t1a = t2a or t1b > t2b);
-- TC 01.05
select t1a, t1b from t1 where t1c in (select t2b from t2 where t2i in (select t3i from t3 where t2c = t3c));
-- TC 01.06
select t1a, t1b from t1 where t1c in (select t2b from t2 where t2a in
(select t3a from t3 where t2c = t3c and t2b is not NULL));
```
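TC 01.01 relies on IN having semi-join semantics: each `t1` row is returned at most once, however many matching `t2a` values exist (and `t2` repeats `'t1b'` several times). A minimal sketch, assuming SQLite as a stand-in engine and hypothetical one-column tables modeled on the `t1a`/`t2a` values, contrasts this with an inner join, which repeats a row once per match.

```python
import sqlite3

# Hypothetical one-column tables modeled on the t1a / t2a values.
T1 = ["t1a", "t1b", "t1c", "t1e", "t1e"]
T2 = ["t2a", "t1b", "t1b", "t1b", "t1c", "t1c", "t1e"]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1(t1a TEXT)")
conn.execute("CREATE TABLE t2(t2a TEXT)")
conn.executemany("INSERT INTO t1 VALUES (?)", [(v,) for v in T1])
conn.executemany("INSERT INTO t2 VALUES (?)", [(v,) for v in T2])

# TC 01.01 shape: IN behaves like a semi-join, at most one output row per t1 row.
semi = conn.execute(
    "SELECT t1a FROM t1 WHERE t1a IN (SELECT t2a FROM t2) ORDER BY t1a"
).fetchall()

# An inner join instead emits one row per matching (t1, t2) pair.
joined = conn.execute(
    "SELECT t1a FROM t1 JOIN t2 ON t1a = t2a ORDER BY t1a"
).fetchall()

print(len(semi), len(joined))  # 4 7
```

This is why an IN subquery cannot simply be replaced by a join without also deduplicating the subquery side.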
```sql
-- simple select for NOT IN
-- TC 01.07
select distinct(t1a), t1b, t1h from t1 where t1a not in (select t2a from t2);
-- TC 01.08, commented out pending SPARK-18966
--select t1d, t1h, t1i from t1 where t1d not in (select t2d from t2 where t2h > t1h or t2i > t1i);
-- TC 01.09
select distinct(t1a), t1b from t1 where t1b not in (select t2b from t2 where t1a < t2a and t2b > 8);
-- TC 01.10, commented out pending SPARK-18966
--select t1a, t1b from t1 where t1c not in (select t2b from t2 where t2a not in
-- (select t3a from t3 where t2c = t3c and t2b is NULL));

-- TC 01.11
select t1a, t1b from t1 where t1h not in (select t2h from t2 where t2a = t1a) and t1b not in (
(select min(t3b) from t3 where t3d = t1d));
```
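The NOT IN cases are where NULLs matter. Under SQL three-valued logic, `x NOT IN (subquery)` is TRUE when the subquery returns no rows, even if `x` is NULL; for a non-empty subquery it is NULL, so the row is filtered out, whenever `x` is NULL, or `x` matches nothing and the subquery returns a NULL. NOT EXISTS with an equality correlation does not behave this way. The sketch below assumes SQLite as a stand-in engine and hypothetical one-column tables, and runs both forms for three contents of `t2`.

```python
import sqlite3


def run(t2_values):
    """Return the t1b values kept by NOT IN and by NOT EXISTS for a given t2."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t1(t1b INTEGER)")
    conn.execute("CREATE TABLE t2(t2b INTEGER)")
    conn.executemany("INSERT INTO t1 VALUES (?)", [(6,), (8,), (None,)])
    conn.executemany("INSERT INTO t2 VALUES (?)", [(v,) for v in t2_values])
    # SQLite sorts NULL before any other value in ascending order.
    not_in = conn.execute(
        "SELECT t1b FROM t1 WHERE t1b NOT IN (SELECT t2b FROM t2) ORDER BY t1b"
    ).fetchall()
    not_exists = conn.execute(
        "SELECT t1b FROM t1 WHERE NOT EXISTS "
        "(SELECT 1 FROM t2 WHERE t2b = t1b) ORDER BY t1b"
    ).fetchall()
    conn.close()
    return [r[0] for r in not_in], [r[0] for r in not_exists]


for t2_values in ([], [8], [8, None]):
    print(t2_values, run(t2_values))
# [] ([None, 6, 8], [None, 6, 8])
# [8] ([6], [None, 6])
# [8, None] ([], [None, 6])
```

A single NULL on the subquery side is enough to empty the NOT IN result, which is why these test cases deliberately mix NULLs into `t1b`, `t2b`, and `t1i`/`t2i`.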
```sql
-- Clean Up
DROP TABLE t1;
DROP TABLE t2;
DROP TABLE t3;
USE default;
DROP DATABASE indb;
```
You created a database, but you did not use it.
Maybe you do not need it.
The CREATE DATABASE/DROP DATABASE statements isolate all the objects created in this test file in their own database/schema. The purpose is to guard against name collisions with other test files that leave behind objects with the same names without properly dropping them. Using a separate database/schema mitigates this problem, since the chance of two test files using the same database/schema name is low.
I guess what's missing here is a `USE indb` after the `CREATE DATABASE`, so that all the objects created after that statement go into the new database/schema. My bad: I thought `CREATE DATABASE` implicitly switched to the new database/schema.
For each test suite (e.g., in-group-by.sql), we create a dedicated session. See the code.
Why not create temporary views? They are removed automatically when the session ends.
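The temporary-view suggestion relies on session scoping: a temporary view lives only as long as the session that created it, whereas a table persists until it is dropped. A minimal sketch of the same behavior, assuming SQLite's `CREATE TEMP VIEW` and a `sqlite3` connection as stand-ins for Spark's `CREATE TEMPORARY VIEW` and a Spark session (the file, table, and view names are hypothetical):

```python
import os
import sqlite3
import tempfile

# Hypothetical on-disk database standing in for a persistent catalog.
path = os.path.join(tempfile.mkdtemp(), "suite.db")

# "Session" 1: a persistent table plus a temporary view over it.
session = sqlite3.connect(path)
session.execute("CREATE TABLE base(t1a TEXT)")
session.execute("CREATE TEMP VIEW t1 AS SELECT t1a FROM base")
during = session.execute(
    "SELECT name FROM sqlite_temp_master WHERE type = 'view'"
).fetchall()
session.commit()
session.close()

# "Session" 2: the temporary view is gone, but the table is still there.
session = sqlite3.connect(path)
after_views = session.execute(
    "SELECT name FROM sqlite_temp_master WHERE type = 'view'"
).fetchall()
after_tables = session.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"
).fetchall()
session.close()

print(during, after_views, after_tables)  # [('t1',)] [] [('base',)]
```

The leftover `base` table is the kind of object the explicit `DROP TABLE` cleanup at the end of each test file has to handle, and the temporary view avoids that.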
If there is no explicit restriction, I would like to keep the CREATE DATABASE/USE statements so that the test file is self-contained and can be run in different environments with minimal side effects.
I don't have a preference between real tables and temporary views, but variation is good for exercising different code paths. If all the test cases follow the same pattern, coverage is limited. Again, if there are no explicit rules or guidelines on how test cases should be written, I would like to request that it be kept in this format.
ok, I will remove it.