Closed
Changes from 48 commits
Commits
50 commits
3b44c59
adding testcase
kevinyu98 Apr 20, 2016
18b4a31
Merge remote-tracking branch 'upstream/master'
kevinyu98 Apr 22, 2016
4f4d1c8
Merge remote-tracking branch 'upstream/master'
kevinyu98 Apr 23, 2016
f5f0cbe
Merge remote-tracking branch 'upstream/master'
kevinyu98 Apr 23, 2016
d8b2edb
Merge remote-tracking branch 'upstream/master'
kevinyu98 Apr 25, 2016
196b6c6
Merge remote-tracking branch 'upstream/master'
kevinyu98 Apr 25, 2016
f37a01e
Merge remote-tracking branch 'upstream/master'
kevinyu98 Apr 27, 2016
bb5b01f
Merge remote-tracking branch 'upstream/master'
kevinyu98 Apr 30, 2016
bde5820
Merge remote-tracking branch 'upstream/master'
kevinyu98 May 4, 2016
5f7cd96
Merge remote-tracking branch 'upstream/master'
kevinyu98 May 10, 2016
893a49a
Merge remote-tracking branch 'upstream/master'
kevinyu98 May 13, 2016
4bbe1fd
Merge remote-tracking branch 'upstream/master'
kevinyu98 May 17, 2016
b2dd795
Merge remote-tracking branch 'upstream/master'
kevinyu98 May 18, 2016
8c3e5da
Merge remote-tracking branch 'upstream/master'
kevinyu98 May 18, 2016
a0eaa40
Merge remote-tracking branch 'upstream/master'
kevinyu98 May 19, 2016
d03c940
Merge remote-tracking branch 'upstream/master'
kevinyu98 May 19, 2016
d728d5e
Merge remote-tracking branch 'upstream/master'
kevinyu98 May 24, 2016
ea104dd
Merge remote-tracking branch 'upstream/master'
kevinyu98 May 25, 2016
6ab1215
Merge remote-tracking branch 'upstream/master'
kevinyu98 May 27, 2016
0c56653
Merge remote-tracking branch 'upstream/master'
kevinyu98 Jun 1, 2016
d7a1874
Merge remote-tracking branch 'upstream/master'
kevinyu98 Jun 1, 2016
85d3500
Merge remote-tracking branch 'upstream/master'
kevinyu98 Jun 2, 2016
c056f91
Merge remote-tracking branch 'upstream/master'
kevinyu98 Jun 3, 2016
0b8189d
Merge remote-tracking branch 'upstream/master'
kevinyu98 Jun 3, 2016
c2ea31d
Merge remote-tracking branch 'upstream/master'
kevinyu98 Jun 6, 2016
a2d3056
Merge remote-tracking branch 'upstream/master'
kevinyu98 Jun 8, 2016
39e5648
Merge remote-tracking branch 'upstream/master'
kevinyu98 Jun 8, 2016
b9370a3
Merge remote-tracking branch 'upstream/master'
kevinyu98 Jul 25, 2016
01224a4
Merge remote-tracking branch 'upstream/master'
kevinyu98 Aug 3, 2016
d05d39a
Merge remote-tracking branch 'upstream/master'
kevinyu98 Aug 19, 2016
ee6ed88
Merge remote-tracking branch 'upstream/master'
kevinyu98 Aug 26, 2016
db19296
Merge remote-tracking branch 'upstream/master'
kevinyu98 Aug 31, 2016
2e399d9
Merge remote-tracking branch 'upstream/master'
kevinyu98 Sep 1, 2016
0ef59bc
Merge remote-tracking branch 'upstream/master'
kevinyu98 Sep 30, 2016
6fad85f
Merge remote-tracking branch 'upstream/master'
kevinyu98 Oct 20, 2016
5525dff
Merge remote-tracking branch 'upstream/master'
kevinyu98 Nov 4, 2016
63715e4
Merge remote-tracking branch 'upstream/master'
kevinyu98 Nov 22, 2016
a084410
Merge remote-tracking branch 'upstream/master'
kevinyu98 Dec 9, 2016
cdfb1ad
first draft testcases
kevinyu98 Dec 9, 2016
753d7fe
in subquery testcase
kevinyu98 Dec 18, 2016
b6e5b97
Merge remote-tracking branch 'upstream/master'
kevinyu98 Dec 18, 2016
7ca65ec
Merge branch 'subquery' into spark-18871
kevinyu98 Dec 19, 2016
7165105
rerun the test
kevinyu98 Dec 19, 2016
32dbd46
rename the file name
kevinyu98 Dec 19, 2016
ee1e14e
fix comment
kevinyu98 Dec 19, 2016
c2ca009
rename the file
kevinyu98 Dec 19, 2016
9c584fb
logical grouping the IN subquery
kevinyu98 Dec 21, 2016
1c1900a
comment out 4 test cases
kevinyu98 Dec 22, 2016
7c129d9
address comments
kevinyu98 Jan 1, 2017
895bb35
remove two not in cases
kevinyu98 Jan 2, 2017
@@ -0,0 +1,117 @@
-- A test suite for IN predicate subqueries with GROUP BY in the parent side, in the subquery, or in both.
-- It includes correlated cases.

-- tables and data types

CREATE DATABASE indb;
Member

You created a database, but you did not use it.

Maybe you do not need it.

Contributor

The CREATE DATABASE/DROP DATABASE statements isolate all the objects created in this test file in their own database/schema. The purpose is to protect against name collisions with objects of the same names that other test files may have left behind without dropping them properly. Using a separate database/schema mitigates this problem, because the chance of two test files using the same database/schema name is low.

Contributor

I guess what's missing here is a USE indb after the CREATE DATABASE, so that all the objects created after that statement go into the new database/schema. My bad: I thought CREATE DATABASE implicitly switched to the new database/schema.
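A minimal sketch of the setup this comment describes, assuming Spark SQL's `USE` statement switches the session's current database so that the unqualified `CREATE TABLE` statements that follow land in `indb`. The column list is shortened for brevity; the clean-up mirrors the one at the end of the test file.

```sql
-- Isolate this test file's objects in their own database.
CREATE DATABASE indb;
-- CREATE DATABASE does not change the current database; USE does.
USE indb;

-- Unqualified names now resolve to indb.t1, indb.t2, ...
CREATE TABLE t1(t1a String, t1b Short, t1c Int) using parquet;

-- ... test queries ...

-- Clean up: drop the tables, switch back to default, then drop indb.
DROP TABLE t1;
USE default;
DROP DATABASE indb;
```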

Member

For each test suite (e.g., in-group-by.sql), we create a dedicated session. See the code.

Why not create temporary views? They are removed automatically when the session ends.
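For comparison, a sketch of the temporary-view approach suggested above, assuming Spark SQL's inline-table syntax (`VALUES ... AS alias(columns)`) and its `S` suffix for `SMALLINT` (Short) literals. Only three columns and two rows of `t1` are shown; column types are inferred from the literals rather than declared.

```sql
-- A session-scoped view: it disappears when the test session ends,
-- so no DROP statements or database switching are needed.
CREATE TEMPORARY VIEW t1 AS SELECT * FROM VALUES
  ('t1a', 6S, 8),
  ('t1b', 8S, 16)
  AS t1(t1a, t1b, t1c);
```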

Contributor

If there is no explicit restriction, I would like to keep the CREATE DATABASE/USE database statements so that the test file is self-contained and can be run in different environments with minimal side effects.

I don't have a preference between real tables and temporary views, but variation is good for exercising different code paths. If all the test cases are written homogeneously following certain patterns, coverage is limited. Again, if there are no explicit rules or guidelines on how to write test cases, I would like to keep this format.

Contributor Author

ok, I will remove it.

CREATE TABLE t1(t1a String, t1b Short, t1c Int, t1d Long, t1e float, t1f double, t1g DECIMAL, t1h TIMESTAMP, t1i Date)
using parquet;
CREATE TABLE t2(t2a String, t2b Short, t2c Int, t2d Long, t2e float, t2f double, t2g DECIMAL, t2h TIMESTAMP, t2i Date)
using parquet;
CREATE TABLE t3(t3a String, t3b Short, t3c Int, t3d Long, t3e float, t3f double, t3g DECIMAL, t3h TIMESTAMP, t3i Date)
using parquet;

-- insert into tables
INSERT INTO t1 VALUES
('t1a', 6, 8, 10, 15, 20, 20.00, timestamp(date("2014-04-04")), date("2014-04-04")),
('t1b', 8, 16, 19, 17, 25, 26.00, timestamp(date("2014-05-04")), date("2014-05-04")),
('t1a', 16, 12, 21, 15, 20, 20.00, timestamp(date("2014-06-04")), date("2014-06-04")),
('t1a', 16, 12, 10, 15, 20, 20.00, timestamp(date("2014-07-04")), date("2014-07-04")),
('t1c', 8, 16, 19, 17, 25, 26.00, timestamp(date("2014-05-04")), date("2014-05-05")),
('t1d', null, 16, 22, 17, 25, 26.00, timestamp(date("2014-06-04")), null),
('t1d', null, 16, 19, 17, 25, 26.00, timestamp(date("2014-07-04")), null),
('t1e', 10, null, 25, 17, 25, 26.00, timestamp(date("2014-08-04")), date("2014-08-04")),
('t1e', 10, null, 19, 17, 25, 26.00, timestamp(date("2014-09-04")), date("2014-09-04")),
('t1d', 10, null, 12, 17, 25, 26.00, timestamp(date("2015-05-04")), date("2015-05-04")),
('t1a', 6, 8, 10, 15, 20, 20.00, timestamp(date("2014-04-04")), date("2014-04-04")),
('t1e', 10, null, 19, 17, 25, 26.00, timestamp(date("2014-05-04")), date("2014-05-04"));

INSERT INTO t2 VALUES
('t2a', 6, 12, 14, 15, 20, 20.00, timestamp(date("2014-04-04")), date("2014-04-04")),
('t1b', 10, 12, 19, 17, 25, 26.00, timestamp(date("2014-05-04")), date("2014-05-04")),
('t1b', 8, 16, 119, 17, 25, 26.00, timestamp(date("2015-05-04")), date("2015-05-04")),
('t1c', 12, 16, 219, 17, 25, 26.00, timestamp(date("2016-05-04")), date("2016-05-04")),
('t1b', null, 16, 319, 17, 25, 26.00, timestamp(date("2017-05-04")), null),
('t2e', 8, null, 419, 17, 25, 26.00, timestamp(date("2014-06-04")), date("2014-06-04")),
('t1f', 19, null, 519, 17, 25, 26.00, timestamp(date("2014-05-04")), date("2014-05-04")),
('t1b', 10, 12, 19, 17, 25, 26.00, timestamp(date("2014-06-04")), date("2014-06-04")),
('t1b', 8, 16, 19, 17, 25, 26.00, timestamp(date("2014-07-04")), date("2014-07-04")),
('t1c', 12, 16, 19, 17, 25, 26.00, timestamp(date("2014-08-04")), date("2014-08-05")),
('t1e', 8, null, 19, 17, 25, 26.00, timestamp(date("2014-09-04")), date("2014-09-04")),
('t1f', 19, null, 19, 17, 25, 26.00, timestamp(date("2014-10-04")), date("2014-10-04")),
('t1b', null, 16, 19, 17, 25, 26.00, timestamp(date("2014-05-04")), null);

INSERT INTO t3 VALUES
('t3a', 6, 12, 110, 15, 20, 20.00, timestamp(date("2014-04-04")), date("2014-04-04")),
('t3a', 6, 12, 10, 15, 20, 20.00, timestamp(date("2014-05-04")), date("2014-05-04")),
('t1b', 10, 12, 219, 17, 25, 26.00, timestamp(date("2014-05-04")), date("2014-05-04")),
('t1b', 10, 12, 19, 17, 25, 26.00, timestamp(date("2014-05-04")), date("2014-05-04")),
('t1b', 8, 16, 319, 17, 25, 26.00, timestamp(date("2014-06-04")), date("2014-06-04")),
('t1b', 8, 16, 19, 17, 25, 26.00, timestamp(date("2014-07-04")), date("2014-07-04")),
('t3c', 17, 16, 519, 17, 25, 26.00, timestamp(date("2014-08-04")), date("2014-08-04")),
('t3c', 17, 16, 19, 17, 25, 26.00, timestamp(date("2014-09-04")), date("2014-09-05")),
('t1b', null, 16, 419, 17, 25, 26.00, timestamp(date("2014-10-04")), null),
('t1b', null, 16, 19, 17, 25, 26.00, timestamp(date("2014-11-04")), null),
('t3b', 8, null, 719, 17, 25, 26.00, timestamp(date("2014-05-04")), date("2014-05-04")),
('t3b', 8, null, 19, 17, 25, 26.00, timestamp(date("2015-05-04")), date("2015-05-04"));

-- correlated IN subquery
-- GROUP BY in parent side
-- TC 01.01
select t1a, avg(t1b) from t1 where t1a in (select t2a from t2) group by t1a;
-- TC 01.02
select t1a, max(t1b) from t1 where t1b in (select t2b from t2 where t1a = t2a) group by t1a, t1d;
-- TC 01.03
select t1a, t1b from t1 where t1c in (select t2c from t2 where t1a = t2a) group by t1a, t1b;
-- TC 01.04
select t1a, sum(distinct(t1b)) from t1 where t1c in (select t2c from t2 where t1a = t2a) or
t1c in (select t3c from t3 where t1a = t3a) group by t1a, t1c;
-- TC 01.05
select t1a, sum(distinct(t1b)) from t1 where t1c in (select t2c from t2 where t1a = t2a) and
t1c in (select t3c from t3 where t1a = t3a) group by t1a, t1c;
-- TC 01.06
select t1a, count(distinct(t1b)) from t1 where t1c in (select t2c from t2 where t1a = t2a)
group by t1a, t1c having t1a = "t1b";

-- GROUP BY in subquery
-- TC 01.07
select * from t1 where t1b in (select max(t2b) from t2 group by t2a);
-- TC 01.08
select * from (select t2a, t2b from t2 where t2a in (select t1a from t1 where t1b = t2b) group by t2a, t2b) t2;
-- TC 01.09
select count(distinct(*)) from t1 where t1b in (select min(t2b) from t2 where t1a = t2a and t1c = t2c group by t2a);
-- TC 01.10
select t1a, t1b from t1 where t1c in (select max(t2c) from t2 where t1a = t2a group by t2a, t2c having t2c > 8);
-- TC 01.11
select t1a, t1b from t1 where t1c in (select t2c from t2 where t2a in
(select min(t3a) from t3 where t3a = t2a group by t3b) group by t2c);
-- TC 01.12, commented out pending SPARK-18863
--select * from t1 where t1a in
--(select min(t2a) from t2 where t2a = t2a and t2c >= 1 group by t2c having t2c in
--(select t3c from t3 group by t3c, t3b having t2b > 6 and t3b > t2b ));
-- TC 01.13, commented out pending SPARK-18863
--select * from (select * from t2 where t2a in (select t1a from t1 where t1b = t2b)) t2 where t2a in
--(select t2a from t2 where t2a = t2a and t2c > 1 group by t2a having t2c > 8);
Member

This test suite is only for positive cases, right? The above two test cases are negative cases. They can be added together with the other negative cases, right? We can put these two cases in SPARK-18863.

Contributor

We will enable these two test cases once the problem is fixed. They remain here to demonstrate the test coverage of IN/NOT IN subqueries. At this point, I have only done an initial investigation of SPARK-18863. Once the investigation confirms that these two cases have the same root cause as SPARK-18863, they will be cross-referenced.

Member

These two test cases will not be part of this test suite, right? So far, no negative cases are added to this test suite.

Contributor Author

Sure. So we can add these test cases back when we deliver the fixes, as you mentioned below? I will put these test cases into SPARK-18863. Thanks.


-- GROUP BY in both
-- TC 01.14
select t1a, min(t1b) from t1 where t1c in (select min(t2c) from t2 where t2b = t1b group by t2a) group by t1a;
-- TC 01.15
select t1a, min(t1b) from t1 where t1c in (select min(t2c) from t2 where t2b in (select min(t3b) from t3
where t2a = t3a group by t3a) group by t2c) group by t1a, t1d;
-- TC 01.16
select t1a, min(t1b) from t1 where t1c in (select min(t2c) from t2 where t2b = t1b group by t2a) and
t1d in (select t3d from t3 where t1c = t3c group by t3d) group by t1a;
-- TC 01.17
select t1a, min(t1b) from t1 where t1c in (select min(t2c) from t2 where t2b = t1b group by t2a) or
t1d in (select t3d from t3 where t1c = t3c group by t3d) group by t1a;
-- TC 01.18
select t1a, min(t1b) from t1 where t1c in (select min(t2c) from t2 where t2b = t1b group by t2a having t2a > t1a) or
t1d in (select t3d from t3 where t1c = t3c group by t3d having t3d = t1d) group by t1a having min(t1b) is NOT NULL;
Member

Readability is poor for such complex queries. Could you reformat all the cases? You can use a tool like http://www.dpriver.com/pp/sqlformat.htm
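As an illustration of the requested layout, TC 01.18 could be written as follows. The query text is unchanged; only keyword case, line breaks, and indentation differ, and the exact style a given formatter produces may vary.

```sql
-- TC 01.18
SELECT t1a,
       min(t1b)
FROM   t1
WHERE  t1c IN (SELECT min(t2c)
               FROM   t2
               WHERE  t2b = t1b
               GROUP  BY t2a
               HAVING t2a > t1a)
        OR t1d IN (SELECT t3d
                   FROM   t3
                   WHERE  t1c = t3c
                   GROUP  BY t3d
                   HAVING t3d = t1d)
GROUP  BY t1a
HAVING min(t1b) IS NOT NULL;
```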

Member

+1

Contributor Author

sure.


-- Clean Up
DROP TABLE t1;
DROP TABLE t2;
DROP TABLE t3;
USE default;
DROP DATABASE indb;
Member

Please add one more empty line here.

Contributor Author

ok

@@ -0,0 +1,92 @@
-- A test suite for simple IN predicate subqueries
-- It includes correlated cases.

-- tables and data types

CREATE DATABASE indb;
CREATE TABLE t1(t1a String, t1b Short, t1c Int, t1d Long, t1e float, t1f double, t1g DECIMAL, t1h TIMESTAMP, t1i Date)
using parquet;
CREATE TABLE t2(t2a String, t2b Short, t2c Int, t2d Long, t2e float, t2f double, t2g DECIMAL, t2h TIMESTAMP, t2i Date)
using parquet;
CREATE TABLE t3(t3a String, t3b Short, t3c Int, t3d Long, t3e float, t3f double, t3g DECIMAL, t3h TIMESTAMP, t3i Date)
using parquet;

-- insert into tables
INSERT INTO t1 VALUES
('t1a', 6, 8, 10, 15, 20, 20.00, timestamp(date("2014-04-04")), date("2014-04-04")),
('t1b', 8, 16, 19, 17, 25, 26.00, timestamp(date("2014-05-04")), date("2014-05-04")),
('t1a', 16, 12, 21, 15, 20, 20.00, timestamp(date("2014-06-04")), date("2014-06-04")),
('t1a', 16, 12, 10, 15, 20, 20.00, timestamp(date("2014-07-04")), date("2014-07-04")),
('t1c', 8, 16, 19, 17, 25, 26.00, timestamp(date("2014-05-04")), date("2014-05-05")),
('t1d', null, 16, 22, 17, 25, 26.00, timestamp(date("2014-06-04")), null),
('t1d', null, 16, 19, 17, 25, 26.00, timestamp(date("2014-07-04")), null),
('t1e', 10, null, 25, 17, 25, 26.00, timestamp(date("2014-08-04")), date("2014-08-04")),
('t1e', 10, null, 19, 17, 25, 26.00, timestamp(date("2014-09-04")), date("2014-09-04")),
('t1d', 10, null, 12, 17, 25, 26.00, timestamp(date("2015-05-04")), date("2015-05-04")),
('t1a', 6, 8, 10, 15, 20, 20.00, timestamp(date("2014-04-04")), date("2014-04-04")),
('t1e', 10, null, 19, 17, 25, 26.00, timestamp(date("2014-05-04")), date("2014-05-04"));

INSERT INTO t2 VALUES
('t2a', 6, 12, 14, 15, 20, 20.00, timestamp(date("2014-04-04")), date("2014-04-04")),
('t1b', 10, 12, 19, 17, 25, 26.00, timestamp(date("2014-05-04")), date("2014-05-04")),
('t1b', 8, 16, 119, 17, 25, 26.00, timestamp(date("2015-05-04")), date("2015-05-04")),
('t1c', 12, 16, 219, 17, 25, 26.00, timestamp(date("2016-05-04")), date("2016-05-04")),
('t1b', null, 16, 319, 17, 25, 26.00, timestamp(date("2017-05-04")), null),
('t2e', 8, null, 419, 17, 25, 26.00, timestamp(date("2014-06-04")), date("2014-06-04")),
('t1f', 19, null, 519, 17, 25, 26.00, timestamp(date("2014-05-04")), date("2014-05-04")),
('t1b', 10, 12, 19, 17, 25, 26.00, timestamp(date("2014-06-04")), date("2014-06-04")),
('t1b', 8, 16, 19, 17, 25, 26.00, timestamp(date("2014-07-04")), date("2014-07-04")),
('t1c', 12, 16, 19, 17, 25, 26.00, timestamp(date("2014-08-04")), date("2014-08-05")),
('t1e', 8, null, 19, 17, 25, 26.00, timestamp(date("2014-09-04")), date("2014-09-04")),
('t1f', 19, null, 19, 17, 25, 26.00, timestamp(date("2014-10-04")), date("2014-10-04")),
('t1b', null, 16, 19, 17, 25, 26.00, timestamp(date("2014-05-04")), null);

INSERT INTO t3 VALUES
('t3a', 6, 12, 110, 15, 20, 20.00, timestamp(date("2014-04-04")), date("2014-04-04")),
('t3a', 6, 12, 10, 15, 20, 20.00, timestamp(date("2014-05-04")), date("2014-05-04")),
('t1b', 10, 12, 219, 17, 25, 26.00, timestamp(date("2014-05-04")), date("2014-05-04")),
('t1b', 10, 12, 19, 17, 25, 26.00, timestamp(date("2014-05-04")), date("2014-05-04")),
('t1b', 8, 16, 319, 17, 25, 26.00, timestamp(date("2014-06-04")), date("2014-06-04")),
('t1b', 8, 16, 19, 17, 25, 26.00, timestamp(date("2014-07-04")), date("2014-07-04")),
('t3c', 17, 16, 519, 17, 25, 26.00, timestamp(date("2014-08-04")), date("2014-08-04")),
('t3c', 17, 16, 19, 17, 25, 26.00, timestamp(date("2014-09-04")), date("2014-09-05")),
('t1b', null, 16, 419, 17, 25, 26.00, timestamp(date("2014-10-04")), null),
('t1b', null, 16, 19, 17, 25, 26.00, timestamp(date("2014-11-04")), null),
('t3b', 8, null, 719, 17, 25, 26.00, timestamp(date("2014-05-04")), date("2014-05-04")),
('t3b', 8, null, 19, 17, 25, 26.00, timestamp(date("2015-05-04")), date("2015-05-04"));

-- correlated IN subquery
-- simple select
-- TC 01.01
select * from t1 where t1a in (select t2a from t2);
-- TC 01.02
select * from t1 where t1b in (select t2b from t2 where t1a = t2a);
-- TC 01.03
select t1a, t1b from t1 where t1c in (select t2b from t2 where t1a != t2a);
-- TC 01.04
select t1a, t1b from t1 where t1c in (select t2b from t2 where t1a = t2a or t1b > t2b);
-- TC 01.05
select t1a, t1b from t1 where t1c in (select t2b from t2 where t2i in (select t3i from t3 where t2c = t3c));
-- TC 01.06
select t1a, t1b from t1 where t1c in (select t2b from t2 where t2a in
(select t3a from t3 where t2c = t3c and t2b is not NULL));
-- simple select for NOT IN
-- TC 01.07
select distinct(t1a), t1b, t1h from t1 where t1a not in (select t2a from t2);
-- TC 01.08, commented out pending SPARK-18966
--select t1d, t1h, t1i from t1 where t1d not in (select t2d from t2 where t2h > t1h or t2i > t1i);
-- TC 01.09
select distinct(t1a), t1b from t1 where t1b not in (select t2b from t2 where t1a < t2a and t2b > 8);
-- TC 01.10, commented out pending SPARK-18966
--select t1a, t1b from t1 where t1c not in (select t2b from t2 where t2a not in
-- (select t3a from t3 where t2c = t3c and t2b is NULL));
Member

No need to add the test cases now. We can deliver the cases with the bug fixes.

Member

Please add these two cases to the JIRA. I think that is good enough.

Contributor

SPARK-18966 has a simplified test case that demonstrates the problem. The complexity of these two test cases makes it difficult to walk through the root cause of the problem. We should enable the test cases here once SPARK-18966 is fixed, to demonstrate the test coverage of NOT IN with correlated expressions. I can add a remark on SPARK-18966 to enable these test cases as part of its PR. Once this PR is reviewed and merged, I will record the file name and the test case numbers in SPARK-18966.

Member

Based on my experience, we do not add them until the fix is ready.

Contributor

What is your recommendation to move this forward? Wouldn't commenting out the failing test cases be sufficient, while also leaving a trace that we considered these scenarios when writing the tests for IN subqueries?

Contributor Author

Sure, will do. Thanks.

Member

Normally, we remove all unnecessary changes. If an issue or bug exists, create a JIRA to track it.

-- TC 01.11
select t1a, t1b from t1 where t1h not in (select t2h from t2 where t2a = t1a) and t1b not in (
(select min(t3b) from t3 where t3d = t1d));

-- Clean Up
DROP TABLE t1;
DROP TABLE t2;
DROP TABLE t3;
USE default;
DROP DATABASE indb;
Member

Please add one more empty line here.

Contributor Author

ok.
