Skip to content
Closed
Changes from 1 commit
Commits
Show all changes
71 commits
Select commit Hold shift + click to select a range
01e4cdf
Merge remote-tracking branch 'upstream/master'
gatorsmile Nov 13, 2015
6835704
Merge remote-tracking branch 'upstream/master'
gatorsmile Nov 14, 2015
9180687
Merge remote-tracking branch 'upstream/master'
gatorsmile Nov 14, 2015
b38a21e
SPARK-11633
gatorsmile Nov 17, 2015
d2b84af
Merge remote-tracking branch 'upstream/master' into joinMakeCopy
gatorsmile Nov 17, 2015
fda8025
Merge remote-tracking branch 'upstream/master'
gatorspark Nov 17, 2015
ac0dccd
Merge branch 'master' of https://github.com/gatorsmile/spark
gatorspark Nov 17, 2015
6e0018b
Merge remote-tracking branch 'upstream/master'
Nov 20, 2015
0546772
converge
gatorsmile Nov 20, 2015
b37a64f
converge
gatorsmile Nov 20, 2015
c2a872c
Merge remote-tracking branch 'upstream/master'
gatorsmile Jan 6, 2016
ab6dbd7
Merge remote-tracking branch 'upstream/master'
gatorsmile Jan 6, 2016
4276356
Merge remote-tracking branch 'upstream/master'
gatorsmile Jan 6, 2016
2dab708
Merge remote-tracking branch 'upstream/master'
gatorsmile Jan 7, 2016
0458770
Merge remote-tracking branch 'upstream/master'
gatorsmile Jan 8, 2016
1debdfa
Merge remote-tracking branch 'upstream/master'
gatorsmile Jan 9, 2016
763706d
Merge remote-tracking branch 'upstream/master'
gatorsmile Jan 14, 2016
4de6ec1
Merge remote-tracking branch 'upstream/master'
gatorsmile Jan 18, 2016
9422a4f
Merge remote-tracking branch 'upstream/master'
gatorsmile Jan 19, 2016
52bdf48
Merge remote-tracking branch 'upstream/master'
gatorsmile Jan 20, 2016
1e95df3
Merge remote-tracking branch 'upstream/master'
gatorsmile Jan 23, 2016
fab24cf
Merge remote-tracking branch 'upstream/master'
gatorsmile Feb 1, 2016
8b2e33b
Merge remote-tracking branch 'upstream/master'
gatorsmile Feb 5, 2016
2ee1876
Merge remote-tracking branch 'upstream/master'
gatorsmile Feb 11, 2016
b9f0090
Merge remote-tracking branch 'upstream/master'
gatorsmile Feb 12, 2016
ade6f7e
Merge remote-tracking branch 'upstream/master'
gatorsmile Feb 15, 2016
9fd63d2
Merge remote-tracking branch 'upstream/master'
gatorsmile Feb 19, 2016
5199d49
Merge remote-tracking branch 'upstream/master'
gatorsmile Feb 22, 2016
404214c
Merge remote-tracking branch 'upstream/master'
gatorsmile Feb 23, 2016
c001dd9
Merge remote-tracking branch 'upstream/master'
gatorsmile Feb 25, 2016
59daa48
Merge remote-tracking branch 'upstream/master'
gatorsmile Mar 5, 2016
41d5f64
Merge remote-tracking branch 'upstream/master'
gatorsmile Mar 7, 2016
472a6e3
Merge remote-tracking branch 'upstream/master'
gatorsmile Mar 10, 2016
458f7be
ppd for window
gatorsmile Mar 10, 2016
92136dd
Merge remote-tracking branch 'upstream/master' into pushPredicateThro…
gatorsmile Mar 10, 2016
f401d8b
only partitioning key
gatorsmile Mar 10, 2016
0fba10a
Merge remote-tracking branch 'upstream/master'
gatorsmile Mar 12, 2016
3aea1da
Merge branch 'pushPredicateThroughWindow' into pushPredicateThroughWi…
gatorsmile Mar 12, 2016
d420246
add windowExpr and windowSpec to DSL
gatorsmile Mar 12, 2016
c05b4ae
remove useless import
gatorsmile Mar 12, 2016
6db1940
added more test cases
gatorsmile Mar 14, 2016
8fa0294
added two more test case.
gatorsmile Mar 21, 2016
cbf73b3
Merge remote-tracking branch 'upstream/master'
gatorsmile Mar 21, 2016
c08f561
Merge remote-tracking branch 'upstream/master'
gatorsmile Mar 22, 2016
474df88
Merge remote-tracking branch 'upstream/master'
gatorsmile Mar 22, 2016
3d9828d
Merge remote-tracking branch 'upstream/master'
gatorsmile Mar 24, 2016
72d2361
Merge remote-tracking branch 'upstream/master'
gatorsmile Mar 26, 2016
07afea5
Merge remote-tracking branch 'upstream/master'
gatorsmile Mar 29, 2016
8bf2007
Merge remote-tracking branch 'upstream/master'
gatorsmile Mar 30, 2016
87a165b
Merge remote-tracking branch 'upstream/master'
gatorsmile Mar 31, 2016
b9359cd
Merge remote-tracking branch 'upstream/master'
gatorsmile Apr 1, 2016
b0d7b3b
Merge branch 'pushPredicateThroughWindowNew' into pushPredicateThroug…
gatorsmile Apr 1, 2016
65bd090
Merge remote-tracking branch 'upstream/master'
gatorsmile Apr 5, 2016
babf2da
Merge remote-tracking branch 'upstream/master'
gatorsmile Apr 5, 2016
9e09469
Merge remote-tracking branch 'upstream/master'
gatorsmile Apr 6, 2016
50a8e4a
Merge remote-tracking branch 'upstream/master'
gatorsmile Apr 6, 2016
f3337fa
Merge remote-tracking branch 'upstream/master'
gatorsmile Apr 10, 2016
09cc36d
Merge remote-tracking branch 'upstream/master'
gatorsmile Apr 12, 2016
83a1915
Merge remote-tracking branch 'upstream/master'
gatorsmile Apr 14, 2016
0483145
Merge remote-tracking branch 'upstream/master'
gatorsmile Apr 19, 2016
236a5f4
Merge remote-tracking branch 'upstream/master'
gatorsmile Apr 20, 2016
417bdb4
merge
gatorsmile Apr 20, 2016
cc5f0bd
merge
gatorsmile Apr 20, 2016
436359f
style fix.
gatorsmile Apr 20, 2016
fae2694
address comments.
gatorsmile Apr 20, 2016
875d6b6
style fix.
gatorsmile Apr 20, 2016
0469923
style fix
gatorsmile Apr 20, 2016
5c4f4d3
address comments.
gatorsmile Apr 20, 2016
c4dedd2
address comments and added more test cases.
gatorsmile Apr 20, 2016
3eaeaa5
Merge remote-tracking branch 'upstream/master' into pushPredicateThro…
gatorsmile Apr 25, 2016
e427ce9
address comments.
gatorsmile Apr 25, 2016
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Prev Previous commit
Next Next commit
style fix.
  • Loading branch information
gatorsmile committed Apr 20, 2016
commit 875d6b6cdeab3a82f63b4e82fdfc792b54345a40
Original file line number Diff line number Diff line change
Expand Up @@ -963,7 +963,7 @@ object PushDownPredicate extends Rule[LogicalPlan] with PredicateHelper {
// 1. involving one and only one column that is part of window partitioning key.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can push any deterministic predicate down as all of its references are part of the partition key. So why limit this?

Copy link
Member Author

@gatorsmile gatorsmile Apr 20, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For example, we are unable to push the predicate (key + value) > '2' in the following query. Predicate push down will change the value of sum(key)

select * from (SELECT key, value, sum(key) over(partition by key, value) as c1 from src)r1 where (key + value) > '2';

The example is copied from the test case of Hive. https://issues.apache.org/jira/secure/attachment/12788757/HIVE-12808.05.patch

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I am not sure I agree with you here.

Lets take your example. We filter the rows with the following predicate: key + value > 2. We use both key and value in PARTITION BY clause, this means they are constant during window function evaluation. The value of key + value > 1 will (as a result) also be constant during window evaluation. This predicate would filter out entire partitions, so we can safely push it down.

I think we can safely push down any deterministic filter which only references partitioning columns. Let me know what you think.

I am not sure why Hive does not push this down, but this could well be due to the way Hive evaluates window functions (PTFs).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right. In our window function evaluation, both key and value are constant. It should be safe to push down key + value > 2. Will follow your idea. Thanks!

// 2. Window partitioning key should be just a sequence of [[AttributeReference]].
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is almost guaranteed by the analyzer.

Copy link
Member Author

@gatorsmile gatorsmile Apr 20, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The purpose is to prohibit the push down of predicate (key + value) > '2' when the partition by is key + value in the following example,

select * from (SELECT key, value, sum(key) over(partition by key + value) as c1 from src)r1 where (key + value) > '2'

The example is also copied from the test case of Hive. https://issues.apache.org/jira/secure/attachment/12788757/HIVE-12808.05.patch

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about this case? It appears like key + value is also constant when evaluating window functions. It should be OK to push it down? This restriction could be also related to Hive implementation?

Copy link
Contributor

@hvanhovell hvanhovell Apr 20, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gatorsmile this restriction is probably more related to the fact that if we push down an entire expression, e.g.: a + b, we have to evaluate the expression twice, once in the Filter and once in the Window function. The double evaluation could be avoided by planning a Project.

I am pretty sure that we move all expressions used in Window clauses into an underlying Project during Analysis. So this shouldn't be to big of a problem.

[I updated the comment]

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, thank you very much! I just added an extra condition to ensure that Analyzer converts all the compound expressions to alias. Added a couple of test cases to ensure it.

To enable predicate push down when the partitioning columns is a + b and the predicate is a + b > 3, we need to add a rule in Analyzer for converting the expressions to the underlying alias that has the exactly same original expressions. This will be submitted in a separate PR.

// 3. deterministic
case filter@Filter(condition, w: Window)
case filter @ Filter(condition, w: Window)
if w.partitionSpec.forall(_.isInstanceOf[AttributeReference]) =>
val partitionAttrs = AttributeSet(w.partitionSpec.flatMap(_.references))
val (pushDown, stayUp) = splitConjunctivePredicates(condition).partition { cond =>
Expand Down