-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-25212][SQL] Support Filter in ConvertToLocalRelation #22205
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 1 commit
2a1cd27
421ee20
d7e49e7
ba6d91e
326e5d7
4263bd2
f84d256
c721895
f8536e3
af6a91e
c613c6b
ee1c0e8
77fb55e
b00824c
ee6cb6c
c129176
368b42f
0378b1f
d5a953a
3598483
397fa62
dcd001b
b23538b
f769a94
68c41ff
dad6a7f
cb067c3
d552cc1
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
- Loading branch information
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -1349,6 +1349,12 @@ object ConvertToLocalRelation extends Rule[LogicalPlan] { | |
|
|
||
| case Limit(IntegerLiteral(limit), LocalRelation(output, data, isStreaming)) => | ||
| LocalRelation(output, data.take(limit), isStreaming) | ||
|
|
||
| case Filter(condition, LocalRelation(output, data, isStreaming)) | ||
| if !hasUnevaluableExpr(condition) => | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. If the condition is non-deterministic, the values will be always the same after the plans are optimized. The DataFrame with non-deterministic filters will always return the same result.
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. That is OK, right?
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We blocks all these optimization in the other optimization rules, e.g., ConstantFolding. All the non-deterministic expressions are not foldable.
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. ConvertToLocalRelation was already doing this for Project so I assumed it's OK. what I did is exactly how Project(LocalRelation) works.
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I am fine about introducing this change since we already did it in Project(LocalRelation).
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I suppose it is fine in this case. The only thing is that it violates the contract of the optimizer: it should not change the results of a query. |
||
| val predicate = InterpretedPredicate.create(condition, output) | ||
| predicate.initialize(0) | ||
| LocalRelation(output, data.filter(row => predicate.eval(row)), isStreaming) | ||
| } | ||
|
|
||
| private def hasUnevaluableExpr(expr: Expression): Boolean = { | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
super nit: comment in https://github.com/apache/spark/pull/22205/files#diff-a636a87d8843eeccca90140be91d4fafR1348 not change.