[SPARK-12594] [SQL] Outer Join Elimination by Filter Conditions #10542
…sh script This patch includes multiple fixes for the `dev/test-dependencies.sh` script (which was introduced in apache#10461): - Use `build/mvn --force` instead of `mvn` in one additional place. - Explicitly set a zero exit code on success. - Set `LC_ALL=C` to make `sort` results agree across machines (see https://stackoverflow.com/questions/28881/). - Set `should_run_build_tests=True` for `build` module (this somehow got lost). Author: Josh Rosen <[email protected]> Closes apache#10543 from JoshRosen/dep-script-fixes.
Test build #48559 has finished for PR 10542 at commit
Test build #48564 has finished for PR 10542 at commit
A follow-up PR for apache#9712. Moves the test for arrayOfUDT. Author: Liang-Chi Hsieh <[email protected]> Closes apache#10538 from viirya/move-udt-test.
A slight adjustment to the checker configuration was needed; there is a handful of warnings still left, but those are because of a bug in the checker that I'll fix separately (before enabling errors for the checker, of course). Author: Marcelo Vanzin <[email protected]> Closes apache#10535 from vanzin/SPARK-3873-mllib.
… for JDBCRDD and add few filters This patch refactors the filter pushdown for JDBCRDD and also adds a few filters. The added filters are basically from apache#10468 with some refactoring. Test cases are from apache#10468. Author: Liang-Chi Hsieh <[email protected]> Closes apache#10470 from viirya/refactor-jdbc-filter.
is there a better name for this? this is a form of strength reduction right? I don't know if there is a better term in the database land. Can you look into postgres source code and see what they call this?
You are right. OuterJoinElimination might sound better. Let me rename it today.
Since a full outer join is the union distinct of a left outer join and a right outer join, converting full outer to left outer amounts to removing the right outer part from the full outer join.
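To make that point concrete, here is a minimal sketch in plain Python (not Spark code). The helper names `left_outer`, `right_outer`, and `full_outer` are hypothetical, and the sample keys are assumed to be unique so that "union distinct" is easy to compute. It shows that a filter on the left side that rejects nulls leaves exactly the left outer result.

```python
def left_outer(left, right):
    """Left outer equi-join on the key itself; unmatched left rows get None."""
    rows = []
    for l in left:
        matches = [r for r in right if r == l]
        if matches:
            rows.extend((l, r) for r in matches)
        else:
            rows.append((l, None))
    return rows


def right_outer(left, right):
    """Right outer join, built by mirroring the left outer join."""
    return [(l, r) for (r, l) in left_outer(right, left)]


def full_outer(left, right):
    """Full outer join as the union distinct of left outer and right outer."""
    rows = left_outer(left, right)
    rows += [row for row in right_outer(left, right) if row not in rows]
    return rows


# Hypothetical sample data (unique keys assumed).
left_keys = [1, 2, 3]
right_keys = [2, 3, 4]

full = full_outer(left_keys, right_keys)

# A predicate such as `left.key > 0` is never true for a NULL left key,
# so it drops the rows that only the right outer part contributed.
filtered = [row for row in full if row[0] is not None and row[0] > 0]

# What remains is the left outer join (with the same filter applied).
assert filtered == [row for row in left_outer(left_keys, right_keys) if row[0] > 0]
```

The filter is written with an explicit `is not None` check because Python, unlike SQL, raises an error when comparing `None` with a number.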
There's a hack in `TestHive.reset()` that was intended to mute noisy Hive loggers. However, Spark testing loggers are also muted. Author: Cheng Lian <[email protected]> Closes apache#10540 from liancheng/spark-12592.dont-mute-spark-loggers.
…ut UnsafeRow It's confusing that some operators output UnsafeRow while others do not, which makes it easy to make mistakes. This PR changes all operators (SparkPlan) to output only UnsafeRow and removes the rule that inserts Unsafe/Safe conversions. For operators that can't output UnsafeRow directly, an UnsafeProjection is added to them. Closes apache#10330 cc JoshRosen rxin Author: Davies Liu <[email protected]> Closes apache#10511 from davies/unsafe_row.
Test build #48569 has finished for PR 10542 at commit
…ays output UnsafeRow" This reverts commit 0da7bd5.
This PR inlines the Hive SQL parser in Spark SQL. The previous (merged) incarnation of this PR passed all tests, but had and still has problems with the build. These problems are caused by the fact that, for some reason, in some cases the ANTLR-generated code is not included in the compilation phase. This PR is a WIP and should not be merged until we have sorted out the build issues. Author: Herman van Hovell <[email protected]> Author: Nong Li <[email protected]> Author: Nong Li <[email protected]> Closes apache#10525 from hvanhovell/SPARK-12362.
…ilter This PR follows apache#8391. The previous PR fixed JDBCRDD to support null-safe equality comparison for the JDBC datasource. This PR fixes the problem that the comparison can actually return null, which results in an error when the value of that comparison is used. Author: hyukjinkwon <[email protected]> Author: HyukjinKwon <[email protected]> Closes apache#8743 from HyukjinKwon/SPARK-10180.
… APIs and reflection that supported 1.x Remove use of deprecated Hadoop APIs now that 2.2+ is required Author: Sean Owen <[email protected]> Closes apache#10446 from srowen/SPARK-12481.
Test build #48597 has started for PR 10542 at commit
callUDF has been deprecated. However, we do not have an alternative for users to specify the output data type without type tags. This pull request introduced a new API for that, and replaces the invocation of the deprecated callUDF with that. Author: Reynold Xin <[email protected]> Closes apache#10547 from rxin/SPARK-12599.
…SQL] always output UnsafeRow"" This reverts commit 44ee920.
Test build #48600 has finished for PR 10542 at commit
Test build #48604 has finished for PR 10542 at commit
shivaram Author: felixcheung <[email protected]> Closes apache#10408 from felixcheung/rcodecomment.
Test build #48609 has finished for PR 10542 at commit
Avoiding the No such table exception and throwing an analysis exception as per the bug: SPARK-12533 Author: thomastechs <[email protected]> Closes apache#10529 from thomastechs/topic-branch.
The failed bucket is not caused by the code changes in this PR. The code change is from the PR: #10275
Test build #48613 has finished for PR 10542 at commit
Created a fix in #10564, seems that the lag between the test being written and it getting merged had some bits change underneath it.
Thank you! @holdenk
Sure thing - thanks for pinging me when you noticed the issue :)
Author: Reynold Xin <[email protected]> Closes apache#10561 from rxin/update-mima.
…slash quoting mechanism This provides an option to choose whether the JSON parser accepts quoting of all characters. Author: Cazen <[email protected]> Author: Cazen Lee <[email protected]> Author: Cazen Lee <[email protected]> Author: cazen.lee <[email protected]> Closes apache#10497 from Cazen/master.
Previously (when the PR was first created) not specifying b= explicitly was fine (and treated as default null) - instead be explicit about b being None in the test. Author: Holden Karau <[email protected]> Closes apache#10564 from holdenk/SPARK-12611-fix-test-infer-schema-local.
…oinElimination # Conflicts: # sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
Let me close it and then resubmit the PR.
Test build #48628 has finished for PR 10542 at commit
Test build #48632 has finished for PR 10542 at commit
Converts outer joins when the predicates in filter conditions restrict the result sets so that all null-supplying rows are eliminated.

- `full outer` -> `inner` if both sides have such predicates
- `left outer` -> `inner` if the right side has such predicates
- `right outer` -> `inner` if the left side has such predicates
- `full outer` -> `left outer` if only the left side has such predicates
- `full outer` -> `right outer` if only the right side has such predicates

If applicable, this can greatly improve performance, since an outer join is much slower than an inner join, and a full outer join is much slower than a left/right outer join.

The original PR is #10542

Author: gatorsmile <[email protected]> Author: xiaoli <[email protected]> Author: Xiao Li <[email protected]> Closes #10567 from gatorsmile/outerJoinEliminationByFilterCond.
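The conversion rules listed above can be sketched as a small decision function. This is an illustrative Python sketch, not the Catalyst optimizer rule from the PR: the function name `eliminate_outer_join` and its boolean parameters are hypothetical, and it assumes the caller has already determined whether the filter contains a predicate on each side that cannot be satisfied by a null-supplied row.

```python
def eliminate_outer_join(join_type, left_has_predicate, right_has_predicate):
    """Return the join type after applying the conversion rules.

    join_type: one of "inner", "left outer", "right outer", "full outer".
    left_has_predicate / right_has_predicate: True if the filter holds a
    predicate on that side which removes rows where that side is all NULL
    (an assumption supplied by the caller in this sketch).
    """
    if join_type == "full outer":
        if left_has_predicate and right_has_predicate:
            return "inner"
        if left_has_predicate:
            return "left outer"
        if right_has_predicate:
            return "right outer"
    elif join_type == "left outer" and right_has_predicate:
        # The right side is the null-supplying side of a left outer join.
        return "inner"
    elif join_type == "right outer" and left_has_predicate:
        # The left side is the null-supplying side of a right outer join.
        return "inner"
    # No applicable predicate: the join type is left unchanged.
    return join_type


# Example: a full outer join filtered only on left-side columns.
new_type = eliminate_outer_join("full outer", True, False)
assert new_type == "left outer"
```

Note that a predicate on the preserved side alone (for example, the left side of a `left outer` join) does not allow any conversion, so the function returns the original join type in that case.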