@gatorsmile
Member

This PR converts outer joins to cheaper join types when the predicates in filter conditions restrict the result set so that all null-supplying rows are eliminated.

  • full outer -> inner if both sides have such predicates
  • left outer -> inner if the right side has such predicates
  • right outer -> inner if the left side has such predicates
  • full outer -> left outer if only the left side has such predicates
  • full outer -> right outer if only the right side has such predicates

When applicable, this can greatly improve performance, since an outer join is much slower than an inner join, and a full outer join is much slower than a left or right outer join.
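As a rough illustration of the rules above (not the optimizer rule added by this PR), the conversion can be sketched as a small decision function in plain Python. The function name `eliminate_outer_join` and the two boolean flags are hypothetical; each flag stands for "the filter contains a predicate on that side that eliminates its null-supplying rows".

```python
def eliminate_outer_join(join_type: str,
                         left_has_predicate: bool,
                         right_has_predicate: bool) -> str:
    """Return the join type after applying the conversions listed above.

    Hypothetical sketch: the flags say whether the filter on top of the join
    has a predicate on the left/right side that eliminates the rows for which
    that side was null-supplied.
    """
    if join_type == "full outer":
        if left_has_predicate and right_has_predicate:
            return "inner"
        if left_has_predicate:
            return "left outer"
        if right_has_predicate:
            return "right outer"
    elif join_type == "left outer" and right_has_predicate:
        return "inner"
    elif join_type == "right outer" and left_has_predicate:
        return "inner"
    # No applicable predicate: the join type is left unchanged.
    return join_type
```

For example, `eliminate_outer_join("full outer", True, False)` yields `"left outer"`, while a left outer join with a predicate only on the left side is left as is.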

gatorsmile and others added 4 commits December 31, 2015 15:48
…sh script

This patch includes multiple fixes for the `dev/test-dependencies.sh` script (which was introduced in apache#10461):

- Use `build/mvn --force` instead of `mvn` in one additional place.
- Explicitly set a zero exit code on success.
- Set `LC_ALL=C` to make `sort` results agree across machines (see https://stackoverflow.com/questions/28881/).
- Set `should_run_build_tests=True` for `build` module (this somehow got lost).

Author: Josh Rosen <[email protected]>

Closes apache#10543 from JoshRosen/dep-script-fixes.
@SparkQA

SparkQA commented Jan 1, 2016

Test build #48559 has finished for PR 10542 at commit 90576aa.

  • This patch fails from timeout after a configured wait of 250m.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jan 1, 2016

Test build #48564 has finished for PR 10542 at commit 192ab19.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

viirya and others added 3 commits December 31, 2015 23:48
A follow-up PR for apache#9712. Moves the test for arrayOfUDT.

Author: Liang-Chi Hsieh <[email protected]>

Closes apache#10538 from viirya/move-udt-test.
A slight adjustment to the checker configuration was needed; there is
a handful of warnings still left, but those are because of a bug in
the checker that I'll fix separately (before enabling errors for the
checker, of course).

Author: Marcelo Vanzin <[email protected]>

Closes apache#10535 from vanzin/SPARK-3873-mllib.
… for JDBCRDD and add few filters

This patch refactors the filter pushdown for JDBCRDD and also adds a few filters.

The added filters are basically from apache#10468, with some refactoring. The test cases are also from apache#10468.

Author: Liang-Chi Hsieh <[email protected]>

Closes apache#10470 from viirya/refactor-jdbc-filter.
Contributor

Is there a better name for this? This is a form of strength reduction, right? I don't know if there is a better term in database land. Can you look into the Postgres source code and see what they call this?

Member Author

You are right. OuterJoinElimination might sound better. Let me rename it today.

Since a full outer join is the union distinct of the left outer and right outer joins, converting full outer to left outer amounts to removing the right outer part from the full outer join.
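A small sketch of why the full outer to left outer conversion is safe, using a toy in-memory equi-join instead of Spark. The `join` helper, the sample rows, and `left_filter` are all made up for illustration.

```python
def join(left, right, how):
    """Equi-join two lists of (key, value) rows on the key.

    The side of an unmatched row that has no partner is padded with None.
    """
    rows = [(l, r) for l in left for r in right if l[0] == r[0]]
    if how in ("left outer", "full outer"):
        rows += [(l, None) for l in left if all(l[0] != r[0] for r in right)]
    if how in ("right outer", "full outer"):
        rows += [(None, r) for r in right if all(l[0] != r[0] for l in left)]
    return rows


left = [(1, "a"), (2, "b")]
right = [(2, "x"), (3, "y")]


def left_filter(row):
    """A filter on a left-side column: it can never pass when the left side is null."""
    l, _ = row
    return l is not None and l[0] > 0


full = [row for row in join(left, right, "full outer") if left_filter(row)]
left_only = [row for row in join(left, right, "left outer") if left_filter(row)]

# The row (None, (3, "y")) exists only in the full outer join and is removed
# by the filter, so both plans return the same rows.
assert full == left_only
```

The only rows a full outer join adds over a left outer join are those with a null-supplied left side, and a predicate on the left side discards exactly those rows.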

@gatorsmile changed the title from "[SPARK-12594] [SQL] Outer Join Conversion: Outer/Right/Left to Inner and Outer to Left/Right" to "[SPARK-12594] [SQL] Outer Join Elimination: Outer/Right/Left to Inner and Outer to Left/Right" on Jan 1, 2016
liancheng and others added 3 commits January 1, 2016 13:24
There's a hack in `TestHive.reset()` that was intended to mute noisy Hive loggers. However, it also mutes the Spark testing loggers.

Author: Cheng Lian <[email protected]>

Closes apache#10540 from liancheng/spark-12592.dont-mute-spark-loggers.
…ut UnsafeRow

It's confusing that some operators output UnsafeRow while others do not, which makes it easy to make mistakes.

This PR changes all operators (SparkPlan) to output only UnsafeRow and removes the rule that inserts Unsafe/Safe conversions. Operators that can't output UnsafeRow directly now have an UnsafeProjection added to them.

Closes apache#10330

cc JoshRosen rxin

Author: Davies Liu <[email protected]>

Closes apache#10511 from davies/unsafe_row.
@SparkQA

SparkQA commented Jan 1, 2016

Test build #48569 has finished for PR 10542 at commit c04b53b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile changed the title from "[SPARK-12594] [SQL] Outer Join Elimination: Outer/Right/Left to Inner and Outer to Left/Right" to "[SPARK-12594] [SQL] Outer Join Elimination by Local Predicates" on Jan 2, 2016
hvanhovell and others added 4 commits January 1, 2016 23:22
This PR inlines the Hive SQL parser in Spark SQL.

The previous (merged) incarnation of this PR passed all tests, but had and still has problems with the build. These problems are caused by the fact that, for some reason, in some cases the ANTLR-generated code is not included in the compilation phase.

This PR is a WIP and should not be merged until we have sorted out the build issues.

Author: Herman van Hovell <[email protected]>
Author: Nong Li <[email protected]>
Author: Nong Li <[email protected]>

Closes apache#10525 from hvanhovell/SPARK-12362.
…ilter

This PR follows up on apache#8391.
The previous PR fixed JDBCRDD to support null-safe equality comparison for the JDBC data source. This PR fixes the problem that the comparison can actually return null, which results in an error when the value of that comparison is used.

Author: hyukjinkwon <[email protected]>
Author: HyukjinKwon <[email protected]>

Closes apache#8743 from HyukjinKwon/SPARK-10180.
… APIs and reflection that supported 1.x

Remove use of deprecated Hadoop APIs now that 2.2+ is required

Author: Sean Owen <[email protected]>

Closes apache#10446 from srowen/SPARK-12481.
@SparkQA

SparkQA commented Jan 3, 2016

Test build #48597 has started for PR 10542 at commit 65f9125.

rxin and others added 4 commits January 2, 2016 22:31
callUDF has been deprecated. However, we do not have an alternative for users to specify the output data type without type tags. This pull request introduces a new API for that, and replaces the invocations of the deprecated callUDF with it.

Author: Reynold Xin <[email protected]>

Closes apache#10547 from rxin/SPARK-12599.
…SQL] always output UnsafeRow""

This reverts commit 44ee920.
@SparkQA

SparkQA commented Jan 3, 2016

Test build #48600 has finished for PR 10542 at commit 0bb07cb.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile changed the title from "[SPARK-12594] [SQL] Outer Join Elimination by Local Predicates" to "[SPARK-12594] [SQL] Outer Join Elimination by Filter Conditions" on Jan 3, 2016
@SparkQA

SparkQA commented Jan 3, 2016

Test build #48604 has finished for PR 10542 at commit c5ff632.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jan 3, 2016

Test build #48609 has finished for PR 10542 at commit ee29dd2.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Avoid the No such table exception and throw an analysis exception instead, as per the bug SPARK-12533.

Author: thomastechs <[email protected]>

Closes apache#10529 from thomastechs/topic-branch.
@gatorsmile
Member Author

The failed test bucket is not caused by the code changes in this PR. The failure comes from the code change in PR #10275.

@SparkQA

SparkQA commented Jan 3, 2016

Test build #48613 has finished for PR 10542 at commit ee29dd2.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@holdenk
Contributor

holdenk commented Jan 3, 2016

Created a fix in #10564; it seems that during the lag between the test being written and it getting merged, some bits changed underneath it.

@gatorsmile
Member Author

Thank you! @holdenk

@holdenk
Contributor

holdenk commented Jan 3, 2016

Sure thing - thanks for pinging me when you noticed the issue :)

rxin and others added 3 commits January 3, 2016 16:58
Author: Reynold Xin <[email protected]>

Closes apache#10561 from rxin/update-mima.
…slash quoting mechanism

We can provide an option to choose whether the JSON parser accepts quoting of all characters or not.

Author: Cazen <[email protected]>
Author: Cazen Lee <[email protected]>
Author: Cazen Lee <[email protected]>
Author: cazen.lee <[email protected]>

Closes apache#10497 from Cazen/master.
Previously (when the PR was first created) not specifying b= explicitly was fine (and treated as default null) - instead be explicit about b being None in the test.

Author: Holden Karau <[email protected]>

Closes apache#10564 from holdenk/SPARK-12611-fix-test-infer-schema-local.
@gatorsmile
Member Author

Let me close it and then resubmit the PR.

@SparkQA

SparkQA commented Jan 4, 2016

Test build #48628 has finished for PR 10542 at commit ee29dd2.

  • This patch fails from timeout after a configured wait of 250m.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jan 4, 2016

Test build #48632 has finished for PR 10542 at commit 63d5d62.

  • This patch fails from timeout after a configured wait of 250m.
  • This patch merges cleanly.
  • This patch adds no public classes.

asfgit pushed a commit that referenced this pull request Feb 20, 2016
This PR converts outer joins to cheaper join types when the predicates in filter conditions restrict the result set so that all null-supplying rows are eliminated.

- `full outer` -> `inner` if both sides have such predicates
- `left outer` -> `inner` if the right side has such predicates
- `right outer` -> `inner` if the left side has such predicates
- `full outer` -> `left outer` if only the left side has such predicates
- `full outer` -> `right outer` if only the right side has such predicates

When applicable, this can greatly improve performance, since an outer join is much slower than an inner join, and a full outer join is much slower than a left or right outer join.

The original PR is #10542

Author: gatorsmile <[email protected]>
Author: xiaoli <[email protected]>
Author: Xiao Li <[email protected]>

Closes #10567 from gatorsmile/outerJoinEliminationByFilterCond.