Skip to content

Conversation

@cloud-fan
Copy link
Contributor

When we resolve the join operator, we may change the output of right side if self-join is detected. So in Dataset.joinWith, we should resolve the join operator first, and then get the left output and right output from it, instead of using left.output and right.output directly.

@cloud-fan
Copy link
Contributor Author

cc @marmbrus @rxin @gatorsmile

@SparkQA
Copy link

SparkQA commented Nov 18, 2015

Test build #46210 has finished for PR 9806 at commit 2c9f485.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Copy link
Member

Your fix looks pretty clean to me. Let me share my test cases this PR failed.

 test("joinWith tuple - self join 1") {
   val ds = Seq(("a", 1), ("b", 2)).toDS()
   ds.joinWith(ds, $"_2" === $"_2").collect()
 }

  test("joinWith tuple - self join 2") {
    val ds1 = Seq(("a", 1), ("b", 2)).toDS()
    val ds2 = Seq(("a", 1), ("b", 2)).toDS().as("a")
    ds1.joinWith(ds2, $"_2" === $"a._2").collect()
  }

Do you want me to send a separate PR based on your changes if you are busy? Or you will fix them in this PR? Either is fine to me. : )

Thank you!

@marmbrus
Copy link
Contributor

LGTM, merging to master and 1.6.

@gatorsmile please open JIRAs targeted at 1.6.0 for the bugs you have found. (also use checkAnswer when writing test cases). Thanks!

asfgit pushed a commit that referenced this pull request Nov 18, 2015
When we resolve the join operator, we may change the output of right side if self-join is detected. So in `Dataset.joinWith`, we should resolve the join operator first, and then get the left output and right output from it, instead of using `left.output` and `right.output` directly.

Author: Wenchen Fan <[email protected]>

Closes #9806 from cloud-fan/self-join.

(cherry picked from commit cffb899)
Signed-off-by: Michael Armbrust <[email protected]>
@asfgit asfgit closed this in cffb899 Nov 18, 2015
@gatorsmile
Copy link
Member

Sure. Will do. Thanks!

@cloud-fan cloud-fan deleted the self-join branch November 19, 2015 01:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants