@vanzin
Contributor

@vanzin vanzin commented Jun 25, 2018

There is a narrow race in this code that is caused when the code being
run in assertSpilled / assertNotSpilled runs more than a single job.

SpillListener assumed that only a single job was run, and so would only
block waiting for that single job to finish when numSpilledStages was
called. But some tests (like SQL tests that call checkAnswer) run more
than one job, and so that wait was basically a no-op.

As a result, a listener installed by the next test could receive events
from the previous job, which could cause test failures in certain cases.

The change fixes that race, and also uninstalls listeners after the
test runs, so they don't accumulate when the SparkContext is shared
among multiple tests.
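
To illustrate the pattern described above, here is a minimal, self-contained sketch. It does not use Spark at all: `ToyBus`, `Listener`, `Recorder`, and this `withListener` are hypothetical stand-ins written for this example, assuming an event bus that delivers events asynchronously on a single thread. The point is only the shape of the fix: run the body, wait for the bus to drain, then remove the listener.

```scala
import java.util.concurrent.{CopyOnWriteArrayList, Executors, TimeUnit}

// Hypothetical stand-ins for illustration only; these are not Spark classes.
trait Listener { def onEvent(event: String): Unit }

final class ToyBus {
  private val listeners = new CopyOnWriteArrayList[Listener]()
  // A single worker thread delivers events asynchronously, in posting order.
  private val worker = Executors.newSingleThreadExecutor()

  def add(l: Listener): Unit = listeners.add(l)
  def remove(l: Listener): Unit = listeners.remove(l)

  def post(event: String): Unit = worker.execute(new Runnable {
    override def run(): Unit = {
      val it = listeners.iterator()
      while (it.hasNext) it.next().onEvent(event)
    }
  })

  // Block until everything posted so far has been delivered: the marker
  // task only runs after all earlier tasks on the single worker thread.
  def waitUntilEmpty(): Unit = {
    val marker = new Runnable { override def run(): Unit = () }
    worker.submit(marker).get(10, TimeUnit.SECONDS)
  }

  def stop(): Unit = worker.shutdown()
}

// Records every event it receives.
final class Recorder extends Listener {
  val seen = new CopyOnWriteArrayList[String]()
  def onEvent(event: String): Unit = seen.add(event)
}

object Demo {
  // Install the listener, run the body, then drain the bus BEFORE removing
  // the listener, so events from every job in the body are delivered and
  // none leak into a listener installed by a later test.
  def withListener[L <: Listener](bus: ToyBus, listener: L)(body: L => Unit): Unit = {
    bus.add(listener)
    try body(listener)
    finally {
      bus.waitUntilEmpty()
      bus.remove(listener)
    }
  }

  def main(args: Array[String]): Unit = {
    val bus = new ToyBus
    val first = new Recorder
    val second = new Recorder
    // The first "test" runs two jobs; the second runs one.
    withListener(bus, first) { _ => bus.post("job-1-end"); bus.post("job-2-end") }
    withListener(bus, second) { _ => bus.post("job-3-end") }
    println(first.seen)  // [job-1-end, job-2-end]
    println(second.seen) // [job-3-end]
    bus.stop()
  }
}
```

Without the drain step in the `finally` block, `job-2-end` could still be queued when the second listener is installed, which is the kind of cross-job pollution this change is meant to prevent.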

@SparkQA

SparkQA commented Jun 26, 2018

Test build #92312 has finished for PR 21639 at commit 9f0a9c4.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@jiangxb1987
Contributor

Seems the JIRA number is not related?

body
assert(spillListener.numSpilledStages > 0, s"expected $identifier to spill, but did not")
withListener(sc, new SpillListener) { listener =>
val ret = body
Contributor

Maybe I'm missing something obvious, but why do we need the return value here?

Contributor Author

I saw the return type in the closure, but the method itself returns Unit, so all that can be cleaned up.

@vanzin vanzin changed the title [SPARK-24631][tests] Avoid cross-job pollution in TestUtils / SpillListener. [SSPARK-24653][tests] Avoid cross-job pollution in TestUtils / SpillListener. Jun 26, 2018
@vanzin
Contributor Author

vanzin commented Jun 26, 2018

Oops, no idea how I got the wrong bug.

@vanzin vanzin changed the title [SSPARK-24653][tests] Avoid cross-job pollution in TestUtils / SpillListener. [SPARK-24653][tests] Avoid cross-job pollution in TestUtils / SpillListener. Jun 26, 2018
@SparkQA

SparkQA commented Jun 26, 2018

Test build #92345 has finished for PR 21639 at commit 18d5ebf.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@squito
Contributor

squito commented Jul 2, 2018

lgtm

@HyukjinKwon
Member

retest this please

@SparkQA

SparkQA commented Jul 16, 2018

Test build #93063 has finished for PR 21639 at commit 18d5ebf.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Member

retest this please

@SparkQA

SparkQA commented Jul 16, 2018

Test build #93091 has finished for PR 21639 at commit 18d5ebf.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@squito
Contributor

squito commented Jul 17, 2018

retest this please

@SparkQA

SparkQA commented Jul 17, 2018

Test build #93185 has started for PR 21639 at commit 18d5ebf.

* this method will wait until all events posted to the listener bus are processed, and then
* remove the listener from the bus.
*/
def withListener[L <: SparkListener](sc: SparkContext, listener: L) (body: L => Unit): Unit = {
Member

private? hardly matters.

@SparkQA

SparkQA commented Aug 1, 2018

Test build #4226 has finished for PR 21639 at commit 18d5ebf.

  • This patch fails Spark unit tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Member

retest this please

@SparkQA

SparkQA commented Aug 1, 2018

Test build #93861 has finished for PR 21639 at commit 18d5ebf.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Member

Merged to master

@asfgit asfgit closed this in 1122754 Aug 1, 2018
@vanzin vanzin deleted the SPARK-24653 branch August 24, 2018 19:54