[SPARK-28247][SS][TEST] Fix flaky test "query without test harness" on ContinuousSuite #32316

zsxwing · 2021-04-23T23:55:19Z

What changes were proposed in this pull request?

This is another attempt to fix the flaky test "query without test harness" on ContinuousSuite.

`query without test harness` is flaky because it starts a continuous query with two partitions but assumes they will run at the same speed.

In this test, 0 and 2 will be written to partition 0, and 1 and 3 will be written to partition 1. The test assumes that by the time we see 3, 2 has already been written to the memory sink, but this is not guaranteed. We can add `if (currentValue == 2) Thread.sleep(5000)` at this line:

https://github.com/apache/spark/blob/b2a2b5d8206b7c09b180b8b6363f73c6c3fdb1d8/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/continuous/ContinuousRateStreamSource.scala#L135

to reproduce the failure: `Result set Set([0], [1], [3]) are not a superset of Set(0, 1, 2, 3)!`
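The failing result set can be reproduced on paper with a small model. The sketch below is not Spark code: `StalledPartitionSketch`, `partitionOf`, and `written` are hypothetical names, and the `value % numPartitions` layout is an assumption chosen only because it matches the 0/2 and 1/3 split described above.

```scala
// Hypothetical model of the rate source layout; not Spark's implementation.
object StalledPartitionSketch {
  // Assumption: value v is produced by partition v % numPartitions,
  // so with two partitions 0 and 2 go to partition 0, 1 and 3 to partition 1.
  def partitionOf(value: Long, numPartitions: Int): Int =
    (value % numPartitions).toInt

  // Values present in the sink when partition p has written rowsPerPartition(p) rows.
  def written(rowsPerPartition: Map[Int, Int], numPartitions: Int): Set[Long] =
    rowsPerPartition.toSeq.flatMap { case (partition, rows) =>
      (0 until rows).map(i => partition.toLong + i.toLong * numPartitions)
    }.toSet
}
```

If partition 0 stalls after its first row while partition 1 writes two rows, `written(Map(0 -> 1, 1 -> 2), 2)` contains 0, 1, and 3 but not 2, which is the shape of the failure message.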

The fix changes `waitForRateSourceCommittedValue` to wait until all partitions reach their desired values before stopping the query.
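The difference between waiting for any partition and waiting for all of them can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual change to `waitForRateSourceCommittedValue`: the object name, the `readCommitted` callback, and the per-partition `desired` map are all hypothetical.

```scala
// Hypothetical sketch; names and signatures are illustrative, not Spark's API.
object WaitForAllPartitionsSketch {
  // Racy condition: satisfied as soon as a single partition has caught up.
  def anyReached(committed: Map[Int, Long], desiredValue: Long): Boolean =
    committed.values.exists(_ >= desiredValue)

  // Safer condition: every partition must have committed its own desired value.
  def allReached(committed: Map[Int, Long], desired: Map[Int, Long]): Boolean =
    desired.forall { case (partition, want) =>
      committed.get(partition).exists(_ >= want)
    }

  // Polls readCommitted until allReached holds or maxWaitMs elapses.
  // Returns true if the condition was met before the deadline.
  def waitUntil(
      readCommitted: () => Map[Int, Long],
      desired: Map[Int, Long],
      maxWaitMs: Long,
      pollMs: Long = 10L): Boolean = {
    val deadline = System.nanoTime() + maxWaitMs * 1000000L
    var done = allReached(readCommitted(), desired)
    while (!done && System.nanoTime() < deadline) {
      Thread.sleep(pollMs)
      done = allReached(readCommitted(), desired)
    }
    done
  }
}
```

With committed values `Map(0 -> 0L, 1 -> 3L)`, `anyReached(..., 3L)` is already true even though partition 0 has not produced 2 yet, whereas `allReached` with `Map(0 -> 2L, 1 -> 3L)` stays false until both partitions have caught up.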

Why are the changes needed?

Fix a flaky test.

Does this PR introduce any user-facing change?

No

How was this patch tested?

Existing tests. I also manually verified that the reproduction mentioned above no longer fails after this change.

zsxwing · 2021-04-23T23:57:35Z

cc @jose-torres

SparkQA · 2021-04-24T00:52:56Z

Kubernetes integration test unable to build dist.

exiting with code: 1
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42402/

SparkQA · 2021-04-24T04:26:51Z

Test build #137872 has finished for PR 32316 at commit 4c053ae.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

viirya · 2021-04-24T17:36:12Z

It seems to fail the Scala 2.13 build.

[error] /home/runner/work/spark/spark/sql/core/src/test/scala/org/apache/spark/sql/streaming/continuous/ContinuousSuite.scala:63:53: type mismatch;
[error] found : scala.collection.MapView[Int,Long]
[error] required: Map[Int,Long]
[error] o.partitionToValueAndRunTimeMs.mapValues(_.value)
[error] ^
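For context on this error: in Scala 2.13, `mapValues` on a `Map` returns a lazy `MapView` rather than a strict `Map`, so the expression no longer type-checks where a `Map[Int, Long]` is required. One way to get a strict map on both 2.12 and 2.13 is to use `map` with a pattern match. The sketch below assumes a hypothetical `ValueAndRunTime` case class standing in for the offset entry type; it is not necessarily the change made in this PR.

```scala
// Hypothetical stand-in types; only the collection calls are the point here.
object MapValuesSketch {
  final case class ValueAndRunTime(value: Long, runTimeMs: Long)

  // Builds a strict Map[Int, Long] on both Scala 2.12 and 2.13,
  // avoiding the MapView that 2.13's mapValues would return.
  def committedValues(offsets: Map[Int, ValueAndRunTime]): Map[Int, Long] =
    offsets.map { case (partition, entry) => partition -> entry.value }
}
```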

viirya

looks reasonable.

SparkQA · 2021-04-24T23:20:35Z

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42421/

SparkQA · 2021-04-24T23:20:36Z

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42421/

SparkQA · 2021-04-25T02:25:53Z

Test build #137896 has finished for PR 32316 at commit 144d198.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

HyukjinKwon · 2021-04-25T07:23:50Z

cc @HeartSaVioR FYI

HeartSaVioR

LGTM, thanks for the fix!

HeartSaVioR · 2021-04-26T01:20:36Z

Should we wait for @jose-torres to do the final review (and probably sign off), or is it OK to go ahead and merge?

jose-torres

LGTM

HeartSaVioR · 2021-04-26T23:06:44Z

OK thanks everyone for reviewing. I'm going to merge this.

…n ContinuousSuite

### What changes were proposed in this pull request?

This is another attempt to fix the flaky test "query without test harness" on ContinuousSuite.

`query without test harness` is flaky because it starts a continuous query with two partitions but assumes they will run at the same speed.

In this test, 0 and 2 will be written to partition 0, 1 and 3 will be written to partition 1. It assumes when we see 3, 2 should be written to the memory sink. But this is not guaranteed. We can add `if (currentValue == 2) Thread.sleep(5000)` at this line https://github.com/apache/spark/blob/b2a2b5d8206b7c09b180b8b6363f73c6c3fdb1d8/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/continuous/ContinuousRateStreamSource.scala#L135 to reproduce the failure: `Result set Set([0], [1], [3]) are not a superset of Set(0, 1, 2, 3)!`

The fix is changing `waitForRateSourceCommittedValue` to wait until all partitions reach the desired values before stopping the query.

### Why are the changes needed?

Fix a flaky test.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Existing tests. Manually verify the reproduction I mentioned above doesn't fail after this change.

Closes #32316 from zsxwing/SPARK-28247-fix.

Authored-by: Shixiong Zhu <[email protected]>
Signed-off-by: Jungtaek Lim <[email protected]>
(cherry picked from commit 0df3b50)
Signed-off-by: Jungtaek Lim <[email protected]>

HeartSaVioR · 2021-04-26T23:40:29Z

Thanks @zsxwing for the fix! I merged this into master/3.1/3.0. I skipped 2.4 as it's unlikely that we'll want to maintain the 2.4 version line further.

fix

4c053ae

github-actions bot added SQL STRUCTURED STREAMING labels Apr 23, 2021

fix for Scala 2.13

144d198

viirya reviewed Apr 24, 2021

viirya approved these changes Apr 25, 2021

HyukjinKwon approved these changes Apr 25, 2021

HeartSaVioR approved these changes Apr 26, 2021

dongjoon-hyun approved these changes Apr 26, 2021

jose-torres approved these changes Apr 26, 2021

HeartSaVioR closed this in 0df3b50 Apr 26, 2021

7 participants