Skip to content

Conversation

@zsxwing
Copy link
Member

@zsxwing zsxwing commented Apr 23, 2021

What changes were proposed in this pull request?

This is another attempt to fix the flaky test "query without test harness" on ContinuousSuite.

query without test harness is flaky because it starts a continuous query with two partitions but assumes they will run at the same speed.

In this test, 0 and 2 will be written to partition 0, 1 and 3 will be written to partition 1. It assumes when we see 3, 2 should be written to the memory sink. But this is not guaranteed. We can add if (currentValue == 2) Thread.sleep(5000) at this line

to reproduce the failure: Result set Set([0], [1], [3]) are not a superset of Set(0, 1, 2, 3)!

The fix is changing waitForRateSourceCommittedValue to wait until all partitions reach the desired values before stopping the query.

Why are the changes needed?

Fix a flaky test.

Does this PR introduce any user-facing change?

No

How was this patch tested?

Existing tests. Manually verify the reproduction I mentioned above doesn't fail after this change.

@zsxwing
Copy link
Member Author

zsxwing commented Apr 23, 2021

cc @jose-torres

@SparkQA
Copy link

SparkQA commented Apr 24, 2021

Kubernetes integration test unable to build dist.

exiting with code: 1
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42402/

@SparkQA
Copy link

SparkQA commented Apr 24, 2021

Test build #137872 has finished for PR 32316 at commit 4c053ae.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@viirya
Copy link
Member

viirya commented Apr 24, 2021

It seems to fail Scala 2.13 build.

[error] /home/runner/work/spark/spark/sql/core/src/test/scala/org/apache/spark/sql/streaming/continuous/ContinuousSuite.scala:63:53: type mismatch;
[error] found : scala.collection.MapView[Int,Long]
[error] required: Map[Int,Long]
[error] o.partitionToValueAndRunTimeMs.mapValues(_.value)
[error] ^

Copy link
Member

@viirya viirya left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks reasonable.

@SparkQA
Copy link

SparkQA commented Apr 24, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42421/

@SparkQA
Copy link

SparkQA commented Apr 24, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42421/

@SparkQA
Copy link

SparkQA commented Apr 25, 2021

Test build #137896 has finished for PR 32316 at commit 144d198.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Copy link
Member

cc @HeartSaVioR FYI

Copy link
Contributor

@HeartSaVioR HeartSaVioR left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks for the fix!

@HeartSaVioR
Copy link
Contributor

Would we like to wait for @jose-torres to do the final review (and probably sign-off), or OK to go merging?

Copy link
Contributor

@jose-torres jose-torres left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@HeartSaVioR
Copy link
Contributor

OK thanks everyone for reviewing. I'm going to merge this.

HeartSaVioR pushed a commit that referenced this pull request Apr 26, 2021
…n ContinuousSuite

### What changes were proposed in this pull request?

This is another attempt to fix the flaky test "query without test harness" on ContinuousSuite.

`query without test harness` is flaky because it starts a continuous query with two partitions but assumes they will run at the same speed.

In this test, 0 and 2 will be written to partition 0, 1 and 3 will be written to partition 1. It assumes when we see 3, 2 should be written to the memory sink. But this is not guaranteed. We can add `if (currentValue == 2) Thread.sleep(5000)` at this line https://github.com/apache/spark/blob/b2a2b5d8206b7c09b180b8b6363f73c6c3fdb1d8/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/continuous/ContinuousRateStreamSource.scala#L135 to reproduce the failure: `Result set Set([0], [1], [3]) are not a superset of Set(0, 1, 2, 3)!`

The fix is changing `waitForRateSourceCommittedValue` to wait until all partitions reach the desired values before stopping the query.

### Why are the changes needed?

Fix a flaky test.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Existing tests. Manually verify the reproduction I mentioned above doesn't fail after this change.

Closes #32316 from zsxwing/SPARK-28247-fix.

Authored-by: Shixiong Zhu <[email protected]>
Signed-off-by: Jungtaek Lim <[email protected]>
(cherry picked from commit 0df3b50)
Signed-off-by: Jungtaek Lim <[email protected]>
HeartSaVioR pushed a commit that referenced this pull request Apr 26, 2021
…n ContinuousSuite

### What changes were proposed in this pull request?

This is another attempt to fix the flaky test "query without test harness" on ContinuousSuite.

`query without test harness` is flaky because it starts a continuous query with two partitions but assumes they will run at the same speed.

In this test, 0 and 2 will be written to partition 0, 1 and 3 will be written to partition 1. It assumes when we see 3, 2 should be written to the memory sink. But this is not guaranteed. We can add `if (currentValue == 2) Thread.sleep(5000)` at this line https://github.com/apache/spark/blob/b2a2b5d8206b7c09b180b8b6363f73c6c3fdb1d8/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/continuous/ContinuousRateStreamSource.scala#L135 to reproduce the failure: `Result set Set([0], [1], [3]) are not a superset of Set(0, 1, 2, 3)!`

The fix is changing `waitForRateSourceCommittedValue` to wait until all partitions reach the desired values before stopping the query.

### Why are the changes needed?

Fix a flaky test.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Existing tests. Manually verify the reproduction I mentioned above doesn't fail after this change.

Closes #32316 from zsxwing/SPARK-28247-fix.

Authored-by: Shixiong Zhu <[email protected]>
Signed-off-by: Jungtaek Lim <[email protected]>
(cherry picked from commit 0df3b50)
Signed-off-by: Jungtaek Lim <[email protected]>
@HeartSaVioR
Copy link
Contributor

Thanks @zsxwing for the fix! I merged this in master/3.1/3.0. I skipped 2.4 as it's unlikely that we'll want to maintain 2.4 version line further.

jdcasale pushed a commit to palantir/spark that referenced this pull request Jun 22, 2021
…n ContinuousSuite

### What changes were proposed in this pull request?

This is another attempt to fix the flaky test "query without test harness" on ContinuousSuite.

`query without test harness` is flaky because it starts a continuous query with two partitions but assumes they will run at the same speed.

In this test, 0 and 2 will be written to partition 0, 1 and 3 will be written to partition 1. It assumes when we see 3, 2 should be written to the memory sink. But this is not guaranteed. We can add `if (currentValue == 2) Thread.sleep(5000)` at this line https://github.com/apache/spark/blob/b2a2b5d8206b7c09b180b8b6363f73c6c3fdb1d8/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/continuous/ContinuousRateStreamSource.scala#L135 to reproduce the failure: `Result set Set([0], [1], [3]) are not a superset of Set(0, 1, 2, 3)!`

The fix is changing `waitForRateSourceCommittedValue` to wait until all partitions reach the desired values before stopping the query.

### Why are the changes needed?

Fix a flaky test.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Existing tests. Manually verify the reproduction I mentioned above doesn't fail after this change.

Closes apache#32316 from zsxwing/SPARK-28247-fix.

Authored-by: Shixiong Zhu <[email protected]>
Signed-off-by: Jungtaek Lim <[email protected]>
flyrain pushed a commit to flyrain/spark that referenced this pull request Sep 21, 2021
…n ContinuousSuite

### What changes were proposed in this pull request?

This is another attempt to fix the flaky test "query without test harness" on ContinuousSuite.

`query without test harness` is flaky because it starts a continuous query with two partitions but assumes they will run at the same speed.

In this test, 0 and 2 will be written to partition 0, 1 and 3 will be written to partition 1. It assumes when we see 3, 2 should be written to the memory sink. But this is not guaranteed. We can add `if (currentValue == 2) Thread.sleep(5000)` at this line https://github.com/apache/spark/blob/b2a2b5d8206b7c09b180b8b6363f73c6c3fdb1d8/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/continuous/ContinuousRateStreamSource.scala#L135 to reproduce the failure: `Result set Set([0], [1], [3]) are not a superset of Set(0, 1, 2, 3)!`

The fix is changing `waitForRateSourceCommittedValue` to wait until all partitions reach the desired values before stopping the query.

### Why are the changes needed?

Fix a flaky test.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Existing tests. Manually verify the reproduction I mentioned above doesn't fail after this change.

Closes apache#32316 from zsxwing/SPARK-28247-fix.

Authored-by: Shixiong Zhu <[email protected]>
Signed-off-by: Jungtaek Lim <[email protected]>
(cherry picked from commit 0df3b50)
Signed-off-by: Jungtaek Lim <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants