Conversation

@jiangxb1987 (Contributor)

What changes were proposed in this pull request?

We found a race condition between lastTaskRunningTime and lastShuffleMigrationTime that could cause a decommissioned executor to exit before all of its shuffle blocks have been discovered. This leads to immediate task retries right after the executor exits, and thus longer query execution time.

To fix the issue, we now update lastTaskRunningTime only when a task reports a finished status through the StatusUpdate event. This is better than the current approach (which uses a thread to check the number of running tasks every second), because we then know unambiguously whether the shuffle block refresh happened after all tasks finished running, which resolves the race condition described above.
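As a rough illustration of the ordering this enforces, here is a minimal, self-contained Scala sketch; every name in it is hypothetical and does not correspond to the actual Spark executor code:

```scala
import java.util.concurrent.atomic.AtomicLong

// Illustrative sketch only; names do not match Spark's executor backend.
object DecommissionSketch {
  // Written from the task status-update path, read from the periodic
  // decommission check, hence the atomic holder.
  private val lastTaskFinishTime = new AtomicLong(System.nanoTime())

  // Invoked exactly when a task reports a terminal state (instead of a
  // background thread polling the running-task count every second).
  def onTaskFinished(remainingRunningTasks: Int): Unit = {
    if (remainingRunningTasks == 0) {
      lastTaskFinishTime.set(System.nanoTime())
    }
  }

  // The executor may only exit once the last shuffle-block scan happened
  // strictly after the last task finished, so shuffle files written by a
  // late-finishing task cannot be missed.
  def canExit(lastShuffleScanTime: Long, allBlocksMigrated: Boolean): Boolean =
    allBlocksMigrated && lastShuffleScanTime > lastTaskFinishTime.get()

  def main(args: Array[String]): Unit = {
    onTaskFinished(remainingRunningTasks = 0)
    val scanTime = System.nanoTime()
    println(canExit(scanTime, allBlocksMigrated = true)) // prints: true
  }
}
```

The point of the sketch is that the timestamp moves only on the exact finish event, so the comparison can no longer race against a stale once-per-second sample.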

Why are the changes needed?

To fix a race condition that could lead to shuffle data loss and thus longer query execution time.

How was this patch tested?

This is a very subtle race condition that is hard to cover with a unit test in the current test framework, and we are confident the change is low risk, so it is verified only by passing all the existing tests.

Was this patch authored or co-authored using generative AI tooling?

No

@github-actions bot added the CORE label Nov 30, 2023
Contributor

This is getting queried from a different thread, so it needs to be thread-safe.

Contributor Author

good catch! updated...
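For illustration, a timestamp written by the status-update path but read from the decommission thread needs a visibility guarantee; two common options in Scala look like the sketch below (the class and field names are made up, not the actual Spark field):

```scala
import java.util.concurrent.atomic.AtomicLong

// Hypothetical names, illustrating the thread-safety concern only.
class TaskFinishClock {
  // Option 1: a volatile var is sufficient when the value is only overwritten.
  @volatile private var lastFinishNs: Long = System.nanoTime()
  def markFinished(): Unit = { lastFinishNs = System.nanoTime() }
  def lastFinish: Long = lastFinishNs

  // Option 2: an AtomicLong, which also supports compare-and-set updates.
  private val lastFinishAtomic = new AtomicLong(System.nanoTime())
  def markFinishedAtomic(): Unit = lastFinishAtomic.set(System.nanoTime())
  def lastFinishFromAtomic: Long = lastFinishAtomic.get()
}
```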

@HyukjinKwon changed the title from "[SPARK-46182] Track the lastTaskRunningTime using the exact task finished event" to "[SPARK-46182][CORE] Track the lastTaskRunningTime using the exact task finished event" on Dec 1, 2023
@dongjoon-hyun changed the title from "[SPARK-46182][CORE] Track the lastTaskRunningTime using the exact task finished event" to "[SPARK-46182][CORE] Track lastTaskFinishTime using the exact task finished event" on Dec 4, 2023
@dongjoon-hyun (Member) left a comment

+1, LGTM (from my side).

I revised the PR title because the variable lastTaskRunningTime was replaced with lastTaskFinishTime.

I believe we need @mridulm 's approval, too.

@mridulm (Contributor) left a comment

Thanks for fixing this @jiangxb1987 !

dongjoon-hyun pushed a commit that referenced this pull request Dec 4, 2023
Closes #44090 from jiangxb1987/SPARK-46182.

Authored-by: Xingbo Jiang <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
(cherry picked from commit 6f112f7)
Signed-off-by: Dongjoon Hyun <[email protected]>
dongjoon-hyun pushed a commit that referenced this pull request Dec 4, 2023
@dongjoon-hyun (Member) commented Dec 4, 2023

Thank you, @jiangxb1987 and @mridulm. Merged to master/3.5/3.4.

@jiangxb1987 (Contributor Author)

Thank you so much! @mridulm @dongjoon-hyun

asl3 pushed a commit to asl3/spark that referenced this pull request Dec 5, 2023
dbatomic pushed a commit to dbatomic/spark that referenced this pull request Dec 11, 2023
szehon-ho pushed a commit to szehon-ho/spark that referenced this pull request Feb 7, 2024
turboFei pushed a commit to turboFei/spark that referenced this pull request Nov 6, 2025