Conversation

@jiangxb1987 (Contributor)

What changes were proposed in this pull request?

We found a race condition between lastTaskRunningTime and lastShuffleMigrationTime that could cause a decommissioned executor to exit before all of its shuffle blocks have been discovered. This leads to immediate task retries right after the executor exits, and thus longer query execution time.

To fix the issue, we now update lastTaskRunningTime only when a task reports a finished status through the StatusUpdate event. This is better than the current approach (which uses a thread to check the number of running tasks every second), because we then know unambiguously whether the shuffle block refresh happened after all tasks finished running, which resolves the race condition described above.
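As a rough illustration of the ordering this enforces, here is a minimal, self-contained Scala sketch; every name in it is hypothetical and does not correspond to the actual Spark executor code:

```scala
import java.util.concurrent.atomic.AtomicLong

// Illustrative sketch only; names do not match Spark's executor backend.
object DecommissionSketch {
  // Written from the task status-update path, read from the periodic
  // decommission check, hence the atomic holder.
  private val lastTaskFinishTime = new AtomicLong(System.nanoTime())

  // Invoked exactly when a task reports a terminal state (instead of a
  // background thread polling the running-task count every second).
  def onTaskFinished(remainingRunningTasks: Int): Unit = {
    if (remainingRunningTasks == 0) {
      lastTaskFinishTime.set(System.nanoTime())
    }
  }

  // The executor may only exit once the last shuffle-block scan happened
  // strictly after the last task finished, so shuffle files written by a
  // late-finishing task cannot be missed.
  def canExit(lastShuffleScanTime: Long, allBlocksMigrated: Boolean): Boolean =
    allBlocksMigrated && lastShuffleScanTime > lastTaskFinishTime.get()

  def main(args: Array[String]): Unit = {
    onTaskFinished(remainingRunningTasks = 0)
    val scanTime = System.nanoTime()
    println(canExit(scanTime, allBlocksMigrated = true)) // prints: true
  }
}
```

The point of the sketch is that the timestamp moves only on the exact finish event, so the comparison can no longer race against a stale once-per-second sample.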

Why are the changes needed?

To fix a race condition that could lead to shuffle data loss and thus longer query execution time.

How was this patch tested?

This is a very subtle race condition that is hard to cover with a unit test in the current test framework, and we are confident the change is low risk, so it is verified only by passing all the existing tests.

Was this patch authored or co-authored using generative AI tooling?

No

@github-actions bot added the CORE label Nov 30, 2023
Contributor

This is getting queried from a different thread, so it needs to be thread-safe.

Contributor Author

good catch! updated...
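For illustration, a timestamp written by the status-update path but read from the decommission thread needs a visibility guarantee; two common options in Scala look like the sketch below (the class and field names are made up, not the actual Spark field):

```scala
import java.util.concurrent.atomic.AtomicLong

// Hypothetical names, illustrating the thread-safety concern only.
class TaskFinishClock {
  // Option 1: a volatile var is sufficient when the value is only overwritten.
  @volatile private var lastFinishNs: Long = System.nanoTime()
  def markFinished(): Unit = { lastFinishNs = System.nanoTime() }
  def lastFinish: Long = lastFinishNs

  // Option 2: an AtomicLong, which also supports compare-and-set updates.
  private val lastFinishAtomic = new AtomicLong(System.nanoTime())
  def markFinishedAtomic(): Unit = lastFinishAtomic.set(System.nanoTime())
  def lastFinishFromAtomic: Long = lastFinishAtomic.get()
}
```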

@HyukjinKwon changed the title from "[SPARK-46182] Track the lastTaskRunningTime using the exact task finished event" to "[SPARK-46182][CORE] Track the lastTaskRunningTime using the exact task finished event" on Dec 1, 2023
@dongjoon-hyun changed the title from "[SPARK-46182][CORE] Track the lastTaskRunningTime using the exact task finished event" to "[SPARK-46182][CORE] Track lastTaskFinishTime using the exact task finished event" on Dec 4, 2023
@dongjoon-hyun (Member) left a comment

+1, LGTM (from my side).

I revised the PR title because the variable lastTaskRunningTime was replaced with lastTaskFinishTime.

I believe we need @mridulm 's approval, too.

@mridulm (Contributor) left a comment

Thanks for fixing this @jiangxb1987 !

dongjoon-hyun pushed a commit that referenced this pull request Dec 4, 2023
Closes #44090 from jiangxb1987/SPARK-46182.

Authored-by: Xingbo Jiang <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
(cherry picked from commit 6f112f7)
Signed-off-by: Dongjoon Hyun <[email protected]>
dongjoon-hyun pushed a commit that referenced this pull request Dec 4, 2023
@dongjoon-hyun (Member) commented Dec 4, 2023

Thank you, @jiangxb1987 and @mridulm. Merged to master/3.5/3.4.

@jiangxb1987 (Contributor Author)

Thank you so much! @mridulm @dongjoon-hyun

asl3 pushed a commit to asl3/spark that referenced this pull request Dec 5, 2023
dbatomic pushed a commit to dbatomic/spark that referenced this pull request Dec 11, 2023
szehon-ho pushed a commit to szehon-ho/spark that referenced this pull request Feb 7, 2024
turboFei pushed a commit to turboFei/spark that referenced this pull request Nov 6, 2025