@Joibel Joibel commented Dec 11, 2025

Motivation

inferFailedReason does not handle a Failed Pod whose containers are not marked Terminated. We've seen this situation in the field, most likely caused by pod eviction.

With the current behaviour, such a pod eventually falls through the bottom of the function, which returns NodeSucceeded. That is far too optimistic for a failed pod.

Modifications

Instead, track whether both the wait and main containers exited 0. If they did, treat the pod as a Success; otherwise return Failure, telling the user that the pod has failed.
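As a rough illustration of that decision, here is a minimal, dependency-free sketch. It is not the actual controller code: the types `containerStatus`, `terminated` and `nodePhase`, the function name `inferFailedPhase`, and the failure messages are hypothetical stand-ins for the Kubernetes and Argo Workflows types, and the real inferFailedReason handles more cases than are shown here.

```go
package main

import "fmt"

// Hypothetical, trimmed-down stand-ins for the Kubernetes container status
// types, so the sketch compiles without external dependencies.
type terminated struct {
	ExitCode int32
}

type containerStatus struct {
	Name       string
	Terminated *terminated // nil when the container was never marked Terminated
}

type nodePhase string

const (
	nodeSucceeded nodePhase = "Succeeded"
	nodeFailed    nodePhase = "Failed"
)

// inferFailedPhase sketches the decision for a pod whose phase is already
// Failed. It reports success only if both main and wait are known to have
// exited 0; the messages are assumptions made for this example.
func inferFailedPhase(statuses []containerStatus) (nodePhase, string) {
	mainOK, waitOK := false, false
	for _, s := range statuses {
		if s.Terminated == nil {
			continue // e.g. an evicted pod: no exit code was recorded
		}
		if s.Terminated.ExitCode != 0 {
			return nodeFailed, fmt.Sprintf("container %q exited with code %d", s.Name, s.Terminated.ExitCode)
		}
		switch s.Name {
		case "main":
			mainOK = true
		case "wait":
			waitOK = true
		}
	}
	if mainOK && waitOK {
		return nodeSucceeded, ""
	}
	// Previously this path fell through and reported success.
	return nodeFailed, "pod failed without its containers being marked terminated"
}

func main() {
	// A Failed pod whose containers never reached the Terminated state.
	evicted := []containerStatus{{Name: "main"}, {Name: "wait"}}
	phase, msg := inferFailedPhase(evicted)
	fmt.Println(phase, msg)
}
```

The key design point is that success has to be proven by observed zero exit codes rather than assumed by default when no failing container is found.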

Verification

Unit tests. I don't know how to reproduce this usefully in e2e.

Documentation

None required.

@Joibel Joibel added the area/controller (Controller issues, panics), cherry-pick/3.6 (Cherry-pick this to release-3.6), and cherry-pick/3.7 (Cherry-pick this to release-3.7) labels Dec 11, 2025
@Joibel Joibel changed the title from "fix: if pod fails, don't mark node succeeded always" to "fix: if pod fails without container termination, don't mark node succeeded always" Dec 11, 2025
@terrytangyuan terrytangyuan left a comment

LGTM

@Joibel Joibel merged commit 46b0ddd into argoproj:main Dec 11, 2025
47 checks passed
@argo-cd-cherry-pick-bot

❌ Cherry-pick failed for 3.6. Please check the workflow logs for details.

@argo-cd-cherry-pick-bot

❌ Cherry-pick failed for 3.7. Please check the workflow logs for details.

Joibel added a commit that referenced this pull request Dec 11, 2025
…eeded always (#15150)

(cherry picked from commit 46b0ddd)

Signed-off-by: Alan Clucas <[email protected]>
Signed-off-by: Alan Clucas <[email protected]>
Joibel added a commit that referenced this pull request Dec 11, 2025
…eeded always (#15150)

(cherry picked from commit 46b0ddd)

Signed-off-by: Alan Clucas <[email protected]>
Signed-off-by: Alan Clucas <[email protected]>
Joibel added a commit that referenced this pull request Dec 12, 2025
…eeded always (cherry-pick #15150 for 3.7) (#15155)

Signed-off-by: Alan Clucas <[email protected]>
Joibel added a commit that referenced this pull request Dec 12, 2025
…eeded always (cherry-pick #15150 for 3.7) (#15155)

Signed-off-by: Alan Clucas <[email protected]>
(cherry picked from commit d24f6b5)
Joibel added a commit that referenced this pull request Dec 12, 2025
…eeded always (cherry-pick #15150 for 3.6) (#15157)

Signed-off-by: Alan Clucas <[email protected]>
guanguxiansheng pushed a commit to guanguxiansheng/argo-workflows that referenced this pull request Dec 15, 2025