@shuangkun shuangkun commented Dec 11, 2025

Previously, sidecar containers with exit code 137 were always ignored, assuming they were killed by argoexec. However, if a sidecar is OOMKilled (exit code 137 with reason OOMKilled), it should be treated as a failure because the wait container may not be able to observe the sidecar OOM.

This change checks for the OOMKilled reason before ignoring exit code 137, so that sidecar OOM failures are reported as failures.

Fixes a sidecar OOM detection issue where workflows would succeed even when a sidecar was killed due to memory exhaustion.
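
For illustration, the decision described above can be sketched roughly as follows. This is a minimal, dependency-free sketch and not the code in this PR: `terminatedState` is a hypothetical stand-in for the `ExitCode` and `Reason` fields of Kubernetes' `ContainerStateTerminated`, and `shouldIgnoreSidecarExit` is a made-up helper name.

```go
package main

import "fmt"

// terminatedState is a hypothetical, simplified stand-in for the ExitCode and
// Reason fields of Kubernetes' ContainerStateTerminated.
type terminatedState struct {
	ExitCode int32
	Reason   string
}

const (
	exitCodeSIGKILL = 137         // 128 + SIGKILL (9)
	reasonOOMKilled = "OOMKilled" // reason reported when the OOM killer ends the container
)

// shouldIgnoreSidecarExit (hypothetical helper) reports whether a sidecar's
// exit can be ignored as "killed by argoexec". Exit code 137 is ignored only
// when the termination reason is not OOMKilled.
func shouldIgnoreSidecarExit(t terminatedState) bool {
	return t.ExitCode == exitCodeSIGKILL && t.Reason != reasonOOMKilled
}

func main() {
	cases := []terminatedState{
		{ExitCode: 137, Reason: "Error"},     // assumed to be killed by argoexec
		{ExitCode: 137, Reason: "OOMKilled"}, // killed for exceeding its memory limit
	}
	for _, c := range cases {
		fmt.Printf("exit=%d reason=%s ignore=%v\n", c.ExitCode, c.Reason, shouldIgnoreSidecarExit(c))
	}
}
```

The sketch only covers the exit code 137 case discussed here; how other exit codes are handled is outside its scope.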

Motivation

Modifications

Verification

Unit tests and an e2e test with the sample workflow below.

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: sidecar-oom-example-
spec:
  entrypoint: main-step
  templates:
    - name: main-step
      container:
        image: alpine:3.7
        command: ["sh", "-c"]
        args:
          - |
            echo "Main container started. I will sleep for 30 seconds and then exit successfully."
            sleep 30
            echo "Main container finished with exit code 0."
        resources:
          requests:
            cpu: 2
            memory: "3Gi"

      # sidecar
      sidecars:
        - name: memory-eater
          image: alpine:3.7
          command: ["sh", "-c"]
          args:
            - |
              echo "Sidecar started. I will now consume as much memory as possible..."
              dd if=/dev/urandom of=/dev/shm/largefile bs=1M count=100
              tail -f /dev/shm/largefile
          resources:
            requests:
              memory: "16Mi"
            limits:
              memory: "32Mi"

Documentation

Signed-off-by: shuangkun <[email protected]>