
retentionPolicy.completed actually means succeeded #12135

@devstewart

Description

Pre-requisites

  • I have double-checked my configuration
  • I can confirm the issue exists when I tested with :latest
  • I'd like to contribute the fix myself (see contributing guide)

What happened/what you expected to happen?

I submitted a workflow that is expected to fail. It does fail, but the workflow is then deleted immediately.

Within seconds, the workflow is gone.

I expected failed workflows to be retained, as they were in earlier versions. I have not configured the workflow-controller-configmap to delete them. Relevant configuration:

completed: 400
spec:
  activeDeadlineSeconds: 86400
  podGC:
    strategy: OnPodSuccess
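A minimal sketch of a possible workaround, under two assumptions: that `retentionPolicy` in the `workflow-controller-configmap` accepts separate `failed` and `errored` counts next to `completed`, and that an unset count is treated as 0, which is what the `max rention(0 workflows)` line in the controller log appears to show. Setting all three explicitly should keep failed workflows around; the value 400 for `failed` and `errored` is illustrative only, copied from the `completed` setting above.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: workflow-controller-configmap
  namespace: argo  # assumption: same namespace as the controller in the logs below
data:
  # retentionPolicy is stored as a YAML string inside the ConfigMap.
  # completed appears to apply only to Succeeded workflows.
  # Assumption: failed and errored fall back to 0 (delete immediately) when omitted,
  # so both are set explicitly here.
  retentionPolicy: |
    completed: 400
    failed: 400
    errored: 400
```

Depending on the version, the controller may only read the retention policy at startup, in which case restarting it (for example with `kubectl rollout restart -n argo deploy/workflow-controller`) would be needed for the change to take effect.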

Version

v3.5.0

Paste a small workflow that reproduces the issue. We must be able to run the workflow; don't enter a workflow that uses private images.

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  namespace: argo
  generateName: hello-world-
  labels:
    workflows.argoproj.io/archive-strategy: "false"
  annotations:
    workflows.argoproj.io/description: |
      This is a simple hello world example.
spec:
  entrypoint: helloworld
  templates:
  - name: helloworld
    container:
      image: ghcr.io/openzipkin/alpine:3.18.2
      command: ["/bin/sh"]
      args: ["-c", "exit 1"]

Logs from the workflow controller

kubectl logs -n argo deploy/workflow-controller | grep ${workflow}

time="2023-11-03T20:38:15.027Z" level=info msg="Processing workflow" namespace=argo workflow=hello-world-59xpp
time="2023-11-03T20:38:15.032Z" level=info msg="Updated phase  -> Running" namespace=argo workflow=hello-world-59xpp
time="2023-11-03T20:38:15.033Z" level=warning msg="Node was nil, will be initialized as type Skipped" namespace=argo workflow=hello-world-59xpp
time="2023-11-03T20:38:15.033Z" level=info msg="was unable to obtain node for , letting display name to be nodeName" namespace=argo workflow=hello-world-59xpp
time="2023-11-03T20:38:15.033Z" level=info msg="Pod node hello-world-59xpp initialized Pending" namespace=argo workflow=hello-world-59xpp
time="2023-11-03T20:38:15.057Z" level=info msg="Created pod: hello-world-59xpp (hello-world-59xpp)" namespace=argo workflow=hello-world-59xpp
time="2023-11-03T20:38:15.057Z" level=info msg="TaskSet Reconciliation" namespace=argo workflow=hello-world-59xpp
time="2023-11-03T20:38:15.057Z" level=info msg=reconcileAgentPod namespace=argo workflow=hello-world-59xpp
time="2023-11-03T20:38:15.070Z" level=info msg="Workflow update successful" namespace=argo phase=Running resourceVersion=1703572159 workflow=hello-world-59xpp
time="2023-11-03T20:38:25.061Z" level=info msg="Processing workflow" namespace=argo workflow=hello-world-59xpp
time="2023-11-03T20:38:25.062Z" level=info msg="Task-result reconciliation" namespace=argo numObjs=0 workflow=hello-world-59xpp
time="2023-11-03T20:38:25.062Z" level=info msg="Pod failed: Error (exit code 1)" displayName=hello-world-59xpp namespace=argo pod=hello-world-59xpp templateName=helloworld workflow=hello-world-59xpp
time="2023-11-03T20:38:25.063Z" level=info msg="node changed" namespace=argo new.message="Error (exit code 1)" new.phase=Failed new.progress=0/1 nodeID=hello-world-59xpp old.message= old.phase=Pending old.progress=0/1 workflow=hello-world-59xpp
time="2023-11-03T20:38:25.063Z" level=info msg="TaskSet Reconciliation" namespace=argo workflow=hello-world-59xpp
time="2023-11-03T20:38:25.063Z" level=info msg=reconcileAgentPod namespace=argo workflow=hello-world-59xpp
time="2023-11-03T20:38:25.063Z" level=info msg="Updated phase Running -> Failed" namespace=argo workflow=hello-world-59xpp
time="2023-11-03T20:38:25.063Z" level=info msg="Updated message  -> Error (exit code 1)" namespace=argo workflow=hello-world-59xpp
time="2023-11-03T20:38:25.063Z" level=info msg="Marking workflow completed" namespace=argo workflow=hello-world-59xpp
time="2023-11-03T20:38:25.068Z" level=info msg="cleaning up pod" action=deletePod key=argo/hello-world-59xpp-1340600742-agent/deletePod
time="2023-11-03T20:38:25.071Z" level=info msg="Workflow update successful" namespace=argo phase=Failed resourceVersion=1703572339 workflow=hello-world-59xpp
time="2023-11-03T20:38:25.073Z" level=info msg="Queueing Failed workflow argo/hello-world-59xpp for delete due to max rention(0 workflows)"
time="2023-11-03T20:38:25.073Z" level=info msg="Deleting garbage collected workflow 'argo/hello-world-59xpp'"
time="2023-11-03T20:38:25.081Z" level=info msg="cleaning up pod" action=labelPodCompleted key=argo/hello-world-59xpp/labelPodCompleted
time="2023-11-03T20:38:25.081Z" level=info msg="Successfully request 'argo/hello-world-59xpp' to be deleted"

Logs from your workflow's wait container

kubectl logs -n argo -c wait -l workflows.argoproj.io/workflow=${workflow},workflow.argoproj.io/phase!=Succeeded

time="2023-11-03T20:38:19.525Z" level=info msg="Creating minio client using static credentials" endpoint=deap-api.decloud.xxx.com
time="2023-11-03T20:38:19.525Z" level=info msg="Saving file to s3" bucket=deap-argo-prod endpoint=deap-api.decloud.xxx.com key="dev\\ /2023\\ /11\\ /03\\ /hello-world-59xpp\\ /hello-world-59xpp\"/main.log" path=/tmp/argo/outputs/logs/main.log
time="2023-11-03T20:38:19.594Z" level=info msg="Save artifact" artifactName=main-logs duration=94.016454ms error="<nil>" key="dev\\ /2023\\ /11\\ /03\\ /hello-world-59xpp\\ /hello-world-59xpp\"/main.log"
time="2023-11-03T20:38:19.594Z" level=info msg="not deleting local artifact" localArtPath=/tmp/argo/outputs/logs/main.log
time="2023-11-03T20:38:19.594Z" level=info msg="Successfully saved file: /tmp/argo/outputs/logs/main.log"
time="2023-11-03T20:38:19.604Z" level=warning msg="failed to patch task set, falling back to legacy/insecure pod patch, see https://argoproj.github.io/argo-workflows/workflow-rbac/" error="workflowtaskresults.argoproj.io is forbidden: User \"system:serviceaccount:argo:default\" cannot create resource \"workflowtaskresults\" in API group \"argoproj.io\" in the namespace \"argo\""
time="2023-11-03T20:38:19.605Z" level=warning msg="Non-transient error: pods \"hello-world-59xpp\" is forbidden: User \"system:serviceaccount:argo:default\" cannot patch resource \"pods\" in API group \"\" in the namespace \"argo\""
time="2023-11-03T20:38:19.606Z" level=error msg="executor error: pods \"hello-world-59xpp\" is forbidden: User \"system:serviceaccount:argo:default\" cannot patch resource \"pods\" in API group \"\" in the namespace \"argo\""
time="2023-11-03T20:38:19.606Z" level=info msg="Alloc=12580 TotalAlloc=18686 Sys=30309 NumGC=4 Goroutines=10"
time="2023-11-03T20:38:19.606Z" level=fatal msg="pods \"hello-world-59xpp\" is forbidden: User \"system:serviceaccount:argo:default\" cannot patch resource \"pods\" in API group \"\" in the namespace \"argo\""

Metadata

Labels

  • P3: Low priority
  • area/controller: Controller issues, panics
  • area/gc: Garbage collection, such as TTLs, retentionPolicy, delays, and more
  • solution/suggested: A solution to the bug has been suggested. Someone needs to implement it.
