8 changes: 8 additions & 0 deletions .features/pending/pod-restart.md
@@ -0,0 +1,8 @@
Description: Restart pods that fail before starting
Authors: [Alan Clucas](https://github.com/Joibel)
Component: General
Issues: 12572

Automatically restart pods that fail before starting for reasons like node eviction.
This is safe to do even for non-idempotent workloads.
You need to configure this in your workflow controller configmap for it to take effect.
8 changes: 8 additions & 0 deletions api/jsonschema/schema.json


8 changes: 8 additions & 0 deletions api/openapi-spec/swagger.json


33 changes: 33 additions & 0 deletions config/config.go
@@ -123,6 +123,39 @@ type Config struct {

// ArtifactDrivers lists artifact driver plugins we can use
ArtifactDrivers []ArtifactDriver `json:"artifactDrivers,omitempty"`

// FailedPodRestart configures automatic restart of pods that fail before entering Running state
// (e.g., due to Eviction, DiskPressure, Preemption). This allows recovery from transient
// infrastructure issues without requiring a retryStrategy on templates.
FailedPodRestart *FailedPodRestartConfig `json:"failedPodRestart,omitempty"`
}

// FailedPodRestartConfig configures automatic restart of pods that fail before entering Running state.
// This is useful for recovering from transient infrastructure issues like node eviction due to
// DiskPressure or MemoryPressure without requiring a retryStrategy on every template.
type FailedPodRestartConfig struct {
// Enabled enables automatic restart of pods that fail before entering Running state.
// When enabled, pods that fail due to infrastructure issues (like eviction) without ever
// running their main container will be automatically recreated.
// Default is false.
Enabled bool `json:"enabled,omitempty"`

// MaxRestarts is the maximum number of automatic restarts per node before giving up.
// This prevents infinite restart loops. Default is 3.
MaxRestarts *int32 `json:"maxRestarts,omitempty"`
}

// GetMaxRestarts returns the configured max restarts or the default value of 3.
func (c *FailedPodRestartConfig) GetMaxRestarts() int32 {
if c == nil || c.MaxRestarts == nil {
return 3
}
return *c.MaxRestarts
}

// IsEnabled returns true if the feature is enabled.
func (c *FailedPodRestartConfig) IsEnabled() bool {
return c != nil && c.Enabled
}

// ArtifactDriver is a plugin for an artifact driver
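
To show how the nil-safe accessors above are intended to be used together, here is a minimal, self-contained Go sketch. The `shouldRestart` helper and the local copy of the config struct are illustrative assumptions, not part of this change: a nil config disables the feature, and the default limit of 3 applies when `maxRestarts` is unset.

```go
package main

import "fmt"

// Minimal local copy of the accessor pattern from config.go, for illustration only.
type FailedPodRestartConfig struct {
	Enabled     bool
	MaxRestarts *int32
}

// IsEnabled is nil-safe: an absent config means the feature is off.
func (c *FailedPodRestartConfig) IsEnabled() bool { return c != nil && c.Enabled }

// GetMaxRestarts falls back to the default of 3 when unset.
func (c *FailedPodRestartConfig) GetMaxRestarts() int32 {
	if c == nil || c.MaxRestarts == nil {
		return 3
	}
	return *c.MaxRestarts
}

// shouldRestart is a hypothetical caller combining the two accessors.
func shouldRestart(c *FailedPodRestartConfig, restartsSoFar int32) bool {
	return c.IsEnabled() && restartsSoFar < c.GetMaxRestarts()
}

func main() {
	var cfg *FailedPodRestartConfig    // nil: feature disabled
	fmt.Println(shouldRestart(cfg, 0)) // false

	cfg = &FailedPodRestartConfig{Enabled: true}
	fmt.Println(shouldRestart(cfg, 2)) // true  (default max is 3)
	fmt.Println(shouldRestart(cfg, 3)) // false (limit reached)
}
```
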
2 changes: 2 additions & 0 deletions docs/fields.md
@@ -1991,6 +1991,7 @@ NodeStatus contains status information about an individual node in the workflow
|`daemoned`|`boolean`|Daemoned tracks whether or not this node was daemoned and needs to be terminated|
|`displayName`|`string`|DisplayName is a human readable representation of the node. Unique within a template boundary|
|`estimatedDuration`|`integer`|EstimatedDuration in seconds.|
|`failedPodRestarts`|`integer`|FailedPodRestarts tracks the number of times the pod for this node was restarted due to infrastructure failures before the main container started.|
|`finishedAt`|[`Time`](#time)|Time at which this node completed|
|`hostNodeName`|`string`|HostNodeName name of the Kubernetes node on which the Pod is running, if applicable|
|`id`|`string`|ID is a unique identifier of a node within the workflow It is implemented as a hash of the node name, which makes the ID deterministic|
@@ -2005,6 +2006,7 @@ NodeStatus contains status information about an individual node in the workflow
|`podIP`|`string`|PodIP captures the IP of the pod for daemoned steps|
|`progress`|`string`|Progress to completion|
|`resourcesDuration`|`Map< integer , int64 >`|ResourcesDuration is indicative, but not accurate, resource duration. This is populated when the node completes.|
|`restartingPodUID`|`string`|RestartingPodUID tracks the UID of the pod that is currently being restarted. This prevents duplicate restart attempts when the controller processes the same failed pod multiple times. Cleared when the replacement pod starts running.|
|`startedAt`|[`Time`](#time)|Time at which this node started|
|`synchronizationStatus`|[`NodeSynchronizationStatus`](#nodesynchronizationstatus)|SynchronizationStatus is the synchronization status of the node|
|`taskResultSynced`|`boolean`|TaskResultSynced is used to determine if the node's output has been received|
22 changes: 22 additions & 0 deletions docs/metrics.md
@@ -381,6 +381,22 @@ Total number of pods that started pending by reason.
| `reason` | Summary of the kubernetes Reason for pending |
| `namespace` | The namespace that the pod is in |

#### `pod_restarts_total`

Total number of pods automatically restarted due to infrastructure failures before the main container started.
This counter tracks pods that were automatically restarted by the [failed pod restart](pod-restarts.md) feature.
These are infrastructure-level failures (like node eviction) that occur before the main container enters the Running state.

| attribute | explanation |
|-------------|-------------------------------------------------------------------------------------------------------------|
| `reason` | The infrastructure failure reason: `Evicted`, `NodeShutdown`, `NodeAffinity`, or `UnexpectedAdmissionError` |
| `condition` | The node condition that caused the pod restart, e.g., `DiskPressure`, `MemoryPressure` |
| `namespace` | The namespace that the pod is in |

`reason` will be one of:

- `Evicted`: Node pressure eviction (`DiskPressure`, `MemoryPressure`, etc.)
- `NodeShutdown`: Graceful node shutdown
- `NodeAffinity`: Node affinity/selector no longer matches
- `UnexpectedAdmissionError`: Unexpected error during pod admission

`condition` is extracted from the pod status message when available (e.g., `DiskPressure`, `MemoryPressure`).
It will be empty if the condition cannot be determined.
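
As an illustration of that extraction, here is a hedged Go sketch. The exact wording of eviction messages depends on the kubelet, so the regular expression and the `conditionFromMessage` helper are assumptions rather than the controller's actual implementation.

```go
package main

import (
	"fmt"
	"regexp"
)

// conditionRe matches the condition name that kubelet eviction messages
// commonly embed, e.g. "The node had condition: [DiskPressure].".
var conditionRe = regexp.MustCompile(`condition: \[([A-Za-z]+)\]`)

// conditionFromMessage returns the node condition named in a pod status
// message, or "" if it cannot be determined (leaving the `condition`
// attribute empty, as described above).
func conditionFromMessage(msg string) string {
	if m := conditionRe.FindStringSubmatch(msg); m != nil {
		return m[1]
	}
	return ""
}

func main() {
	fmt.Println(conditionFromMessage("The node had condition: [DiskPressure]. ")) // DiskPressure
	fmt.Println(conditionFromMessage("Pod was rejected by the node"))             // ""
}
```
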

#### `pods_gauge`

A gauge of the number of workflow created pods currently in the cluster in each phase.
98 changes: 98 additions & 0 deletions docs/pod-restarts.md
@@ -0,0 +1,98 @@
# Automatic Pod Restarts

Argo Workflows can automatically restart pods that fail due to infrastructure issues before the main container starts.
This feature handles transient failures like node evictions, disk pressure, or unexpected admission errors without requiring a `retryStrategy` on your templates.

## How It Works

When a pod fails before its main container enters the Running state, the workflow controller checks if the failure reason indicates an infrastructure issue.
If so, the pod is automatically deleted and recreated, allowing the workflow to continue.
For safety, this mechanism only applies to pods that are known never to have started; for pods that might have started, `retryStrategy` is the solution.

This is different from [retryStrategy](retries.md), which handles application-level failures after the container has run.
The two mechanisms are complementary and can both apply to the same workflow.
Automatic pod restarts handle infrastructure-level failures that occur before your code even starts.

### Restartable Failure Reasons

The following pod failure reasons trigger automatic restarts:

| Reason | Description |
|--------|-------------|
| `Evicted` | Node pressure eviction (`DiskPressure`, `MemoryPressure`, etc.) |
| `NodeShutdown` | Graceful node shutdown |
| `NodeAffinity` | Node affinity/selector no longer matches |
| `UnexpectedAdmissionError` | Unexpected error during pod admission |

### Conditions for Restart

A pod qualifies for automatic restart when ALL of the following are true (a sketch of this check follows the list):

1. The pod phase is `Failed`
2. The main container never entered the `Running` state
3. The failure reason is one of the restartable reasons listed above
4. The restart count for this pod hasn't exceeded the configured maximum
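
The following Go sketch shows how these four conditions might combine. The `podInfo` struct and `qualifiesForRestart` helper are simplified assumptions for illustration; the controller reads the equivalent information from the Kubernetes pod status and the workflow node status.

```go
package main

import "fmt"

// restartableReasons mirrors the table of restartable failure reasons above.
var restartableReasons = map[string]bool{
	"Evicted":                  true,
	"NodeShutdown":             true,
	"NodeAffinity":             true,
	"UnexpectedAdmissionError": true,
}

// podInfo is a simplified, hypothetical view of the fields involved.
type podInfo struct {
	Phase             string // pod phase, e.g. "Failed"
	Reason            string // pod failure reason, e.g. "Evicted"
	MainContainerRan  bool   // did the main container ever enter Running?
	FailedPodRestarts int32  // restarts already performed for this node
}

// qualifiesForRestart applies the four conditions listed above.
func qualifiesForRestart(p podInfo, maxRestarts int32) bool {
	return p.Phase == "Failed" &&
		!p.MainContainerRan &&
		restartableReasons[p.Reason] &&
		p.FailedPodRestarts < maxRestarts
}

func main() {
	evicted := podInfo{Phase: "Failed", Reason: "Evicted"}
	fmt.Println(qualifiesForRestart(evicted, 3)) // true: infrastructure failure, never ran

	crashed := podInfo{Phase: "Failed", Reason: "Error", MainContainerRan: true}
	fmt.Println(qualifiesForRestart(crashed, 3)) // false: retryStrategy territory
}
```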

## Configuration

Enable automatic pod restarts in the workflow controller ConfigMap:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: workflow-controller-configmap
data:
  failedPodRestart: |
    enabled: true
    maxRestarts: 3
```

### Configuration Options

| Option | Type | Default | Description |
|--------|------|---------|-------------|
| `enabled` | `bool` | `false` | Enable automatic pod restarts |
| `maxRestarts` | `int` | `3` | Maximum restart attempts per node before giving up |

## Monitoring

When a pod is automatically restarted, the node status is updated with:

- `FailedPodRestarts`: Counter tracking how many times the pod was restarted
- `Message`: Updated to indicate the restart, e.g., `Pod auto-restarting due to Evicted: The node had condition: [DiskPressure]`

You can view restart counts in the workflow status:

```bash
kubectl get wf my-workflow -o jsonpath='{.status.nodes[*].failedPodRestarts}'
```

The [`pod_restarts_total`](metrics.md#pod_restarts_total) metric tracks restarts by reason, condition, and namespace.

## Comparison with `retryStrategy`

| Feature | Automatic Pod Restarts | retryStrategy |
|---------|----------------------|---------------|
| **Trigger** | Infrastructure failures before container starts | Application failures after container runs |
| **Configuration** | Global (controller ConfigMap) | Per-template |
| **Use case** | Node evictions, disk pressure, admission errors | Application errors, transient failures |
| **Counter** | `failedPodRestarts` in node status | `retries` in node status |

Both features can work together.
If a pod is evicted before starting, automatic restart handles it.
If the container runs and fails, `retryStrategy` handles it.
Some workloads are not idempotent, so a `retryStrategy` would not be suitable for them; restarting a pod that never started is still safe.

## Example

A workflow running on a node that experiences disk pressure:

1. Pod is scheduled and init containers start
2. Node experiences `DiskPressure`, evicting the pod before main container starts
3. Controller detects the eviction, increments the node's `failedPodRestarts` counter, and records the condition (`DiskPressure`)
4. Pod is deleted, and the workflow node is marked as Pending so the pod can be recreated
5. New pod is created on a healthy node
6. Workflow continues normally

The workflow succeeds without requiring any template-level retry configuration.
4 changes: 4 additions & 0 deletions docs/retries.md
@@ -2,6 +2,10 @@

Argo Workflows offers a range of options for retrying failed steps.

!!! Note "restarts"
For infrastructure-level failures that occur before your container starts (like node evictions or disk pressure), see [Automatic Pod Restarts](pod-restarts.md).
This page covers application-level retries using `retryStrategy`.

## Configuring `retryStrategy` in `WorkflowSpec`

```yaml
12 changes: 12 additions & 0 deletions docs/workflow-controller-configmap.md
@@ -97,6 +97,7 @@ Config contains the root of the configuration settings for the workflow controller
| `SSO` | [`SSOConfig`](#ssoconfig) | SSO in settings for single-sign on |
| `Synchronization` | [`SyncConfig`](#syncconfig) | Synchronization via databases config |
| `ArtifactDrivers` | `Array<`[`ArtifactDriver`](#artifactdriver)`>` | ArtifactDrivers lists artifact driver plugins we can use |
| `FailedPodRestart` | [`FailedPodRestartConfig`](#failedpodrestartconfig) | FailedPodRestart configures automatic restart of pods that fail before entering Running state (e.g., due to Eviction, DiskPressure, Preemption). This allows recovery from transient infrastructure issues without requiring a retryStrategy on templates. |

## NodeEvents

@@ -344,3 +345,14 @@ ArtifactDriver is a plugin for an artifact driver
| `Name` | `wfv1.ArtifactPluginName` (string (name of an artifact plugin)) | Name is the name of the artifact driver plugin |
| `Image` | `string` | Image is the docker image of the artifact driver |
| `ConnectionTimeoutSeconds` | `int32` | ConnectionTimeoutSeconds is the timeout for the artifact driver connection, 5 seconds if not set |

## FailedPodRestartConfig

FailedPodRestartConfig configures automatic restart of pods that fail before entering Running state. This is useful for recovering from transient infrastructure issues like node eviction due to DiskPressure or MemoryPressure without requiring a retryStrategy on every template.

### Fields

| Field Name | Field Type | Description |
|---------------|------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| `Enabled` | `bool` | Enabled enables automatic restart of pods that fail before entering Running state. When enabled, pods that fail due to infrastructure issues (like eviction) without ever running their main container will be automatically recreated. Default is false. |
| `MaxRestarts` | `int32` | MaxRestarts is the maximum number of automatic restarts per node before giving up. This prevents infinite restart loops. Default is 3. |
2 changes: 1 addition & 1 deletion hack/manifests/crdgen.sh
@@ -1,4 +1,4 @@
#!/bin/bash
#!/usr/bin/env bash
set -eu -o pipefail

cd "$(dirname "$0")/../.." # up to repo root
5 changes: 5 additions & 0 deletions manifests/base/crds/full/argoproj.io_workflows.yaml


5 changes: 5 additions & 0 deletions manifests/quick-start-minimal.yaml

