OCPEDGE-1484: [TNF] kubelet disruption test #30290
base: main
Conversation
Job Failure Risk Analysis for sha: 7d6c813
Job Failure Risk Analysis for sha: 15714f2
Showing 20 of 23 jobs analysis
Job Failure Risk Analysis for sha: eab19dc
Job Failure Risk Analysis for sha: 99e3ea0
clobrano left a comment:
I left some comments. I also noticed that there are quite a few unused functions that should be cleaned up or integrated.
    return fmt.Errorf("failed to list cluster operators: %v", err)
}

for _, operator := range clusterOperators.Items {
I think I have a library function that does this in the disruption test shared libraries.
https://github.com/openshift/origin/pull/30298/files#diff-4ed6c69fb287c32c3a2fd0f6a6f1e31eca45ed3267feb03d4e466a3ee4c7f276R1274-R1275
I think this approach is fine, but I would recommend collecting the unhealthy things into slices and then returning an error that lists all of the unavailable and degraded operators if those slices are non-empty.
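A minimal sketch of that pattern, assuming configv1 is github.com/openshift/api/config/v1 and clusterOperators is the list obtained above:

var unavailable, degraded []string
for _, operator := range clusterOperators.Items {
    for _, cond := range operator.Status.Conditions {
        // Collect rather than fail fast, so the final error lists every unhealthy operator.
        if cond.Type == configv1.OperatorAvailable && cond.Status == configv1.ConditionFalse {
            unavailable = append(unavailable, operator.Name)
        }
        if cond.Type == configv1.OperatorDegraded && cond.Status == configv1.ConditionTrue {
            degraded = append(degraded, operator.Name)
        }
    }
}
if len(unavailable) > 0 || len(degraded) > 0 {
    return fmt.Errorf("unhealthy cluster operators, unavailable: %v, degraded: %v", unavailable, degraded)
}
return nil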
The latest change does collect the unhealthy things now. An example:
[FAILED] Timed out after 60.001s.
etcd cluster operator should be healthy before starting test
Unexpected error:
<*errors.errorString | 0xc0017b9740>:
etcd ClusterOperator is not Available: &{Available False 2025-10-13 10:42:14 -0400 EDT EtcdMembers_NoQuorum EtcdMembersAvailable: 1 of 2 members are available, NAME-PENDING-192.168.111.20 has not started}
{
s: "etcd ClusterOperator is not Available: &{Available False 2025-10-13 10:42:14 -0400 EDT EtcdMembers_NoQuorum EtcdMembersAvailable: 1 of 2 members are available, NAME-PENDING-192.168.111.20 has not started}",
}
occurred
In [BeforeEach] at: github.com/openshift/origin/test/extended/two_node/tnf_kubelet_disruption.go:42 @ 10/13/25 10:49:59.218
------------------------------
I think this meets your request above. Please verify.
Looks good. Consider moving this to common.go.
Job Failure Risk Analysis for sha: aa19408
Job Failure Risk Analysis for sha: 19be558
Job Failure Risk Analysis for sha: d8aad48
clobrano left a comment:
I left a couple of comments.
Job Failure Risk Analysis for sha: 967520e
Job Failure Risk Analysis for sha: 4ce24e9
jaypoulz left a comment:
Minor notes. I'm good to merge as is, once it's rebased and passes tests.
Force-pushed 9e30ab7 to be91544
Force-pushed be91544 to 60333c2
Risk analysis has seen new tests most likely introduced by this PR.
New Test Risks for sha: 612eb9d
New tests seen in this PR at sha: 612eb9d
/retest-required
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: dhensel-rh, jaypoulz. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
/payload-job periodic-ci-openshift-release-master-nightly-4.21-e2e-metal-ovn-two-node-fencing-recovery-techpreview
@dhensel-rh: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/1764d5f0-d118-11f0-9f02-aa3e2e13ee2f-0
Scheduling required tests:
eggfoobar left a comment:
Just took an initial walk through; looking good so far, just some code duplication we can tighten up a bit. I'll give it another look Monday.
High notes:
- Let's use the utils.GetNodes method
- Let's keep the state of the nodes in the tests
- Let's use the openshift-etcd namespace to run the debug pods for now, or create our own privileged namespace
    return false
}

// HasNodeRebooted checks if a node has rebooted by comparing its current BootID with a previous snapshot.
Seems like nothing changed here, let's move this back to avoid a diff.
If HasNodeRebooted is removed here, it will negatively impact
https://github.com/openshift/origin/blob/main/test/extended/two_node/tnf_recovery.go#L454-L458
Not sure why this is showing up like this; it is already checked in.
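For reference, the check being discussed reduces to comparing the node's reported kernel boot ID against a snapshot taken before the disruption; a rough sketch (the snapshot map is illustrative, not the repo's exact signature):

// hasNodeRebooted reports whether a node's kernel boot ID differs from an earlier snapshot.
// bootIDSnapshot is a hypothetical map of node name -> BootID captured before the disruption.
func hasNodeRebooted(node *corev1.Node, bootIDSnapshot map[string]string) bool {
    previous, ok := bootIDSnapshot[node.Name]
    if !ok {
        return false // no baseline recorded for this node
    }
    // Status.NodeInfo.BootID changes on every kernel boot.
    return node.Status.NodeInfo.BootID != previous
}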
const (
    AllNodes                  = ""                                      // No label filter for GetNodes
    LabelNodeRoleControlPlane = "node-role.kubernetes.io/control-plane" // Control plane node label
    LabelNodeRoleMaster       = "node-role.kubernetes.io/master"        // Legacy master node label
Let's remove this; at this point all labels should use control-plane, and I don't see where this is being used.
    return utils.LogEtcdClusterStatus(oc, "BeforeEach validation")
}, etcdOperatorIsHealthyTimeout, pollInterval).ShouldNot(o.HaveOccurred(), "etcd cluster should be fully healthy before starting test")

nodeList, err := oc.AdminKubeClient().CoreV1().Nodes().List(context.Background(), metav1.ListOptions{})
Let's use the helper function utils.GetNodes(oc, utils.AllNodes) here and everywhere we do node queries to simplify things
Actually, let's avoid running this query in the BeforeEach statement; just run it in the test itself, and we avoid any possible mistakes that can arise from sharing the node list between Ginkgo node runs. This should also help, given that the node state changes enough that we want to capture fresh state at test run time.
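A rough sketch of fetching the nodes inside the It block itself, using the helper suggested above (the exact return type of utils.GetNodes is assumed here):

g.It("should recover from single node kubelet service disruption", func() {
    // Query the nodes at test run time instead of sharing a list from BeforeEach,
    // so each spec sees the current node state.
    nodes, err := utils.GetNodes(oc, utils.AllNodes) // assumed to return a slice of corev1.Node
    o.Expect(err).NotTo(o.HaveOccurred(), "expected to list nodes")
    o.Expect(len(nodes)).To(o.Equal(2), "expected exactly two nodes in a two-node cluster")
    // ... disruption steps operate on nodes[0] and nodes[1] ...
})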
g.By("Ensuring both nodes are healthy before starting kubelet disruption test")
for _, node := range nodes {
    o.Eventually(func() bool {
We don't need to make any call out for the individual nodes here; we already have the node objects at this point, so we can just check the node Ready condition:
for _, node := range nodes {
    if ready := nodeutil.IsNodeReady(node); !ready {
        o.Expect(ready).Should(o.BeTrue(), fmt.Sprintf("Node %s should be ready before kubelet disruption", node.Name))
    }
}

defer g.GinkgoRecover()
var (
    oc = util.NewCLIWithoutNamespace("").AsAdmin()
Here let's use util.NewCLIWithoutNamespace("").SetNamespace("openshift-etcd").AsAdmin() to avoid permission issues and to use the namespace where the TNF pods run.
Scheduling required tests:
Scheduling required tests:
/payload-job periodic-ci-openshift-release-master-nightly-4.21-e2e-metal-ovn-two-node-fencing-recovery-techpreview
@dhensel-rh: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/d8b78e30-d46c-11f0-90b8-6c3725405946-0
Scheduling required tests:
eggfoobar left a comment:
So far so good. I've got a few suggestions and some logging changes: we have a bit too much logging repeating the same information, which is going to make it noisy to debug.
Let's remove the duplicated information in the return statements, and remove the success and start logs at the beginning of the helper functions, since the call site already seems to log that the helper function is being called.
For example, change this:
if err != nil {
    framework.Logf("Failed to check resource status: %v, output: %s", err, output)
    return false, fmt.Errorf("failed to check resource status: %v", err)
}
to this:
if err != nil {
    return false, fmt.Errorf("failed to check resource status, output: %s error: %v", output, err)
}

g.AfterEach(func() {
    // Cleanup: Remove any resource bans that may have been created during the test
    // This ensures the device under test is in the same state the test started in
    nodeList, err := oc.AdminKubeClient().CoreV1().Nodes().List(context.Background(), metav1.ListOptions{})
Let's use the util function GetNodes(oc, AllNodes).
// This ensures the device under test is in the same state the test started in
nodeList, err := oc.AdminKubeClient().CoreV1().Nodes().List(context.Background(), metav1.ListOptions{})
if err != nil {
    framework.Logf("Warning: Failed to retrieve nodes during cleanup: %v", err)
What happens if this fails to clean up? Would that be considered a failed state?
    return
}

if len(nodeList.Items) == 2 {
If we don't get two nodes, would that be a failure?
})

g.It("Should recover from single node kubelet service disruption", func() {
    nodeList, err := oc.AdminKubeClient().CoreV1().Nodes().List(context.Background(), metav1.ListOptions{})
Let's use the util function here: utils.GetNodes(oc, utils.AllNodes).
})

g.It("Should properly stop kubelet service and verify automatic restart on target node", func() {
    nodeList, err := oc.AdminKubeClient().CoreV1().Nodes().List(context.Background(), metav1.ListOptions{})
Same thing here about the util function
const (
    AllNodes                  = ""                                      // No label filter for GetNodes
    LabelNodeRoleControlPlane = "node-role.kubernetes.io/control-plane" // Control plane node label
    LabelNodeRoleControlPlane = "node-role.kubernetes.io/control-plane" // Control plane node label // Legacy master node label
Let's remove this comment about legacy; it seems like a leftover from the previous line.
//
// err := AddConstraint(oc, "master-0", "kubelet-clone", "master-1")
func AddConstraint(oc *exutil.CLI, nodeName string, resourceName string, targetNode string) error {
    framework.Logf("Banning resource %s from running on %s (temporary ban for testing)", resourceName, targetNode)
Let's remove this; the call site should make mention of it. Helper functions should be kept clean of log statements; logging opinions are typically best left to the caller.
    "--", "chroot", "/host", "bash", "-c", cmd).Output()

if err != nil {
    framework.Logf("Failed to ban resource: %v, output: %s", err, output)
Same thing here; the error should describe why it failed, and the caller can decide how to log it.
    return fmt.Errorf("failed to ban resource: %v", err)
}

framework.Logf("Successfully banned resource %s from %s", resourceName, targetNode)
Same thing here, let's nix this; it just ends up causing noise in the logs.
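Putting those three comments together, the ban helper's body would end up roughly like this (a sketch; the pcs command is illustrative, and only the error path survives, with all logging left to the caller):

func AddConstraint(oc *exutil.CLI, nodeName string, resourceName string, targetNode string) error {
    // Illustrative pacemaker command; the real cmd comes from the existing helper.
    cmd := fmt.Sprintf("pcs constraint location %s avoids %s", resourceName, targetNode)
    output, err := oc.AsAdmin().Run("debug").Args("node/"+nodeName,
        "--", "chroot", "/host", "bash", "-c", cmd).Output()
    if err != nil {
        // The error carries the full context; the call site decides whether and how to log it.
        return fmt.Errorf("failed to ban resource %s from %s, output: %s, error: %v", resourceName, targetNode, output, err)
    }
    return nil
}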
}

// isPodReady checks if a pod is ready based on its conditions
func isPodReady(pod *corev1.Pod) bool {
Let's remove this and use podutils.IsPodReady(pod) from the "k8s.io/kubectl/pkg/util/podutils" package.
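A small sketch of what a call site looks like with the upstream helper (the wrapper function here is hypothetical, purely for illustration):

import (
    corev1 "k8s.io/api/core/v1"
    "k8s.io/kubectl/pkg/util/podutils"
)

// allPodsReady shows the replacement: the local isPodReady helper is dropped
// in favor of podutils.IsPodReady from kubectl's util package.
func allPodsReady(pods []corev1.Pod) bool {
    for i := range pods {
        if !podutils.IsPodReady(&pods[i]) {
            return false
        }
    }
    return true
}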
/payload-job periodic-ci-openshift-release-master-nightly-4.21-e2e-metal-ovn-two-node-fencing-recovery-techpreview
@dhensel-rh: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/278ceed0-d60a-11f0-964c-0b149fa05ac5-0
Scheduling required tests:
@dhensel-rh: The following tests failed, say
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
/payload-job periodic-ci-openshift-release-master-nightly-4.21-e2e-metal-ovn-two-node-fencing-recovery-techpreview
@dhensel-rh: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/27bc3cf0-d69e-11f0-9dd1-df35834b10ac-0
Scheduling required tests:
eggfoobar left a comment:
Just a small naming change and an observation on the kubelet restart. Also, @jaypoulz, we might need some guidance on these tests as they relate to the openshift/two-node suite; we are currently grabbing parallel, serial, disruptive, and degraded tests in that bucket, and I think we might need to re-evaluate that.
origin/pkg/testsuites/standard_suites.go
Lines 423 to 432 in c86e3cd
{
    Name: "openshift/two-node",
    Description: templates.LongDesc(`
        This test suite runs tests to validate two-node.
    `),
    Qualifiers: []string{
        `name.contains("[Suite:openshift/two-node") || name.contains("[OCPFeatureGate:DualReplica]") || name.contains("[OCPFeatureGate:HighlyAvailableArbiter]")`,
    },
    TestTimeout: 60 * time.Minute,
},
    }
})

g.It("Should recover from single node kubelet service disruption", func() {
Let's lowercase these Shoulds; don't forget that this word is in the middle of the test title.
"Should" -> "should"
}, kubeletRestoreTimeout, pollInterval).ShouldNot(o.HaveOccurred(), "Essential cluster operators should be available after kubelet resource ban removal")
})

g.It("Should properly stop kubelet service and verify automatic restart on target node", func() {
Same here
"Should" -> "should"
g.By("Waiting for kubelet service to automatically restart on target node")
o.Eventually(func() bool {
    return utils.IsServiceRunning(oc, targetNode.Name, "kubelet")
I'm not sure this will work; once you stop the kubelet with systemd, does it start back up? If the kubelet is not up, I don't think you'll be able to run a debug pod on that node; you'll need to use the SSH framework that Jeremy implemented in order to check and restart the kubelet.
Ideally you would want to crash the kubelet to test this out, so that the always-restart stanza of kubelet.service kicks in. https://github.com/openshift/machine-config-operator/blob/main/templates/master/01-master-kubelet/_base/units/kubelet.service.yaml#L48-L49
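To spell out the distinction: systemd never restarts a unit after an explicit systemctl stop, regardless of its Restart= setting, but a Restart=always-style stanza does kick in when the process dies unexpectedly. A hedged sketch of issuing such a crash with the same debug-pod pattern used elsewhere in this PR (the exact command and invocation are illustrative; as noted above, an SSH-based path may be needed once kubelet is actually down):

// Crash kubelet (SIGKILL) rather than stopping it cleanly, so the unit's restart
// policy applies. Command and invocation are illustrative only.
crashCmd := "pkill -KILL -x kubelet"
out, err := oc.AsAdmin().Run("debug").Args("node/"+targetNode.Name,
    "--", "chroot", "/host", "bash", "-c", crashCmd).Output()
// The exec may be cut short as kubelet goes down; treat the result as informational.
framework.Logf("kubelet crash command issued on %s (output: %q, err: %v)", targetNode.Name, out, err)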
Scheduling required tests:
These tests check what happens when kubelet becomes unavailable.