Skip to content

Conversation

@dhensel-rh
Copy link

These tests check what happens when kubelet becomes unavailable.

@openshift-ci openshift-ci bot requested review from jaypoulz and qJkee September 22, 2025 19:12
@openshift-trt
Copy link

openshift-trt bot commented Sep 22, 2025

Job Failure Risk Analysis for sha: 7d6c813

Job Name Failure Risk
pull-ci-openshift-origin-main-e2e-vsphere-ovn IncompleteTests
Tests for this run (19) are below the historical average (3346): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-main-e2e-vsphere-ovn-upi IncompleteTests
Tests for this run (19) are below the historical average (3132): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)

@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Sep 25, 2025
@openshift-trt
Copy link

openshift-trt bot commented Sep 25, 2025

Job Failure Risk Analysis for sha: 15714f2

Job Name Failure Risk
pull-ci-openshift-origin-main-e2e-agnostic-ovn-cmd IncompleteTests
Tests for this run (3) are below the historical average (1641): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-main-e2e-aws-csi IncompleteTests
Tests for this run (3) are below the historical average (1665): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-main-e2e-aws-ovn-cgroupsv2 IncompleteTests
Tests for this run (2) are below the historical average (2382): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-main-e2e-aws-ovn-edge-zones IncompleteTests
Tests for this run (2) are below the historical average (2499): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-main-e2e-aws-ovn-fips IncompleteTests
Tests for this run (2) are below the historical average (2581): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-main-e2e-aws-ovn-microshift IncompleteTests
Tests for this run (3) are below the historical average (1285): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-main-e2e-aws-ovn-microshift-serial IncompleteTests
Tests for this run (2) are below the historical average (655): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-main-e2e-aws-ovn-serial-1of2 IncompleteTests
Tests for this run (3) are below the historical average (1583): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-main-e2e-aws-ovn-serial-2of2 IncompleteTests
Tests for this run (2) are below the historical average (1547): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-main-e2e-aws-ovn-single-node IncompleteTests
Tests for this run (3) are below the historical average (2300): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-main-e2e-aws-ovn-single-node-serial IncompleteTests
Tests for this run (3) are below the historical average (1782): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-main-e2e-aws-ovn-single-node-upgrade IncompleteTests
Tests for this run (2) are below the historical average (4031): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-main-e2e-gcp-csi IncompleteTests
Tests for this run (2) are below the historical average (1733): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-main-e2e-gcp-ovn IncompleteTests
Tests for this run (2) are below the historical average (3188): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-main-e2e-gcp-ovn-rt-upgrade IncompleteTests
Tests for this run (2) are below the historical average (1796): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-main-e2e-gcp-ovn-upgrade IncompleteTests
Tests for this run (3) are below the historical average (1867): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-main-e2e-metal-ipi-ovn IncompleteTests
Tests for this run (3) are below the historical average (2755): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-main-e2e-metal-ipi-ovn-ipv6 IncompleteTests
Tests for this run (2) are below the historical average (2842): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-main-e2e-metal-ipi-ovn-kube-apiserver-rollout IncompleteTests
Tests for this run (3) are below the historical average (1562): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-main-e2e-openstack-ovn IncompleteTests
Tests for this run (2) are below the historical average (1872): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)

Showing 20 of 23 jobs analysis

@dhensel-rh dhensel-rh marked this pull request as draft September 25, 2025 21:33
@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Sep 25, 2025
@openshift-trt
Copy link

openshift-trt bot commented Sep 25, 2025

Job Failure Risk Analysis for sha: eab19dc

Job Name Failure Risk
pull-ci-openshift-origin-main-e2e-aws-csi IncompleteTests
Tests for this run (3) are below the historical average (1642): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-main-e2e-aws-ovn-fips IncompleteTests
Tests for this run (3) are below the historical average (2580): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-main-e2e-aws-ovn-single-node IncompleteTests
Tests for this run (2) are below the historical average (2270): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)

@openshift-trt
Copy link

openshift-trt bot commented Oct 3, 2025

Job Failure Risk Analysis for sha: 99e3ea0

Job Name Failure Risk
pull-ci-openshift-origin-main-e2e-aws-csi IncompleteTests
Tests for this run (3) are below the historical average (928): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-main-e2e-gcp-csi IncompleteTests
Tests for this run (3) are below the historical average (1222): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)

@dhensel-rh dhensel-rh marked this pull request as ready for review October 6, 2025 14:37
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Oct 6, 2025
@openshift-ci openshift-ci bot requested a review from sjenning October 6, 2025 14:38
Copy link
Contributor

@clobrano clobrano left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left some comments. I noticed, also, that there are quite a few functions not used that must be cleaned up or integrated.

return fmt.Errorf("failed to list cluster operators: %v", err)
}

for _, operator := range clusterOperators.Items {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I have a library function that does this in the disruption test shared libraries.
https://github.com/openshift/origin/pull/30298/files#diff-4ed6c69fb287c32c3a2fd0f6a6f1e31eca45ed3267feb03d4e466a3ee4c7f276R1274-R1275

I think this approach is fine, but I would recommend collecting the unhealthy things into slices and then returning an error that lists all of the unavailable and degraded operators if the length of those slices are > 0

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The latest change does collect unhealthy things now. An example

  [FAILED] Timed out after 60.001s.
   etcd cluster operator should be healthy before starting test
   Unexpected error:
       <*errors.errorString | 0xc0017b9740>: 
       etcd ClusterOperator is not Available: &{Available False 2025-10-13 10:42:14 -0400 EDT EtcdMembers_NoQuorum EtcdMembersAvailable: 1 of 2 members are available, NAME-PENDING-192.168.111.20 has not started}
       {
           s: "etcd ClusterOperator is not Available: &{Available False 2025-10-13 10:42:14 -0400 EDT EtcdMembers_NoQuorum EtcdMembersAvailable: 1 of 2 members are available, NAME-PENDING-192.168.111.20 has not started}",
       }
   occurred
   In [BeforeEach] at: github.com/openshift/origin/test/extended/two_node/tnf_kubelet_disruption.go:42 @ 10/13/25 10:49:59.218
 ------------------------------

I think this meets your request above. Please verify

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Consider moving to common.go

@openshift-trt
Copy link

openshift-trt bot commented Oct 9, 2025

Job Failure Risk Analysis for sha: aa19408

Job Name Failure Risk
pull-ci-openshift-origin-main-e2e-aws-csi IncompleteTests
Tests for this run (3) are below the historical average (814): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-main-e2e-aws-ovn-fips IncompleteTests
Tests for this run (2) are below the historical average (2918): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-main-e2e-aws-ovn-microshift IncompleteTests
Tests for this run (3) are below the historical average (1383): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-main-e2e-aws-ovn-microshift-serial IncompleteTests
Tests for this run (3) are below the historical average (609): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-main-e2e-aws-ovn-serial-1of2 IncompleteTests
Tests for this run (3) are below the historical average (1559): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-main-e2e-aws-ovn-serial-2of2 IncompleteTests
Tests for this run (3) are below the historical average (1536): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-main-e2e-gcp-csi IncompleteTests
Tests for this run (3) are below the historical average (981): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-main-e2e-gcp-ovn IncompleteTests
Tests for this run (2) are below the historical average (3312): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-main-e2e-gcp-ovn-upgrade IncompleteTests
Tests for this run (3) are below the historical average (1840): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-main-e2e-metal-ipi-ovn-ipv6 IncompleteTests
Tests for this run (2) are below the historical average (3083): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-main-e2e-vsphere-ovn IncompleteTests
Tests for this run (3) are below the historical average (3300): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-main-e2e-vsphere-ovn-upi IncompleteTests
Tests for this run (3) are below the historical average (3112): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-main-okd-scos-e2e-aws-ovn IncompleteTests
Tests for this run (12) are below the historical average (1905): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)

@openshift-trt
Copy link

openshift-trt bot commented Oct 10, 2025

Job Failure Risk Analysis for sha: 19be558

Job Name Failure Risk
pull-ci-openshift-origin-main-e2e-aws-csi IncompleteTests
Tests for this run (3) are below the historical average (877): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-main-e2e-aws-ovn-fips IncompleteTests
Tests for this run (2) are below the historical average (3005): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-main-e2e-aws-ovn-microshift IncompleteTests
Tests for this run (3) are below the historical average (1675): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-main-e2e-aws-ovn-microshift-serial IncompleteTests
Tests for this run (3) are below the historical average (749): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-main-e2e-aws-ovn-serial-1of2 IncompleteTests
Tests for this run (3) are below the historical average (1602): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-main-e2e-aws-ovn-serial-2of2 IncompleteTests
Tests for this run (3) are below the historical average (1576): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-main-e2e-gcp-csi IncompleteTests
Tests for this run (3) are below the historical average (1034): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-main-e2e-gcp-ovn IncompleteTests
Tests for this run (2) are below the historical average (3354): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-main-e2e-gcp-ovn-upgrade IncompleteTests
Tests for this run (3) are below the historical average (1876): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-main-e2e-metal-ipi-ovn-ipv6 IncompleteTests
Tests for this run (3) are below the historical average (3180): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-main-e2e-vsphere-ovn IncompleteTests
Tests for this run (3) are below the historical average (3399): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-main-e2e-vsphere-ovn-upi IncompleteTests
Tests for this run (2) are below the historical average (3216): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-main-okd-scos-e2e-aws-ovn IncompleteTests
Tests for this run (12) are below the historical average (1970): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)

@openshift-trt
Copy link

openshift-trt bot commented Oct 10, 2025

Job Failure Risk Analysis for sha: d8aad48

Job Name Failure Risk
pull-ci-openshift-origin-main-e2e-aws-csi IncompleteTests
Tests for this run (3) are below the historical average (894): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-main-e2e-aws-ovn-fips IncompleteTests
Tests for this run (3) are below the historical average (3006): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-main-e2e-aws-ovn-microshift IncompleteTests
Tests for this run (3) are below the historical average (1671): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-main-e2e-aws-ovn-microshift-serial IncompleteTests
Tests for this run (3) are below the historical average (748): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-main-e2e-aws-ovn-serial-1of2 IncompleteTests
Tests for this run (2) are below the historical average (1603): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-main-e2e-aws-ovn-serial-2of2 IncompleteTests
Tests for this run (3) are below the historical average (1570): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-main-e2e-gcp-csi IncompleteTests
Tests for this run (3) are below the historical average (1043): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-main-e2e-gcp-ovn IncompleteTests
Tests for this run (2) are below the historical average (3349): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-main-e2e-gcp-ovn-upgrade IncompleteTests
Tests for this run (3) are below the historical average (1871): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-main-e2e-metal-ipi-ovn-ipv6 IncompleteTests
Tests for this run (3) are below the historical average (3173): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-main-e2e-vsphere-ovn IncompleteTests
Tests for this run (3) are below the historical average (3390): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-main-e2e-vsphere-ovn-upi IncompleteTests
Tests for this run (3) are below the historical average (3212): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-main-okd-scos-e2e-aws-ovn IncompleteTests
Tests for this run (12) are below the historical average (1984): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)

Copy link
Contributor

@clobrano clobrano left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left a couple of comments

@openshift-trt
Copy link

openshift-trt bot commented Oct 14, 2025

Job Failure Risk Analysis for sha: 967520e

Job Name Failure Risk
pull-ci-openshift-origin-main-e2e-aws-csi IncompleteTests
Tests for this run (2) are below the historical average (1023): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-main-e2e-aws-ovn-fips IncompleteTests
Tests for this run (3) are below the historical average (3021): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-main-e2e-aws-ovn-microshift IncompleteTests
Tests for this run (3) are below the historical average (1679): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-main-e2e-aws-ovn-microshift-serial IncompleteTests
Tests for this run (3) are below the historical average (877): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-main-e2e-aws-ovn-serial-1of2 IncompleteTests
Tests for this run (2) are below the historical average (1694): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-main-e2e-aws-ovn-serial-2of2 IncompleteTests
Tests for this run (3) are below the historical average (1644): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-main-e2e-gcp-csi IncompleteTests
Tests for this run (2) are below the historical average (1132): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-main-e2e-gcp-ovn IncompleteTests
Tests for this run (2) are below the historical average (3382): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-main-e2e-gcp-ovn-upgrade IncompleteTests
Tests for this run (3) are below the historical average (1917): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-main-e2e-metal-ipi-ovn-ipv6 IncompleteTests
Tests for this run (2) are below the historical average (3139): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-main-e2e-vsphere-ovn IncompleteTests
Tests for this run (2) are below the historical average (3434): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-main-e2e-vsphere-ovn-upi IncompleteTests
Tests for this run (3) are below the historical average (3273): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-main-okd-scos-e2e-aws-ovn IncompleteTests
Tests for this run (12) are below the historical average (2292): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)

@openshift-trt
Copy link

openshift-trt bot commented Oct 15, 2025

Job Failure Risk Analysis for sha: 4ce24e9

Job Name Failure Risk
pull-ci-openshift-origin-main-e2e-aws-csi IncompleteTests
Tests for this run (3) are below the historical average (1128): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-main-e2e-aws-ovn-fips IncompleteTests
Tests for this run (3) are below the historical average (3061): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-main-e2e-aws-ovn-microshift IncompleteTests
Tests for this run (2) are below the historical average (1679): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-main-e2e-aws-ovn-microshift-serial IncompleteTests
Tests for this run (3) are below the historical average (874): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-main-e2e-aws-ovn-serial-1of2 IncompleteTests
Tests for this run (3) are below the historical average (1857): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-main-e2e-aws-ovn-serial-2of2 IncompleteTests
Tests for this run (3) are below the historical average (1796): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-main-e2e-gcp-csi IncompleteTests
Tests for this run (3) are below the historical average (1174): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-main-e2e-gcp-ovn IncompleteTests
Tests for this run (2) are below the historical average (3358): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-main-e2e-gcp-ovn-upgrade IncompleteTests
Tests for this run (3) are below the historical average (1935): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-main-e2e-metal-ipi-ovn-ipv6 IncompleteTests
Tests for this run (3) are below the historical average (3183): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-main-e2e-vsphere-ovn IncompleteTests
Tests for this run (3) are below the historical average (3429): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-main-e2e-vsphere-ovn-upi IncompleteTests
Tests for this run (3) are below the historical average (3266): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-main-okd-scos-e2e-aws-ovn IncompleteTests
Tests for this run (12) are below the historical average (2603): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)

Copy link
Contributor

@jaypoulz jaypoulz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor notes. I'm good to merge as is, once it rebased and passes tests.

@openshift-ci openshift-ci bot added the vendor-update Touching vendor dir or related files label Oct 17, 2025
@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Oct 17, 2025
@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 17, 2025
@openshift-trt
Copy link

openshift-trt bot commented Oct 17, 2025

Risk analysis has seen new tests most likely introduced by this PR.
Please ensure that new tests meet guidelines for naming and stability.

New Test Risks for sha: 612eb9d

Job Name New Test Risk
pull-ci-openshift-origin-main-e2e-aws-ovn-serial-1of2 Medium - "[sig-api-machinery] API Streaming (aka. WatchList) [FeatureGate:WatchList] [Beta] [Serial] server supports sending resources in Table format [Suite:openshift/conformance/serial] [Suite:k8s]" is a new test, and was only seen in one job.
pull-ci-openshift-origin-main-e2e-aws-ovn-serial-1of2 Medium - "[sig-api-machinery] API Streaming (aka. WatchList) [FeatureGate:WatchList] [Beta] [Serial] should NOT be requested by metadata client's List method when WatchListClient is enabled [Suite:openshift/conformance/serial] [Suite:k8s]" is a new test, and was only seen in one job.
pull-ci-openshift-origin-main-e2e-aws-ovn-serial-1of2 Medium - "[sig-node] Pod Level Resources [Serial] [Feature:PodLevelResources] [FeatureGate:PodLevelResources] [Beta] Guaranteed QoS pod with container resources [Suite:openshift/conformance/serial] [Suite:k8s]" is a new test, and was only seen in one job.
pull-ci-openshift-origin-main-e2e-aws-ovn-serial-1of2 High - "[sig-node] Pod Level Resources [Serial] [Feature:PodLevelResources] [FeatureGate:PodLevelResources] [Beta] Guaranteed QoS pod, 1 container with resources [Suite:openshift/conformance/serial] [Suite:k8s]" is a new test, was only seen in one job, and failed 1 time(s) against the current commit.
pull-ci-openshift-origin-main-e2e-aws-ovn-serial-2of2 Medium - "[sig-api-machinery] API Streaming (aka. WatchList) [FeatureGate:WatchList] [Beta] [Serial] reflector doesn't support receiving resources as Tables [Suite:openshift/conformance/serial] [Suite:k8s]" is a new test, and was only seen in one job.
pull-ci-openshift-origin-main-e2e-aws-ovn-serial-2of2 Medium - "[sig-api-machinery] API Streaming (aka. WatchList) [FeatureGate:WatchList] [Beta] [Serial] should NOT be requested by client-go's List method when WatchListClient is enabled [Suite:openshift/conformance/serial] [Suite:k8s]" is a new test, and was only seen in one job.
pull-ci-openshift-origin-main-e2e-aws-ovn-serial-2of2 Medium - "[sig-api-machinery] API Streaming (aka. WatchList) [FeatureGate:WatchList] [Beta] [Serial] should NOT be requested by dynamic client's List method when WatchListClient is enabled [Suite:openshift/conformance/serial] [Suite:k8s]" is a new test, and was only seen in one job.
pull-ci-openshift-origin-main-e2e-aws-ovn-serial-2of2 Medium - "[sig-api-machinery] API Streaming (aka. WatchList) [FeatureGate:WatchList] [Beta] [Serial] should be requested by informers when WatchListClient is enabled [Suite:openshift/conformance/serial] [Suite:k8s]" is a new test, and was only seen in one job.
pull-ci-openshift-origin-main-e2e-aws-ovn-serial-2of2 Medium - "[sig-api-machinery] API Streaming (aka. WatchList) [FeatureGate:WatchList] [Beta] [Serial] should be requested by metadatainformer when WatchListClient is enabled [Suite:openshift/conformance/serial] [Suite:k8s]" is a new test, and was only seen in one job.
pull-ci-openshift-origin-main-e2e-aws-ovn-serial-2of2 High - "[sig-node] Pod Level Resources [Serial] [Feature:PodLevelResources] [FeatureGate:PodLevelResources] [Beta] Burstable QoS pod with container resources [Suite:openshift/conformance/serial] [Suite:k8s]" is a new test, was only seen in one job, and failed 1 time(s) against the current commit.
pull-ci-openshift-origin-main-e2e-aws-ovn-serial-2of2 High - "[sig-node] Pod Level Resources [Serial] [Feature:PodLevelResources] [FeatureGate:PodLevelResources] [Beta] Burstable QoS pod, 1 container with resources [Suite:openshift/conformance/serial] [Suite:k8s]" is a new test, was only seen in one job, and failed 1 time(s) against the current commit.
pull-ci-openshift-origin-main-e2e-aws-ovn-serial-2of2 High - "[sig-node] Pod Level Resources [Serial] [Feature:PodLevelResources] [FeatureGate:PodLevelResources] [Beta] Burstable QoS pod, no container resources [Suite:openshift/conformance/serial] [Suite:k8s]" is a new test, was only seen in one job, and failed 1 time(s) against the current commit.
pull-ci-openshift-origin-main-e2e-aws-ovn-serial-2of2 High - "[sig-node] Pod Level Resources [Serial] [Feature:PodLevelResources] [FeatureGate:PodLevelResources] [Beta] Guaranteed QoS pod, no container resources [Suite:openshift/conformance/serial] [Suite:k8s]" is a new test, was only seen in one job, and failed 1 time(s) against the current commit.

New tests seen in this PR at sha: 612eb9d

  • "[sig-api-machinery] API Streaming (aka. WatchList) [FeatureGate:WatchList] [Beta] [Serial] reflector doesn't support receiving resources as Tables [Suite:openshift/conformance/serial] [Suite:k8s]" [Total: 1, Pass: 1, Fail: 0, Flake: 0]
  • "[sig-api-machinery] API Streaming (aka. WatchList) [FeatureGate:WatchList] [Beta] [Serial] server supports sending resources in Table format [Suite:openshift/conformance/serial] [Suite:k8s]" [Total: 1, Pass: 1, Fail: 0, Flake: 0]
  • "[sig-api-machinery] API Streaming (aka. WatchList) [FeatureGate:WatchList] [Beta] [Serial] should NOT be requested by client-go's List method when WatchListClient is enabled [Suite:openshift/conformance/serial] [Suite:k8s]" [Total: 1, Pass: 1, Fail: 0, Flake: 0]
  • "[sig-api-machinery] API Streaming (aka. WatchList) [FeatureGate:WatchList] [Beta] [Serial] should NOT be requested by dynamic client's List method when WatchListClient is enabled [Suite:openshift/conformance/serial] [Suite:k8s]" [Total: 1, Pass: 1, Fail: 0, Flake: 0]
  • "[sig-api-machinery] API Streaming (aka. WatchList) [FeatureGate:WatchList] [Beta] [Serial] should NOT be requested by metadata client's List method when WatchListClient is enabled [Suite:openshift/conformance/serial] [Suite:k8s]" [Total: 1, Pass: 1, Fail: 0, Flake: 0]
  • "[sig-api-machinery] API Streaming (aka. WatchList) [FeatureGate:WatchList] [Beta] [Serial] should be requested by informers when WatchListClient is enabled [Suite:openshift/conformance/serial] [Suite:k8s]" [Total: 1, Pass: 1, Fail: 0, Flake: 0]
  • "[sig-api-machinery] API Streaming (aka. WatchList) [FeatureGate:WatchList] [Beta] [Serial] should be requested by metadatainformer when WatchListClient is enabled [Suite:openshift/conformance/serial] [Suite:k8s]" [Total: 1, Pass: 1, Fail: 0, Flake: 0]
  • "[sig-node] Pod Level Resources [Serial] [Feature:PodLevelResources] [FeatureGate:PodLevelResources] [Beta] Burstable QoS pod with container resources [Suite:openshift/conformance/serial] [Suite:k8s]" [Total: 1, Pass: 0, Fail: 1, Flake: 0]
  • "[sig-node] Pod Level Resources [Serial] [Feature:PodLevelResources] [FeatureGate:PodLevelResources] [Beta] Burstable QoS pod, 1 container with resources [Suite:openshift/conformance/serial] [Suite:k8s]" [Total: 1, Pass: 0, Fail: 1, Flake: 0]
  • "[sig-node] Pod Level Resources [Serial] [Feature:PodLevelResources] [FeatureGate:PodLevelResources] [Beta] Burstable QoS pod, no container resources [Suite:openshift/conformance/serial] [Suite:k8s]" [Total: 1, Pass: 0, Fail: 1, Flake: 0]
  • "[sig-node] Pod Level Resources [Serial] [Feature:PodLevelResources] [FeatureGate:PodLevelResources] [Beta] Guaranteed QoS pod with container resources [Suite:openshift/conformance/serial] [Suite:k8s]" [Total: 1, Pass: 1, Fail: 0, Flake: 0]
  • "[sig-node] Pod Level Resources [Serial] [Feature:PodLevelResources] [FeatureGate:PodLevelResources] [Beta] Guaranteed QoS pod, 1 container with resources [Suite:openshift/conformance/serial] [Suite:k8s]" [Total: 1, Pass: 0, Fail: 1, Flake: 0]
  • "[sig-node] Pod Level Resources [Serial] [Feature:PodLevelResources] [FeatureGate:PodLevelResources] [Beta] Guaranteed QoS pod, no container resources [Suite:openshift/conformance/serial] [Suite:k8s]" [Total: 1, Pass: 0, Fail: 1, Flake: 0]

@jaypoulz
Copy link
Contributor

/retest-required

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Dec 2, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dhensel-rh, jaypoulz

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Dec 2, 2025
@dhensel-rh
Copy link
Author

/payload-job periodic-ci-openshift-release-master-nightly-4.21-e2e-metal-ovn-two-node-fencing-recovery-techpreview

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Dec 4, 2025

@dhensel-rh: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-release-master-nightly-4.21-e2e-metal-ovn-two-node-fencing-recovery-techpreview

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/1764d5f0-d118-11f0-9f02-aa3e2e13ee2f-0

@openshift-ci-robot
Copy link

Scheduling required tests:
/test e2e-aws-csi
/test e2e-aws-ovn-fips
/test e2e-aws-ovn-microshift
/test e2e-aws-ovn-microshift-serial
/test e2e-aws-ovn-serial-1of2
/test e2e-aws-ovn-serial-2of2
/test e2e-gcp-csi
/test e2e-gcp-ovn
/test e2e-gcp-ovn-upgrade
/test e2e-metal-ipi-ovn-ipv6
/test e2e-vsphere-ovn
/test e2e-vsphere-ovn-upi

Copy link
Contributor

@eggfoobar eggfoobar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just took an initial walk through, looking good so far, just some code duplication we can tighten up a bit. I'll give it another look Monday.

Highnotes,

  • Lets use the utils.GetNodes method
  • Lets keep the state of the nodes in the tests
  • Lets use the openshift-etcd namespace to run the debug pods for now or create our own privileged namespace

return false
}

// HasNodeRebooted checks if a node has rebooted by comparing its current BootID with a previous snapshot.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like nothing changed here, let's move this back to avoid a diff.

Copy link
Author

@dhensel-rh dhensel-rh Dec 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

const (
AllNodes = "" // No label filter for GetNodes
LabelNodeRoleControlPlane = "node-role.kubernetes.io/control-plane" // Control plane node label
LabelNodeRoleMaster = "node-role.kubernetes.io/master" // Legacy master node label
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's remove this, at this point all labels should use control-plane and I don't see where this is being used.

return utils.LogEtcdClusterStatus(oc, "BeforeEach validation")
}, etcdOperatorIsHealthyTimeout, pollInterval).ShouldNot(o.HaveOccurred(), "etcd cluster should be fully healthy before starting test")

nodeList, err := oc.AdminKubeClient().CoreV1().Nodes().List(context.Background(), metav1.ListOptions{})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's use the helper function utils.GetNodes(oc, utils.AllNodes) here and everywhere we do node queries to simplify things

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, lets avoid running this query in the beforeEach statement, just run it in the test itself, and we avoid any possible mistakes that can arise from sharing the nodelist between ginkonode runs. This should also help with the fact that the node state is changing enough where we get the fresh state at the test run time.


g.By("Ensuring both nodes are healthy before starting kubelet disruption test")
for _, node := range nodes {
o.Eventually(func() bool {
Copy link
Contributor

@eggfoobar eggfoobar Dec 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need to make any call out for the individual nodes here, we already have the node object at this point, so we can just check nodeReady state

for _, node := range nodes {
       if ready := nodeutil.IsNodeReady(node); !ready {
             o.Expect(read).Should(o.BeTrue(), fmt.Sprintf("Node %s should be ready before kubelet disruption", node.Name)
       }
}

defer g.GinkgoRecover()

var (
oc = util.NewCLIWithoutNamespace("").AsAdmin()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here lets use util.NewCLIWithoutNamespace("").SetNamespace("openshift-etcd").AsAdmin() to avoid permission issues and use the namespace where the tnf pods run in.

@openshift-ci-robot
Copy link

Scheduling required tests:
/test e2e-aws-csi
/test e2e-aws-ovn-fips
/test e2e-aws-ovn-microshift
/test e2e-aws-ovn-microshift-serial
/test e2e-aws-ovn-serial-1of2
/test e2e-aws-ovn-serial-2of2
/test e2e-gcp-csi
/test e2e-gcp-ovn
/test e2e-gcp-ovn-upgrade
/test e2e-metal-ipi-ovn-ipv6
/test e2e-vsphere-ovn
/test e2e-vsphere-ovn-upi

@openshift-ci-robot
Copy link

Scheduling required tests:
/test e2e-aws-csi
/test e2e-aws-ovn-fips
/test e2e-aws-ovn-microshift
/test e2e-aws-ovn-microshift-serial
/test e2e-aws-ovn-serial-1of2
/test e2e-aws-ovn-serial-2of2
/test e2e-gcp-csi
/test e2e-gcp-ovn
/test e2e-gcp-ovn-upgrade
/test e2e-metal-ipi-ovn-ipv6
/test e2e-vsphere-ovn
/test e2e-vsphere-ovn-upi

@dhensel-rh
Copy link
Author

/payload-job periodic-ci-openshift-release-master-nightly-4.21-e2e-metal-ovn-two-node-fencing-recovery-techpreview

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Dec 8, 2025

@dhensel-rh: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-release-master-nightly-4.21-e2e-metal-ovn-two-node-fencing-recovery-techpreview

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/d8b78e30-d46c-11f0-90b8-6c3725405946-0

@openshift-ci-robot
Copy link

Scheduling required tests:
/test e2e-aws-csi
/test e2e-aws-ovn-fips
/test e2e-aws-ovn-microshift
/test e2e-aws-ovn-microshift-serial
/test e2e-aws-ovn-serial-1of2
/test e2e-aws-ovn-serial-2of2
/test e2e-gcp-csi
/test e2e-gcp-ovn
/test e2e-gcp-ovn-upgrade
/test e2e-metal-ipi-ovn-ipv6
/test e2e-vsphere-ovn
/test e2e-vsphere-ovn-upi

Copy link
Contributor

@eggfoobar eggfoobar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So far so good, got a few suggestions, and just some logging changes, we have a bit too much logging repeating the same information, that's going to make it noisy to debug.

Lets remove duplicating the information during the return statements and remove the success and start logs in the beginning of the helper functions since the call site seems to also log that the helper function is being called.

for example:
this

	if err != nil {
		framework.Logf("Failed to check resource status: %v, output: %s", err, output)
		return false, fmt.Errorf("failed to check resource status: %v", err)
	}

to this

	if err != nil {
		return false, fmt.Errorf("failed to check resource status, output: %s error: %v", output, err)
	}

g.AfterEach(func() {
// Cleanup: Remove any resource bans that may have been created during the test
// This ensures the device under test is in the same state the test started in
nodeList, err := oc.AdminKubeClient().CoreV1().Nodes().List(context.Background(), metav1.ListOptions{})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets use the util function GetNodes(oc, AllNodes)

// This ensures the device under test is in the same state the test started in
nodeList, err := oc.AdminKubeClient().CoreV1().Nodes().List(context.Background(), metav1.ListOptions{})
if err != nil {
framework.Logf("Warning: Failed to retrieve nodes during cleanup: %v", err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if this fails to clean up? Would that be considered s failed state?

return
}

if len(nodeList.Items) == 2 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we don't get two nodes, would that be a failure?

})

g.It("Should recover from single node kubelet service disruption", func() {
nodeList, err := oc.AdminKubeClient().CoreV1().Nodes().List(context.Background(), metav1.ListOptions{})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lets use the util function here utils.GetNodes(oc, utils.AllNodes)

})

g.It("Should properly stop kubelet service and verify automatic restart on target node", func() {
nodeList, err := oc.AdminKubeClient().CoreV1().Nodes().List(context.Background(), metav1.ListOptions{})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same thing here about the util function

const (
AllNodes = "" // No label filter for GetNodes
LabelNodeRoleControlPlane = "node-role.kubernetes.io/control-plane" // Control plane node label
LabelNodeRoleControlPlane = "node-role.kubernetes.io/control-plane" // Control plane node label // Legacy master node label
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lets remove this comment about legacy, seems like a left over from the previous line

//
// err := AddConstraint(oc, "master-0", "kubelet-clone", "master-1")
func AddConstraint(oc *exutil.CLI, nodeName string, resourceName string, targetNode string) error {
framework.Logf("Banning resource %s from running on %s (temporary ban for testing)", resourceName, targetNode)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets remove this, the call site should make mention of this, the helper functions should be clean of log statements, typically logging opinions should be left to the caller

"--", "chroot", "/host", "bash", "-c", cmd).Output()

if err != nil {
framework.Logf("Failed to ban resource: %v, output: %s", err, output)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same thing here, the error should describe why it failed and the caller can decide how to log it

return fmt.Errorf("failed to ban resource: %v", err)
}

framework.Logf("Successfully banned resource %s from %s", resourceName, targetNode)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same thing here, lets nix this, it just ends up causing noise in the logs

}

// isPodReady checks if a pod is ready based on its conditions
func isPodReady(pod *corev1.Pod) bool {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets remove this and use podutils.IsPodReady(pod) from the "k8s.io/kubectl/pkg/util/podutils" package

@dhensel-rh
Copy link
Author

/payload-job periodic-ci-openshift-release-master-nightly-4.21-e2e-metal-ovn-two-node-fencing-recovery-techpreview

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Dec 10, 2025

@dhensel-rh: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-release-master-nightly-4.21-e2e-metal-ovn-two-node-fencing-recovery-techpreview

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/278ceed0-d60a-11f0-964c-0b149fa05ac5-0

@openshift-ci-robot
Copy link

Scheduling required tests:
/test e2e-aws-csi
/test e2e-aws-ovn-fips
/test e2e-aws-ovn-microshift
/test e2e-aws-ovn-microshift-serial
/test e2e-aws-ovn-serial-1of2
/test e2e-aws-ovn-serial-2of2
/test e2e-gcp-csi
/test e2e-gcp-ovn
/test e2e-gcp-ovn-upgrade
/test e2e-metal-ipi-ovn-ipv6
/test e2e-vsphere-ovn
/test e2e-vsphere-ovn-upi

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Dec 11, 2025

@dhensel-rh: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/e2e-metal-ipi-serial-1of2 7d6c813 link false /test e2e-metal-ipi-serial-1of2
ci/prow/e2e-metal-ipi-ovn-dualstack-local-gateway 7d6c813 link false /test e2e-metal-ipi-ovn-dualstack-local-gateway
ci/prow/e2e-gcp-ovn-techpreview-serial-1of2 7d6c813 link false /test e2e-gcp-ovn-techpreview-serial-1of2
ci/prow/e2e-gcp-ovn-techpreview-serial-2of2 7d6c813 link false /test e2e-gcp-ovn-techpreview-serial-2of2
ci/prow/e2e-metal-ipi-ovn-dualstack 7d6c813 link false /test e2e-metal-ipi-ovn-dualstack
ci/prow/e2e-aws-proxy 7d6c813 link false /test e2e-aws-proxy
ci/prow/e2e-aws-disruptive 7d6c813 link false /test e2e-aws-disruptive
ci/prow/e2e-hypershift-conformance 7d6c813 link false /test e2e-hypershift-conformance
ci/prow/e2e-aws-ovn-upgrade 7d6c813 link false /test e2e-aws-ovn-upgrade
ci/prow/e2e-aws-ovn 7d6c813 link false /test e2e-aws-ovn
ci/prow/e2e-aws-ovn-kube-apiserver-rollout 7d6c813 link false /test e2e-aws-ovn-kube-apiserver-rollout
ci/prow/e2e-gcp-ovn-techpreview 7d6c813 link false /test e2e-gcp-ovn-techpreview
ci/prow/e2e-metal-ipi-serial-ovn-ipv6-2of2 7d6c813 link false /test e2e-metal-ipi-serial-ovn-ipv6-2of2
ci/prow/e2e-metal-ipi-serial-ovn-ipv6-1of2 7d6c813 link false /test e2e-metal-ipi-serial-ovn-ipv6-1of2
ci/prow/e2e-metal-ipi-serial-2of2 7d6c813 link false /test e2e-metal-ipi-serial-2of2
ci/prow/e2e-metal-ipi-virtualmedia 7d6c813 link false /test e2e-metal-ipi-virtualmedia
ci/prow/e2e-azure 7d6c813 link false /test e2e-azure
ci/prow/e2e-aws-ovn-single-node eab19dc link false /test e2e-aws-ovn-single-node
ci/prow/okd-scos-e2e-aws-ovn 6ea9bee link false /test okd-scos-e2e-aws-ovn

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@dhensel-rh
Copy link
Author

/payload-job periodic-ci-openshift-release-master-nightly-4.21-e2e-metal-ovn-two-node-fencing-recovery-techpreview

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Dec 11, 2025

@dhensel-rh: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-release-master-nightly-4.21-e2e-metal-ovn-two-node-fencing-recovery-techpreview

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/27bc3cf0-d69e-11f0-9dd1-df35834b10ac-0

@openshift-ci-robot
Copy link

Scheduling required tests:
/test e2e-aws-csi
/test e2e-aws-ovn-fips
/test e2e-aws-ovn-microshift
/test e2e-aws-ovn-microshift-serial
/test e2e-aws-ovn-serial-1of2
/test e2e-aws-ovn-serial-2of2
/test e2e-gcp-csi
/test e2e-gcp-ovn
/test e2e-gcp-ovn-upgrade
/test e2e-metal-ipi-ovn-ipv6
/test e2e-vsphere-ovn
/test e2e-vsphere-ovn-upi

Copy link
Contributor

@eggfoobar eggfoobar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just small naming change, and an observation on the kubelet restart. Also @jaypoulz we might need some guidance on these tests as it relates to the openshift/two-node suite, we are currently grabing parallel, serial, disruptive, degraded in that bucket and I think we might need to re-evaluate that.

{
Name: "openshift/two-node",
Description: templates.LongDesc(`
This test suite runs tests to validate two-node.
`),
Qualifiers: []string{
`name.contains("[Suite:openshift/two-node") || name.contains("[OCPFeatureGate:DualReplica]") || name.contains("[OCPFeatureGate:HighlyAvailableArbiter]")`,
},
TestTimeout: 60 * time.Minute,
},

}
})

g.It("Should recover from single node kubelet service disruption", func() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets' lowerase these Shoulds, dont forget that this word is in the middle of the test tile.

"Should" -> "should"

}, kubeletRestoreTimeout, pollInterval).ShouldNot(o.HaveOccurred(), "Essential cluster operators should be available after kubelet resource ban removal")
})

g.It("Should properly stop kubelet service and verify automatic restart on target node", func() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here
"Should" -> "should"


g.By("Waiting for kubelet service to automatically restart on target node")
o.Eventually(func() bool {
return utils.IsServiceRunning(oc, targetNode.Name, "kubelet")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure this will work, once you stop the kubelet with systemd, does it start back up? If the kubelet is not up I don't think you'll be able to run a debug pod on that node, you'll need to run the ssh framework that Jeremy implemented in order to check and restart the kubelet.

Ideally you would want to crash the kubelet to test this out, that way the always restart stanza of the kubelet.service will kick in. https://github.com/openshift/machine-config-operator/blob/main/templates/master/01-master-kubelet/_base/units/kubelet.service.yaml#L48-L49

@openshift-ci-robot
Copy link

Scheduling required tests:
/test e2e-aws-csi
/test e2e-aws-ovn-fips
/test e2e-aws-ovn-microshift
/test e2e-aws-ovn-microshift-serial
/test e2e-aws-ovn-serial-1of2
/test e2e-aws-ovn-serial-2of2
/test e2e-gcp-csi
/test e2e-gcp-ovn
/test e2e-gcp-ovn-upgrade
/test e2e-metal-ipi-ovn-ipv6
/test e2e-vsphere-ovn
/test e2e-vsphere-ovn-upi

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. e2e-images-update Related to images used by e2e tests jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. vendor-update Touching vendor dir or related files

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants