@Miciah
Contributor

@Miciah Miciah commented Jun 9, 2025

gatewayapicontroller: Add checks for empty slices

Check whether the slice of parent resource references in an httproute's status is empty before indexing the slice.

Before this commit, the "Ensure HTTPRoute object is created" test sometimes panicked with "runtime error: index out of range [0] with length 0".

Similarly, check whether the slice of load-balancer ingress points in a service's status is empty before indexing it.
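
The guard described above can be sketched as follows. This is a minimal illustration, not the code from this PR: `routeStatus`, `parentStatus`, `errNoParents`, and `firstParent` are hypothetical stand-ins for the HTTPRoute status types, chosen so that the example compiles without the Gateway API module.

```go
package main

import "errors"

// parentStatus and routeStatus are hypothetical stand-ins for the
// per-parent entries in an HTTPRoute's status.
type parentStatus struct {
	ParentName string
	Accepted   bool
}

type routeStatus struct {
	Parents []parentStatus
}

// errNoParents signals that the status has not been populated yet, so a
// polling caller can retry instead of panicking on Parents[0].
var errNoParents = errors.New("route status has no parents yet")

// firstParent returns the first parent status entry, or errNoParents if
// the slice is empty.
func firstParent(s routeStatus) (parentStatus, error) {
	if len(s.Parents) == 0 {
		return parentStatus{}, errNoParents
	}
	return s.Parents[0], nil
}
```

The same length check would apply to the load-balancer ingress slice in a service's status before reading its first element.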

gatewayapicontroller: Clean up resources when done

Delete the gatewayclass and uninstall OSSM after all the Gateway API controller tests are done.

Before this change, the Gateway API controller tests left OSSM installed, including the subscription, CSV, installplan, bundled CRDs, RBAC resources, deployment, service, serviceaccount, etc., when the tests were finished. This clutter could cause problems for other tests, or for the same test if it was run again.

The new cleanup logic uses the OperatorsV1 client from github.com/operator-framework/operator-lifecycle-manager. Importing this package requires a replace stanza for openshift/api in go.mod.

This vendors github.com/operator-framework/operator-lifecycle-manager v0.30.1-0.20250114164243-1b6752ec65fa rather than the newest revision, because the newest revision would bring in additional problematic vendor bumps.

gatewayapicontroller: Always log errors

Add the error value to some log messages that were missing it.

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference and jira/invalid-bug labels Jun 9, 2025
@openshift-ci-robot

@Miciah: This pull request references Jira Issue OCPBUGS-56281, which is invalid:

  • expected the bug to target the "4.20.0" version, but no target version was set

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci openshift-ci bot requested review from knobunc and p0lyn0mial June 9, 2025 13:29
@openshift-ci openshift-ci bot added the vendor-update label Jun 9, 2025
@Miciah Miciah force-pushed the OCPBUGS-56281-gatewayapicontroller-clean-up-resources-when-done branch from fc08232 to bf853bf Compare June 9, 2025 16:11
@openshift-trt

openshift-trt bot commented Jun 9, 2025

Job Failure Risk Analysis for sha: bf853bf

Job Name Failure Risk
pull-ci-openshift-origin-main-e2e-gcp-disruptive IncompleteTests
Tests for this run (19) are below the historical average (1505): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-main-e2e-gcp-fips-serial-1of2 IncompleteTests
Tests for this run (19) are below the historical average (1822): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-main-e2e-metal-ipi-ovn-kube-apiserver-rollout IncompleteTests
Tests for this run (29) are below the historical average (1778): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)

@rhamini3
Contributor

LGTM, @melvinjoseph86 PTAL

@melvinjoseph86
Contributor

/lgtm

@openshift-ci openshift-ci bot added the lgtm label Jun 12, 2025
@melvinjoseph86
Contributor

/retest

@openshift-trt

openshift-trt bot commented Jun 12, 2025

Job Failure Risk Analysis for sha: 1967dd2

Job Name Failure Risk
pull-ci-openshift-origin-main-4.12-upgrade-from-stable-4.11-e2e-aws-ovn-upgrade-rollback IncompleteTests
Tests for this run (94) are below the historical average (209): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-main-e2e-aws-ovn-edge-zones High
[sig-network-edge][OCPFeatureGate:GatewayAPIController][Feature:Router][apigroup:gateway.networking.k8s.io] Ensure custom gatewayclass can be accepted [Suite:openshift/conformance/parallel]
This test has passed 98.38% of 2463 runs on release 4.20 [Overall] in the last week.
pull-ci-openshift-origin-main-e2e-azure-ovn-upgrade IncompleteTests
Tests for this run (196) are below the historical average (3374): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-main-e2e-gcp-ovn-etcd-scaling Low
[bz-kube-storage-version-migrator] clusteroperator/kube-storage-version-migrator should not change condition/Available
This test has passed 0.00% of 1 runs on release 4.20 [Architecture:amd64 FeatureSet:default Installer:ipi JobTier:rare Network:ovn NetworkStack:ipv4 Owner:eng Platform:gcp SecurityMode:default Topology:ha Upgrade:none] in the last week.

Open Bugs
[CI] e2e-openstack-ovn-etcd-scaling job permanent fails at many openshift-test tests
etcd-scaling jobs failing ~60% of the time
---
[bz-Cloud Compute] clusteroperator/control-plane-machine-set should not change condition/Degraded
This test has passed 0.00% of 1 runs on release 4.20 [Architecture:amd64 FeatureSet:default Installer:ipi JobTier:rare Network:ovn NetworkStack:ipv4 Owner:eng Platform:gcp SecurityMode:default Topology:ha Upgrade:none] in the last week.

Open Bugs
etcd-scaling jobs failing ~60% of the time
pull-ci-openshift-origin-main-e2e-vsphere-ovn-etcd-scaling Medium
[sig-instrumentation] disruption/metrics-api connection/new should be available throughout the test
Potential external regression detected for High Risk Test analysis

@Miciah Miciah force-pushed the OCPBUGS-56281-gatewayapicontroller-clean-up-resources-when-done branch from 1967dd2 to ab81b79 Compare June 13, 2025 01:53
@openshift-ci openshift-ci bot removed the lgtm label Jun 13, 2025
@openshift-trt

openshift-trt bot commented Jun 13, 2025

Job Failure Risk Analysis for sha: ab81b79

Job Name Failure Risk
pull-ci-openshift-origin-main-4.12-upgrade-from-stable-4.11-e2e-aws-ovn-upgrade-rollback MissingData
pull-ci-openshift-origin-main-e2e-aws-ovn-edge-zones High
[sig-network-edge][OCPFeatureGate:GatewayAPIController][Feature:Router][apigroup:gateway.networking.k8s.io] Ensure custom gatewayclass can be accepted [Suite:openshift/conformance/parallel]
This test has passed 99.76% of 2503 runs on release 4.20 [Overall] in the last week.
pull-ci-openshift-origin-main-e2e-aws-ovn-etcd-scaling Low
[bz-Cloud Compute] clusteroperator/control-plane-machine-set should not change condition/Degraded
This test has passed 50.00% of 2 runs on release 4.20 [Architecture:amd64 FeatureSet:default Installer:ipi JobTier:rare Network:ovn NetworkStack:ipv4 Owner:eng Platform:aws SecurityMode:default Topology:ha Upgrade:none] in the last week.

Open Bugs
etcd-scaling jobs failing ~60% of the time
pull-ci-openshift-origin-main-e2e-azure-ovn-upgrade IncompleteTests
Tests for this run (2125) are below the historical average (3401): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)

@Miciah Miciah force-pushed the OCPBUGS-56281-gatewayapicontroller-clean-up-resources-when-done branch from ab81b79 to 1dcc98a Compare June 13, 2025 15:51
@Miciah
Contributor Author

Miciah commented Jun 13, 2025

https://github.com/openshift/origin/compare/1967dd22c83963e780eb9953bc38da760e090dc8..1dcc98a3c2ec7c38dcee818e750e14ce57d70892 made these changes:

  • Add logic to delete the Istio CR in the test cleanup.
  • Declare package consts for istioName and ingressNamespace and use these instead of function-local variables and string literals.
  • Omit the namespace when getting the Istio CR, which is cluster-scoped.

Before these changes, pods.json from e2e-aws #1932229162710339584 had the istiod pod. After these changes, pods.json from e2e-aws #1933552902287134720 does not have the istiod pod. It appears that the istiod pod cleanup is working properly.

Also, comparing must-gather.tar from 1933552902287134720 and must-gather.tar from 1932229162710339584, the older must-gather archive has the istiorevisions.sailoperator.io.yaml CRD whereas the newer must-gather archive does not. Neither must-gather archive has any other istio.io or sailoperator.io CRDs. I believe that deleting the Istio CR enables the cleanup to delete all OSSM-installed CRDs successfully.
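
The pods.json comparison above could be automated with a check along these lines. This is a sketch under assumptions: it assumes pods.json is a Kubernetes pod list with `items[].metadata.name` and `items[].metadata.namespace` fields, and `podList` and `hasPodWithPrefix` are hypothetical names.

```go
package main

import (
	"encoding/json"
	"strings"
)

// podList models only the fields this check needs from a pod list.
// The layout of pods.json is an assumption, not taken from the CI tooling.
type podList struct {
	Items []struct {
		Metadata struct {
			Name      string `json:"name"`
			Namespace string `json:"namespace"`
		} `json:"metadata"`
	} `json:"items"`
}

// hasPodWithPrefix reports whether data lists a pod in namespace whose
// name starts with prefix.
func hasPodWithPrefix(data []byte, namespace, prefix string) (bool, error) {
	var list podList
	if err := json.Unmarshal(data, &list); err != nil {
		return false, err
	}
	for _, item := range list.Items {
		if item.Metadata.Namespace == namespace && strings.HasPrefix(item.Metadata.Name, prefix) {
			return true, nil
		}
	}
	return false, nil
}
```

Calling it with the namespace `openshift-ingress` and the prefix `istiod-` on both artifacts would be expected to return true for the older run and false for the newer one, if the observation above holds.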

@openshift-trt

openshift-trt bot commented Jun 13, 2025

Job Failure Risk Analysis for sha: 1dcc98a

Job Name Failure Risk
pull-ci-openshift-origin-main-e2e-aws-ovn High
[sig-network-edge][OCPFeatureGate:GatewayAPIController][Feature:Router][apigroup:gateway.networking.k8s.io] Ensure HTTPRoute object is created [Suite:openshift/conformance/parallel]
This test has passed 99.22% of 2451 runs on release 4.20 [Overall] in the last week.

Open Bugs
Component Readiness: [Networking / router] [OCPFeatureGate:GatewayAPIController] test regressed on HyperShift Azure AKS
pull-ci-openshift-origin-main-e2e-aws-ovn-etcd-scaling Low
[bz-Cloud Compute] clusteroperator/control-plane-machine-set should not change condition/Degraded
This test has passed 50.00% of 2 runs on release 4.20 [Architecture:amd64 FeatureSet:default Installer:ipi JobTier:rare Network:ovn NetworkStack:ipv4 Owner:eng Platform:aws SecurityMode:default Topology:ha Upgrade:none] in the last week.

Open Bugs
etcd-scaling jobs failing ~60% of the time
pull-ci-openshift-origin-main-e2e-aws-ovn-microshift High
[sig-api-machinery] API priority and fairness should ensure that requests can be classified by adding FlowSchema and PriorityLevelConfiguration [Suite:openshift/conformance/parallel] [Suite:k8s]
This test has passed 99.97% of 3060 runs on release 4.20 [Overall] in the last week.
pull-ci-openshift-origin-main-e2e-azure-ovn-upgrade IncompleteTests
Tests for this run (2125) are below the historical average (3318): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-main-e2e-gcp-ovn-rt-upgrade IncompleteTests
Tests for this run (19) are below the historical average (1620): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-main-e2e-vsphere-ovn-etcd-scaling Low
[bz-Cloud Compute] clusteroperator/control-plane-machine-set should not change condition/Degraded
This test has passed 0.00% of 1 runs on release 4.20 [Architecture:amd64 FeatureSet:default Installer:ipi JobTier:rare Network:ovn NetworkStack:ipv4 Owner:eng Platform:vsphere SecurityMode:default Topology:ha Upgrade:none] in the last week.

Open Bugs
etcd-scaling jobs failing ~60% of the time
---
[sig-api-machinery] disruption/cache-openshift-api apiserver/openshift-apiserver connection/new should be available throughout the test
This test has passed 0.00% of 1 runs on release 4.20 [Architecture:amd64 FeatureSet:default Installer:ipi JobTier:rare Network:ovn NetworkStack:ipv4 Owner:eng Platform:vsphere SecurityMode:default Topology:ha Upgrade:none] in the last week.
---
[sig-api-machinery] disruption/cache-oauth-api apiserver/oauth-apiserver connection/new should be available throughout the test
This test has passed 0.00% of 1 runs on release 4.20 [Architecture:amd64 FeatureSet:default Installer:ipi JobTier:rare Network:ovn NetworkStack:ipv4 Owner:eng Platform:vsphere SecurityMode:default Topology:ha Upgrade:none] in the last week.

@abhat
Contributor

abhat commented Jun 16, 2025

/payload-aggregate periodic-ci-openshift-release-master-ci-4.19-upgrade-from-stable-4.18-e2e-gcp-ovn-rt-upgrade 5

@openshift-ci
Contributor

openshift-ci bot commented Jun 16, 2025

@abhat: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-release-master-ci-4.19-upgrade-from-stable-4.18-e2e-gcp-ovn-rt-upgrade

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/101e8ee0-4acb-11f0-928a-4bd1c2be89d0-0

e2e.Failf("Failed to delete GatewayClass %q", gatewayClassName)
}

g.By("Deleting the OSSM Operator resources")

Contributor

I'm curious why we don't use an owner reference for the Subscription. We could add an owner reference to the gatewayclass and let Kube do the cascading deletion.

Update: Deleting the Subscription doesn't delete the CSV or the CRDs. The CRD part is understandable: there could be some data loss. But the CSV part is kinda interesting.

Contributor Author

There are a few reasons not to put or rely on an owner reference on the subscription:

  • You could create the subscription manually; we cannot assume that the operator created it.
  • You could have multiple gatewayclasses with our controller name, and then it isn't clear how we would configure the owner references on the subscription. Would we add only the first gatewayclass with our controller name? Would we add all gatewayclasses with our controller name? If we added more than one owner reference, would we need to delete old owner references when the corresponding gatewayclasses were deleted? If we did delete stale owner references, would that prevent garbage collection, or would we always leave one non-stale reference to trigger garbage collection?
  • I don't know for sure that OLM doesn't look at the owner reference. We would need to check this.
  • I am not confident that an owner reference would cause the subscription to be deleted as the owner reference on the Istio CR didn't cause it to be deleted (see OCPBUGS-56281: gatewayapicontroller: Clean up resources when done #29900 (comment)).
  • Deleting the Istio CR only requires changing the test, it is more explicit than relying on garbage collection, and it is more obviously safe to backport.

Comment on lines 116 to 126
g.By("Deleting the Istio CR")

o.Expect(oc.AsAdmin().Run("delete").Args("--ignore-not-found=true", "istio", istioName).Execute()).Should(o.Succeed())

Contributor

The Istio CR is supposed to be garbage collected, since its owner reference points to the gatewayclass.

Contributor Author

The owner reference on the Istio CR didn't cause it to be deleted (see #29900 (comment)).

Contributor

The owner reference on the Istio CR didn't cause it to be deleted

I didn't manage to reproduce this behavior. I saw that the Istio CR gets deleted after the GatewayClass is deleted:

$ oc get gc
NAME                CONTROLLER                           ACCEPTED   AGE
openshift-default   openshift.io/gateway-controller/v1   True       4m12s

04:57:08 $ oc get istio
NAME                REVISIONS   READY   IN USE   ACTIVE REVISION     STATUS    VERSION   AGE
openshift-gateway   1           1       0        openshift-gateway   Healthy   v1.24.3   4m18s

04:57:14 $ oc get istio openshift-gateway -o yaml | yq .metadata.ownerReferences[0]
apiVersion: gateway.networking.k8s.io/v1
kind: GatewayClass
name: openshift-default
uid: 3f6ef6ed-9e6b-4821-9706-221ff0bca83e

04:57:34 $ oc -n openshift-ingress get pods
NAME                                        READY   STATUS    RESTARTS      AGE
istiod-openshift-gateway-7b567bc8b4-z9972   1/1     Running   0             4m48s
router-default-76c4888886-fmtzq             1/1     Running   0             77m
router-default-76c4888886-nm9mb             1/1     Running   2 (78m ago)   89m

04:57:52 $ oc delete gc openshift-default
gatewayclass.gateway.networking.k8s.io "openshift-default" deleted

04:58:07 $ oc get istio
No resources found

04:58:14 $ oc -n openshift-ingress get pods
NAME                              READY   STATUS    RESTARTS      AGE
router-default-76c4888886-fmtzq   1/1     Running   0             78m
router-default-76c4888886-nm9mb   1/1     Running   2 (78m ago)   89m

Contributor Author

--ignore-not-found=true will prevent the delete from failing if GC has already deleted the object. I'll add a code comment that the delete might be superfluous but it's there just in case.

Comment on lines 161 to 200
if err != nil && strings.Contains(err.Error(), "not found") {
	e2e.Logf("Subscription %q not found; retrying...", expectedSubscriptionName)
	return false, nil
}

Contributor

@alebedev87 alebedev87 Jun 16, 2025

I think that we should be consistent among all the polls we do in this block. I personally prefer how it's done for the OSSM deployment below:

		if err != nil {
			e2e.Logf("Failed to get OSSM operator deployment %q: %v; retrying...", deploymentOSSMName, err)
			return false, nil
		}

No assertions, just a retry for any error until the timeout is triggered. I think that some errors (not only "Not Found") can be temporary or intermittent.

Contributor Author

I was trying to keep my changes more narrowly focused. All right, I can make the polling loop for the subscription retry on all errors.

@Miciah Miciah force-pushed the OCPBUGS-56281-gatewayapicontroller-clean-up-resources-when-done branch from 1dcc98a to 38d8018 Compare June 17, 2025 07:48
@Thealisyed

LGTM, holding off for @alebedev87 comments

@openshift-trt

openshift-trt bot commented Jun 17, 2025

Job Failure Risk Analysis for sha: 38d8018

Job Name Failure Risk
pull-ci-openshift-origin-main-e2e-gcp-csi IncompleteTests
Tests for this run (19) are below the historical average (1374): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-main-e2e-gcp-disruptive IncompleteTests
Tests for this run (19) are below the historical average (1140): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-main-e2e-gcp-fips-serial-1of2 IncompleteTests
Tests for this run (18) are below the historical average (1403): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-main-e2e-gcp-fips-serial-2of2 IncompleteTests
Tests for this run (19) are below the historical average (1430): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-main-e2e-gcp-ovn IncompleteTests
Tests for this run (19) are below the historical average (1146): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-main-e2e-gcp-ovn-etcd-scaling IncompleteTests
Tests for this run (19) are below the historical average (1343): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-main-e2e-gcp-ovn-rt-upgrade IncompleteTests
Tests for this run (19) are below the historical average (1315): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-main-e2e-gcp-ovn-upgrade IncompleteTests
Tests for this run (19) are below the historical average (810): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)

@Miciah
Contributor Author

Miciah commented Jun 19, 2025

The aggregated jobs each failed while building the tests-openshift.origin-amd64 image, with the error message, "Error: Unable to find a match: python3-cinderclient" (missing RPM package). I'll retry in case it was a glitch with the Yum repository.

/payload-aggregate periodic-ci-openshift-release-master-ci-4.19-upgrade-from-stable-4.18-e2e-gcp-ovn-rt-upgrade 5

@openshift-ci
Contributor

openshift-ci bot commented Jun 19, 2025

@Miciah: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-release-master-ci-4.19-upgrade-from-stable-4.18-e2e-gcp-ovn-rt-upgrade

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/aad73360-4cbf-11f0-9efa-6a57a5235fed-0

@Miciah
Contributor Author

Miciah commented Jun 19, 2025

This time all the aggregated jobs failed to build the image with the error message, "Error: Unable to find a match: realtime-tests rteval".

/payload-aggregate periodic-ci-openshift-release-master-ci-4.19-upgrade-from-stable-4.18-e2e-gcp-ovn-rt-upgrade 5

@openshift-ci
Contributor

openshift-ci bot commented Jun 19, 2025

@Miciah: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-release-master-ci-4.19-upgrade-from-stable-4.18-e2e-gcp-ovn-rt-upgrade

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/c0124e40-4d46-11f0-9623-e230cc269dc8-0

@openshift-trt

openshift-trt bot commented Sep 27, 2025

Job Failure Risk Analysis for sha: 573f478

Job Name Failure Risk
pull-ci-openshift-origin-main-e2e-openstack-ovn IncompleteTests
Tests for this run (25) are below the historical average (2158): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)

@openshift-ci-robot

/hold

Revision 573f478 was retested 3 times: holding

@openshift-ci openshift-ci bot added the do-not-merge/hold label Sep 27, 2025
@openshift-trt

openshift-trt bot commented Oct 3, 2025

Job Failure Risk Analysis for sha: 573f478

Job Name Failure Risk
pull-ci-openshift-origin-main-e2e-aws-csi IncompleteTests
Tests for this run (25) are below the historical average (647): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-main-e2e-openstack-ovn IncompleteTests
Tests for this run (25) are below the historical average (2510): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)

@openshift-trt

openshift-trt bot commented Oct 3, 2025

Job Failure Risk Analysis for sha: 573f478

Job Name Failure Risk
pull-ci-openshift-origin-main-e2e-aws-csi IncompleteTests
Tests for this run (25) are below the historical average (650): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-main-e2e-openstack-ovn IncompleteTests
Tests for this run (25) are below the historical average (2511): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)

@Miciah
Contributor Author

Miciah commented Oct 9, 2025

ci/prow/e2e-aws-csi failed because the ELB for the API couldn't be provisioned:

 level=warning msg=Condition LoadBalancerReady has status: "False", reason: "LoadBalancerFailed", message: "[failed to create load balancer: &{ ci-op-q0v7ynrg-fa9df-jrlq9-int  internal [us-east-1b us-east-1c] [subnet-053f58058957bb86d subnet-0172b82d80a4edf01] [sg-070dc061ecda0d221] [] <nil> {0s false} map[Name:ci-op-q0v7ynrg-fa9df-jrlq9-int ci-nat-replace:false clusterName:ci-op-q0v7ynrg-fa9df expirationDate:2025-10-04T01:40+00:00 kubernetes.io/cluster/ci-op-q0v7ynrg-fa9df-jrlq9:owned sigs.k8s.io/cluster-api-provider-aws/cluster/ci-op-q0v7ynrg-fa9df-jrlq9:owned sigs.k8s.io/cluster-api-provider-aws/role:apiserver] [{TCP 6443 {apiserver-target-glr46 6443 TCP vpc-0b675550d3492bc9e 0xc0020dbe40}} {TCP 22623 {additional-listener-zg5s5 22623 TCP vpc-0b675550d3492bc9e 0xc0020dbec0}}] map[load_balancing.cross_zone.enabled:0xc001e988e0] }: TooManyLoadBalancers: The maximum number of load balancers has been reached\n\tstatus code: 400, request id: 517fe5fe-2681-4432-b07e-663d811a6795, failed to create load balancer: &{ ci-op-q0v7ynrg-fa9df-jrlq9-ext  internet-facing [us-east-1b us-east-1c] [subnet-03d5f8f851461e359 subnet-078294bb369f0a906] [sg-070dc061ecda0d221] [] <nil> {0s false} map[Name:ci-op-q0v7ynrg-fa9df-jrlq9-ext ci-nat-replace:false clusterName:ci-op-q0v7ynrg-fa9df expirationDate:2025-10-04T01:40+00:00 kubernetes.io/cluster/ci-op-q0v7ynrg-fa9df-jrlq9:owned sigs.k8s.io/cluster-api-provider-aws/cluster/ci-op-q0v7ynrg-fa9df-jrlq9:owned sigs.k8s.io/cluster-api-provider-aws/role:apiserver] [{TCP 6443 {apiserver-target-54nxx 6443 TCP vpc-0b675550d3492bc9e 0xc0027e0180}}] map[load_balancing.cross_zone.enabled:0xc0017fd2a0] }: Throttling: Rate exceeded\n\tstatus code: 400, request id: b3e29601-215b-4051-8ee5-a2682d7dc9e3]" 

"TooManyLoadBalancers" suggests the CI account hit a quota.

/test e2e-aws-csi

e2e-openstack-ovn failed because the installer reported, "failed to provision control-plane machines within 15m0s". I wasn't able to find any further details in the CI artifacts.

/test e2e-openstack-ovn

/hold cancel

@openshift-ci openshift-ci bot removed the do-not-merge/hold label Oct 9, 2025
@openshift-ci-robot

/retest-required

Remaining retests: 0 against base HEAD 816619b and 2 for PR HEAD 573f478 in total

@openshift-ci-robot

/retest-required

Remaining retests: 0 against base HEAD 889c2dd and 1 for PR HEAD 573f478 in total

@openshift-ci-robot

/retest-required

Remaining retests: 0 against base HEAD 7343864 and 0 for PR HEAD 573f478 in total

@openshift-ci-robot

/hold

Revision 573f478 was retested 3 times: holding

@openshift-ci openshift-ci bot added the do-not-merge/hold label Oct 10, 2025
@Miciah
Contributor Author

Miciah commented Oct 10, 2025

None of the failing jobs are required, so I don't understand the reason for the hold.

The e2e-vsphere-ovn job has "unknown" in the "Required" column, so maybe that is confusing openshift-ci-robot?

/skip

@Miciah
Contributor Author

Miciah commented Oct 10, 2025

e2e-vsphere-ovn failed on several tests:

  • [sig-network][OCPFeatureGate:RouteExternalCertificate][Feature:Router][apigroup:route.openshift.io] with valid setup the router should support external certificate and the secret is deleted and re-created again but RBAC permissions are dropped then routes are not reachable failed; I have filed OCPBUGS-62929 for this issue.

  • [sig-network] Services should be rejected for evicted pods (no endpoints exist) failed; OCPBUGS-57665 is tracking failures for this test.

  • [sig-network][Feature:EgressFirewall] when using openshift ovn-kubernetes should ensure egressfirewall is created failed; I have filed OCPBUGS-62930 to track failures and flakes for this test.

  • [sig-auth][Feature:OAuthServer] [Token Expiration] Using a OAuth client with a non-default token max age [apigroup:oauth.openshift.io] to generate tokens that expire shortly works as expected when using a code authorization flow failed; I have filed OCPBUGS-62931 to track failures and flakes for this test.

  • [sig-auth][Feature:ProjectAPI] TestProjectWatchWithSelectionPredicate should succeed failed; I have filed OCPBUGS-62932 to track failures and flakes for this test.

  • [sig-api-machinery][Feature:ResourceQuota] Object count should properly count the number of imagestreams resources failed; I have filed OCPBUGS-62933 to track failures and flakes for this test.

I believe all the failures were caused by flaky tests, so I'm going to rerun the CI job.

/test e2e-vsphere-ovn

@Miciah
Contributor Author

Miciah commented Oct 10, 2025

/hold cancel

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 10, 2025
@openshift-ci
Contributor

openshift-ci bot commented Oct 10, 2025

@Miciah: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/4.12-upgrade-from-stable-4.11-e2e-aws-ovn-upgrade-rollback | 38d8018 | link | false | /test 4.12-upgrade-from-stable-4.11-e2e-aws-ovn-upgrade-rollback |
| ci/prow/okd-e2e-gcp | e227127 | link | false | /test okd-e2e-gcp |
| ci/prow/e2e-aws | e227127 | link | false | /test e2e-aws |
| ci/prow/e2e-aws-ovn-serial-publicnet-1of2 | e227127 | link | false | /test e2e-aws-ovn-serial-publicnet-1of2 |
| ci/prow/e2e-vsphere-ovn-dualstack-primaryv6 | 43e6d01 | link | false | /test e2e-vsphere-ovn-dualstack-primaryv6 |
| ci/prow/e2e-gcp-disruptive | 43e6d01 | link | false | /test e2e-gcp-disruptive |
| ci/prow/e2e-vsphere-ovn-etcd-scaling | 43e6d01 | link | false | /test e2e-vsphere-ovn-etcd-scaling |
| ci/prow/e2e-aws-ovn-etcd-scaling | 43e6d01 | link | false | /test e2e-aws-ovn-etcd-scaling |
| ci/prow/e2e-openstack-serial | 43e6d01 | link | false | /test e2e-openstack-serial |
| ci/prow/e2e-gcp-ovn-etcd-scaling | 43e6d01 | link | false | /test e2e-gcp-ovn-etcd-scaling |
| ci/prow/e2e-azure-ovn-upgrade | 43e6d01 | link | false | /test e2e-azure-ovn-upgrade |
| ci/prow/e2e-gcp-fips-serial-1of2 | 43e6d01 | link | false | /test e2e-gcp-fips-serial-1of2 |
| ci/prow/e2e-gcp-fips-serial-2of2 | 43e6d01 | link | false | /test e2e-gcp-fips-serial-2of2 |
| ci/prow/e2e-azure-ovn-etcd-scaling | 43e6d01 | link | false | /test e2e-azure-ovn-etcd-scaling |
| ci/prow/e2e-metal-ipi-virtualmedia | cb24406 | link | false | /test e2e-metal-ipi-virtualmedia |
| ci/prow/e2e-aws-ovn | cb24406 | link | false | /test e2e-aws-ovn |
| ci/prow/e2e-aws-disruptive | cb24406 | link | false | /test e2e-aws-disruptive |
| ci/prow/e2e-gcp-ovn-techpreview | cb24406 | link | false | /test e2e-gcp-ovn-techpreview |
| ci/prow/e2e-metal-ipi-ovn-dualstack-local-gateway | cb24406 | link | false | /test e2e-metal-ipi-ovn-dualstack-local-gateway |
| ci/prow/e2e-metal-ipi-serial-2of2 | cb24406 | link | false | /test e2e-metal-ipi-serial-2of2 |
| ci/prow/e2e-metal-ipi-ovn | 573f478 | link | false | /test e2e-metal-ipi-ovn |
| ci/prow/e2e-metal-ipi-ovn-kube-apiserver-rollout | 573f478 | link | false | /test e2e-metal-ipi-ovn-kube-apiserver-rollout |
| ci/prow/e2e-aws-ovn-cgroupsv2 | 573f478 | link | false | /test e2e-aws-ovn-cgroupsv2 |
| ci/prow/e2e-aws-ovn-single-node-serial | 573f478 | link | false | /test e2e-aws-ovn-single-node-serial |
| ci/prow/e2e-aws-ovn-single-node-upgrade | 573f478 | link | false | /test e2e-aws-ovn-single-node-upgrade |
| ci/prow/e2e-aws-ovn-edge-zones | 573f478 | link | false | /test e2e-aws-ovn-edge-zones |
| ci/prow/e2e-agnostic-ovn-cmd | 573f478 | link | false | /test e2e-agnostic-ovn-cmd |
| ci/prow/e2e-openstack-ovn | 573f478 | link | false | /test e2e-openstack-ovn |

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@openshift-ci-robot

/retest-required

Remaining retests: 0 against base HEAD 7a0ace8 and 2 for PR HEAD 573f478 in total

@openshift-merge-bot openshift-merge-bot bot merged commit 11d30d9 into openshift:main Oct 11, 2025
30 checks passed
@openshift-ci-robot

@Miciah: Jira Issue Verification Checks: Jira Issue OCPBUGS-56281
✔️ This pull request was pre-merge verified.
✔️ All associated pull requests have merged.
✔️ All associated, merged pull requests were pre-merge verified.

Jira Issue OCPBUGS-56281 has been moved to the MODIFIED state and will move to the VERIFIED state when the change is available in an accepted nightly payload. 🕓

In response to this:

gatewayapicontroller: Add checks for empty slices

Check whether the slice of parent resource references in an httproute's status is empty before indexing the slice.

Before this commit, the "Ensure HTTPRoute object is created" test sometimes panicked with "runtime error: index out of range [0] with length 0".

Similarly, check whether the slice of load-balancer ingress points in a service's status is empty before indexing it.
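The guard described above can be sketched as follows. This is a minimal illustration, not the code from this PR: `HTTPRouteStatus`, `RouteParentStatus`, and `firstParent` are simplified, hypothetical stand-ins that model only the slice being indexed. The same pattern would apply to the load-balancer ingress slice in a service's status.

```go
package main

import (
	"errors"
	"fmt"
)

// RouteParentStatus and HTTPRouteStatus are hypothetical, simplified stand-ins
// for the real status types; only the field relevant to the guard is modeled.
type RouteParentStatus struct {
	ControllerName string
}

type HTTPRouteStatus struct {
	Parents []RouteParentStatus
}

// firstParent returns the first parent status, or an error when the slice is
// empty (for example, because the controller has not yet updated the status).
// Indexing Parents[0] without this check panics on an empty slice.
func firstParent(status HTTPRouteStatus) (RouteParentStatus, error) {
	if len(status.Parents) == 0 {
		return RouteParentStatus{}, errors.New("httproute status has no parents yet")
	}
	return status.Parents[0], nil
}

func main() {
	// Status not populated yet: the guard reports an error instead of panicking.
	if _, err := firstParent(HTTPRouteStatus{}); err != nil {
		fmt.Println("not ready:", err)
	}

	// Status populated: the first parent is returned.
	ready := HTTPRouteStatus{Parents: []RouteParentStatus{{ControllerName: "example.com/gateway-controller"}}}
	if parent, err := firstParent(ready); err == nil {
		fmt.Println("parent controller:", parent.ControllerName)
	}
}
```

In a polling test, returning an error (or false) for the empty case lets the caller retry until the status is populated, rather than crashing the test run.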

gatewayapicontroller: Clean up resources when done

Delete the gatewayclass and uninstall OSSM after all the Gateway API controller tests are done.

Before this change, the Gateway API controller tests left OSSM installed, including the subscription, CSV, installplan, bundled CRDs, RBAC resources, deployment, service, serviceaccount, etc., when the tests were finished. This clutter could cause problems for other tests, or for the same test if it was run again.

The new cleanup logic uses the OperatorsV1 client from github.com/operator-framework/operator-lifecycle-manager. Importing this package requires a replace stanza for openshift/api in go.mod.

This vendors github.com/operator-framework/operator-lifecycle-manager v0.30.1-0.20250114164243-1b6752ec65fa rather than the newest revision in order to avoid bringing in additional problematic vendor bumps that the newest revision would bring in.

gatewayapicontroller: Always log errors

Add the error value to some log messages that were missing it.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

tmshort added a commit to tmshort/openshift-origin that referenced this pull request Oct 17, 2025
This reverts the inclusion of the olm dependency in openshift#29900

Signed-off-by: Todd Short <[email protected]>
@jupierce
Contributor

/cherry-pick release-4.20

@openshift-cherrypick-robot

@jupierce: #29900 failed to apply on top of branch "release-4.20":

Applying: gatewayapicontroller: Add checks for empty slices
Applying: gatewayapicontroller: Clean up resources when done
Using index info to reconstruct a base tree...
M	go.mod
M	go.sum
M	test/extended/router/gatewayapicontroller.go
M	vendor/modules.txt
Falling back to patching base and 3-way merge...
Auto-merging vendor/modules.txt
CONFLICT (content): Merge conflict in vendor/modules.txt
Removing vendor/github.com/fsnotify/fsnotify/mkdoc.zsh
Removing vendor/github.com/fsnotify/fsnotify/.gitattributes
Removing vendor/github.com/fsnotify/fsnotify/.editorconfig
Auto-merging test/extended/router/gatewayapicontroller.go
Auto-merging go.sum
Auto-merging go.mod
error: Failed to merge in the changes.
hint: Use 'git am --show-current-patch=diff' to see the failed patch
hint: When you have resolved this problem, run "git am --continue".
hint: If you prefer to skip this patch, run "git am --skip" instead.
hint: To restore the original branch and stop patching, run "git am --abort".
hint: Disable this message with "git config advice.mergeConflict false"
Patch failed at 0002 gatewayapicontroller: Clean up resources when done

In response to this:

/cherry-pick release-4.20

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@openshift-merge-robot
Contributor

Fix included in accepted release 4.21.0-0.nightly-2025-10-22-123727


Labels

  • approved: Indicates a PR has been approved by an approver from all required OWNERS files.

  • jira/severity-critical: Referenced Jira bug's severity is critical for the branch this PR is targeting.

  • jira/valid-bug: Indicates that a referenced Jira bug is valid for the branch this PR is targeting.

  • jira/valid-reference: Indicates that this PR references a valid Jira ticket of any type.

  • lgtm: Indicates that a PR is ready to be merged.

  • vendor-update: Touching vendor dir or related files

  • verified: Signifies that the PR passed pre-merge verification criteria
