
Conversation

@jubittajohn
Contributor

@jubittajohn jubittajohn commented Oct 25, 2024

The following tests cover a vertical scaling scenario where a member is unhealthy and another where the kubelet is not working on a node.

The first test validates that scale-down happens before scale-up when the deleted member is unhealthy. CPMS is disabled so that scale-down can be observed happening first in this case.

  1. If the CPMS is active, first disable it by deleting the CPMS custom resource.
  2. Remove the static pod manifest from a node and stop the kubelet on the node. This makes the member unhealthy.
  3. Delete the machine hosting the node in step 2.
  4. Verify that the member is removed and that the total voting member count is 2, confirming that scale-down happens first when a member is unhealthy.
  5. Restore the initial cluster state by creating a new machine (scale-up) and re-enabling CPMS.
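The voting-member check in step 4 can be sketched as a small helper. This is an illustrative sketch, not the PR's actual code; the `Member` struct and `countVotingMembers` helper are hypothetical stand-ins for the etcd membership API types the real test would use.

```go
package main

import "fmt"

// Member is a simplified stand-in for an etcd cluster member
// (hypothetical; the real test would use the etcd client's types).
type Member struct {
	Name      string
	IsLearner bool // learners do not vote
}

// countVotingMembers returns the number of voting (non-learner) members.
func countVotingMembers(members []Member) int {
	count := 0
	for _, m := range members {
		if !m.IsLearner {
			count++
		}
	}
	return count
}

func main() {
	// After deleting the machine of an unhealthy member with CPMS
	// disabled, scale-down should happen first: 2 voting members remain.
	after := []Member{{Name: "etcd-0"}, {Name: "etcd-1"}}
	fmt.Println("voting members:", countVotingMembers(after))
}
```

The test would poll until this count drops to 2 before asserting that scale-up has not yet happened.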

The second test covers a vertical scaling scenario where the kubelet is not running on a node.
It validates that deleting the machine hosting the node whose kubelet is stopped does not get stuck when CPMS is enabled, covering the case in https://issues.redhat.com/browse/OCPBUGS-17199.
CPMS should be active for this test scenario.

  1. Stop the kubelet on a node.
  2. Delete the machine hosting the node from step 1.
  3. This should prompt the Control Plane Machine Set Operator (CPMSO) to create a replacement machine and node for that machine index.
  4. The operator first scales up by adding the new machine's member.
  5. It then scales down the machine pending deletion by removing its member and its deletion hook.
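The ordering expected in steps 4-5 (the member count rises before it falls back to steady state) can be checked over a trace of observed voting-member counts. A hypothetical sketch; `scaleUpHappensFirst` is not part of the PR.

```go
package main

import "fmt"

// scaleUpHappensFirst reports whether the first change in the observed
// voting-member counts is an increase, i.e. the CPMSO added the
// replacement member before the pending-deletion member was removed.
func scaleUpHappensFirst(counts []int) bool {
	for i := 1; i < len(counts); i++ {
		if counts[i] > counts[i-1] {
			return true // first change is an increase: scale-up first
		}
		if counts[i] < counts[i-1] {
			return false // first change is a decrease: scale-down first
		}
	}
	return false // no change observed
}

func main() {
	// With CPMS enabled the expected trace is 3 -> 4 -> 3.
	fmt.Println(scaleUpHappensFirst([]int{3, 3, 4, 4, 3}))
}
```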

@jubittajohn
Contributor Author

/test e2e-aws-ovn-etcd-scaling
/test e2e-gcp-ovn-etcd-scaling
/test e2e-azure-ovn-etcd-scaling
/test e2e-vsphere-ovn-etcd-scaling

@openshift-ci openshift-ci bot requested review from orenc1 and tjungblu October 25, 2024 20:45
@jubittajohn jubittajohn force-pushed the vertical-scaling-kubelet-stopped branch from 7e1de9a to ee92b18 on October 25, 2024 20:46
@jubittajohn
Contributor Author

/test e2e-aws-ovn-etcd-scaling
/test e2e-gcp-ovn-etcd-scaling
/test e2e-azure-ovn-etcd-scaling
/test e2e-vsphere-ovn-etcd-scaling

@jubittajohn jubittajohn force-pushed the vertical-scaling-kubelet-stopped branch from ee92b18 to 78840ba on October 31, 2024 15:42
@jubittajohn
Contributor Author

/test e2e-aws-ovn-etcd-scaling
/test e2e-gcp-ovn-etcd-scaling
/test e2e-azure-ovn-etcd-scaling
/test e2e-vsphere-ovn-etcd-scaling

@openshift-trt-bot

Job Failure Risk Analysis for sha: 78840ba

Job Name Failure Risk
pull-ci-openshift-origin-master-e2e-azure-ovn-etcd-scaling High
[sig-node] Managed cluster should verify that nodes have no unexpected reboots [Late] [Suite:openshift/conformance/parallel]
This test has passed 100.00% of 3 runs on jobs ['periodic-ci-openshift-release-master-nightly-4.18-e2e-azure-ovn-etcd-scaling' 'periodic-ci-openshift-release-master-nightly-4.17-e2e-azure-ovn-etcd-scaling'] in the last 14 days.
---
[sig-arch][Late][Jira:"kube-apiserver"] collect certificate data [Suite:openshift/conformance/parallel]
This test has passed 100.00% of 3 runs on jobs ['periodic-ci-openshift-release-master-nightly-4.18-e2e-azure-ovn-etcd-scaling' 'periodic-ci-openshift-release-master-nightly-4.17-e2e-azure-ovn-etcd-scaling'] in the last 14 days.
---
[sig-etcd][Feature:EtcdVerticalScaling][Suite:openshift/etcd/scaling][Serial] etcd is able to vertically scale up and down when CPMS is disabled [apigroup:machine.openshift.io]
This test has passed 100.00% of 1 runs on jobs ['periodic-ci-openshift-release-master-nightly-4.18-e2e-azure-ovn-etcd-scaling' 'periodic-ci-openshift-release-master-nightly-4.17-e2e-azure-ovn-etcd-scaling'] in the last 14 days.
---
[Jira:"Network / ovn-kubernetes"] monitor test pod-network-avalibility collection
This test has passed 100.00% of 3 runs on jobs ['periodic-ci-openshift-release-master-nightly-4.18-e2e-azure-ovn-etcd-scaling' 'periodic-ci-openshift-release-master-nightly-4.17-e2e-azure-ovn-etcd-scaling'] in the last 14 days.
---
Showing 4 of 14 test results
pull-ci-openshift-origin-master-e2e-aws-ovn-etcd-scaling High
[Jira:"Network / ovn-kubernetes"] monitor test pod-network-avalibility cleanup
This test has passed 100.00% of 4 runs on jobs ['periodic-ci-openshift-release-master-nightly-4.18-e2e-aws-ovn-etcd-scaling' 'periodic-ci-openshift-release-master-nightly-4.17-e2e-aws-ovn-etcd-scaling'] in the last 14 days.
---
[Jira:"Node / Kubelet"] monitor test kubelet-log-collector collection
This test has passed 100.00% of 4 runs on jobs ['periodic-ci-openshift-release-master-nightly-4.18-e2e-aws-ovn-etcd-scaling' 'periodic-ci-openshift-release-master-nightly-4.17-e2e-aws-ovn-etcd-scaling'] in the last 14 days.
---
[sig-node][invariant] alert/TargetDown should not be at or above info in ns/kube-system
This test has passed 100.00% of 4 runs on jobs ['periodic-ci-openshift-release-master-nightly-4.18-e2e-aws-ovn-etcd-scaling' 'periodic-ci-openshift-release-master-nightly-4.17-e2e-aws-ovn-etcd-scaling'] in the last 14 days.
---
[Jira:"Network / ovn-kubernetes"] monitor test pod-network-avalibility collection
This test has passed 100.00% of 4 runs on jobs ['periodic-ci-openshift-release-master-nightly-4.18-e2e-aws-ovn-etcd-scaling' 'periodic-ci-openshift-release-master-nightly-4.17-e2e-aws-ovn-etcd-scaling'] in the last 14 days.
---
Showing 4 of 13 test results

@jubittajohn jubittajohn changed the title E2E to vertically scale up and down when kubelet is not running on a node ETCD-674: WIP: E2E to vertically scale up and down when kubelet is not running on a node Nov 4, 2024
@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Nov 4, 2024
@openshift-ci-robot

openshift-ci-robot commented Nov 4, 2024

@jubittajohn: This pull request references ETCD-674 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.18.0" version, but no target version was set.

In response to this:

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@jubittajohn jubittajohn force-pushed the vertical-scaling-kubelet-stopped branch from 106d577 to 7c1093b on November 4, 2024 17:05
@jubittajohn
Contributor Author

/test e2e-aws-ovn-etcd-scaling
/test e2e-gcp-ovn-etcd-scaling
/test e2e-azure-ovn-etcd-scaling
/test e2e-vsphere-ovn-etcd-scaling

1 similar comment

@openshift-trt-bot

Job Failure Risk Analysis for sha: 7c1093b

Job Name Failure Risk
pull-ci-openshift-origin-master-e2e-azure-ovn-etcd-scaling High
[bz-Cloud Compute] clusteroperator/control-plane-machine-set should not change condition/Available
This test has passed 100.00% of 4 runs on jobs ['periodic-ci-openshift-release-master-nightly-4.18-e2e-azure-ovn-etcd-scaling' 'periodic-ci-openshift-release-master-nightly-4.17-e2e-azure-ovn-etcd-scaling'] in the last 14 days.
---
[sig-node][invariant] alert/TargetDown should not be at or above info in ns/kube-system
This test has passed 100.00% of 4 runs on jobs ['periodic-ci-openshift-release-master-nightly-4.18-e2e-azure-ovn-etcd-scaling' 'periodic-ci-openshift-release-master-nightly-4.17-e2e-azure-ovn-etcd-scaling'] in the last 14 days.
pull-ci-openshift-origin-master-e2e-aws-ovn-etcd-scaling High
[sig-api-machinery][Feature:APIServer][Late] kubelet terminates kube-apiserver gracefully extended [Suite:openshift/conformance/parallel]
This test has passed 100.00% of 2 runs on jobs ['periodic-ci-openshift-release-master-nightly-4.18-e2e-aws-ovn-etcd-scaling' 'periodic-ci-openshift-release-master-nightly-4.17-e2e-aws-ovn-etcd-scaling'] in the last 14 days.
---
[sig-arch][Late][Jira:"kube-apiserver"] collect certificate data [Suite:openshift/conformance/parallel]
This test has passed 100.00% of 3 runs on jobs ['periodic-ci-openshift-release-master-nightly-4.18-e2e-aws-ovn-etcd-scaling' 'periodic-ci-openshift-release-master-nightly-4.17-e2e-aws-ovn-etcd-scaling'] in the last 14 days.

@jubittajohn jubittajohn force-pushed the vertical-scaling-kubelet-stopped branch from 7c1093b to da81f2a on November 12, 2024 15:50
@openshift-ci-robot

openshift-ci-robot commented Nov 12, 2024

@jubittajohn: This pull request references ETCD-674 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.18.0" version, but no target version was set.

In response to this:


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@jubittajohn jubittajohn changed the title ETCD-674: WIP: E2E to vertically scale up and down when kubelet is not running on a node ETCD-674: E2E to vertically scale up and down when kubelet is not running on a node Nov 12, 2024
@jubittajohn
Contributor Author

/test e2e-aws-ovn-etcd-scaling
/test e2e-gcp-ovn-etcd-scaling
/test e2e-azure-ovn-etcd-scaling
/test e2e-vsphere-ovn-etcd-scaling

@openshift-ci-robot

openshift-ci-robot commented Nov 12, 2024

@jubittajohn: This pull request references ETCD-674 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.18.0" version, but no target version was set.

In response to this:


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

return false, nil
}

return podReadyCondition.Status == corev1.ConditionFalse, nil
Contributor


you have to be a bit careful here, kubelet is the only one updating the status - if you shut it down this condition may never become true. I would just try to fire and forget this pod and wait for the node to become not ready.
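The reviewer's suggested alternative, waiting for the node rather than the pod to report not ready, could look like the following sketch. The condition types are simplified stand-ins for `corev1.Node` conditions, and the helper name is hypothetical.

```go
package main

import "fmt"

// Simplified stand-ins for the corev1 node condition types.
type ConditionStatus string

const (
	ConditionTrue  ConditionStatus = "True"
	ConditionFalse ConditionStatus = "False"
)

type NodeCondition struct {
	Type   string
	Status ConditionStatus
}

// isNodeReady reports whether the Ready condition is True. Once the
// kubelet is stopped, the node controller (not the kubelet) eventually
// flips Ready to False/Unknown, so polling this until it returns false
// avoids depending on the stopped kubelet to update a pod's status.
func isNodeReady(conditions []NodeCondition) bool {
	for _, c := range conditions {
		if c.Type == "Ready" {
			return c.Status == ConditionTrue
		}
	}
	return false
}

func main() {
	conds := []NodeCondition{{Type: "Ready", Status: ConditionFalse}}
	fmt.Println("node ready:", isNodeReady(conds))
}
```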


// step 1: stop the kubelet on a node
framework.Logf("Stopping the kubelet on the node %s", etcdTargetNode.Name)
err = scalingtestinglibrary.StopKubelet(ctx, oc.AdminKubeClient(), *etcdTargetNode)
Contributor


avoid the pointer deref here, just pass it down into the function and error out if the node is nil
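The nil-guard the reviewer asks for could look like the sketch below; `Node` and this `StopKubelet` signature are simplified hypothetical stand-ins for the test library's types.

```go
package main

import (
	"errors"
	"fmt"
)

// Node is a minimal stand-in for corev1.Node.
type Node struct {
	Name string
}

// StopKubelet takes the pointer directly and errors out on nil instead
// of forcing callers to dereference (and risk a panic) at the call site.
func StopKubelet(node *Node) error {
	if node == nil {
		return errors.New("target node must not be nil")
	}
	fmt.Printf("stopping kubelet on node %s\n", node.Name)
	// ... issue the actual kubelet stop here ...
	return nil
}

func main() {
	if err := StopKubelet(nil); err != nil {
		fmt.Println("error:", err)
	}
	_ = StopKubelet(&Node{Name: "master-0"})
}
```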

Comment on lines 322 to 394
// step 2: delete the machine on which kubelet is stopped to trigger the CPMSO to create a new one to replace it
machineToDelete, err := scalingtestinglibrary.NodeNameToMachineName(ctx, kubeClient, machineClient, etcdTargetNode.Name)
err = errors.Wrapf(err, "failed to get the machine name for the NotReady node: %s", etcdTargetNode.Name)
o.Expect(err).ToNot(o.HaveOccurred())
Contributor


I see why you need the helper. How about you choose the machine you want to stop kubelet with and then just get the node via the status reference? That should save you a ton of code.
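The reviewer's suggestion, pick the machine first and resolve its node from the status reference, could look like this sketch. `Machine`, `NodeRef`, and `nodeNameForMachine` are simplified hypothetical stand-ins for the machine.openshift.io API types.

```go
package main

import (
	"errors"
	"fmt"
)

// Minimal stand-ins for machine.openshift.io types.
type NodeRef struct {
	Name string
}

type Machine struct {
	Name    string
	NodeRef *NodeRef // populated in status once the node has joined
}

// nodeNameForMachine resolves the node backing a machine via the
// status reference, avoiding a reverse node-name -> machine-name lookup.
func nodeNameForMachine(m Machine) (string, error) {
	if m.NodeRef == nil {
		return "", errors.New("machine has no node reference yet")
	}
	return m.NodeRef.Name, nil
}

func main() {
	m := Machine{Name: "master-0", NodeRef: &NodeRef{Name: "ip-10-0-0-1"}}
	name, _ := nodeNameForMachine(m)
	fmt.Println("node:", name)
}
```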

err = errors.Wrap(err, "scale-down: timed out waiting for APIServer pods to stabilize on the same revision")
o.Expect(err).ToNot(o.HaveOccurred())

// step 5: verify member and machine counts go back down to 3
Contributor


love those assertions below, maybe have that as a separate function? could there be some reuse in other tests?
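Factoring those count assertions into a reusable helper, as suggested, might look like this sketch (the name and signature are hypothetical):

```go
package main

import "fmt"

// ensureClusterSize checks that the voting-member and control-plane
// machine counts both match the expected steady-state size, returning
// a descriptive error suitable for polling loops in other tests.
func ensureClusterSize(expected, members, machines int) error {
	if members != expected {
		return fmt.Errorf("expected %d voting members, got %d", expected, members)
	}
	if machines != expected {
		return fmt.Errorf("expected %d control-plane machines, got %d", expected, machines)
	}
	return nil
}

func main() {
	if err := ensureClusterSize(3, 3, 3); err == nil {
		fmt.Println("cluster back to steady state")
	}
}
```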

@openshift-trt-bot

Job Failure Risk Analysis for sha: da81f2a

Job Name Failure Risk
pull-ci-openshift-origin-master-e2e-vsphere-ovn-etcd-scaling High
[bz-Cloud Compute] clusteroperator/control-plane-machine-set should not change condition/Available
This test has passed 100.00% of 4 runs on jobs ['periodic-ci-openshift-release-master-nightly-4.18-e2e-vsphere-ovn-etcd-scaling' 'periodic-ci-openshift-release-master-nightly-4.17-e2e-vsphere-ovn-etcd-scaling'] in the last 14 days.
pull-ci-openshift-origin-master-e2e-metal-ipi-ovn IncompleteTests
Tests for this run (101) are below the historical average (2543): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-master-e2e-aws-ovn-kube-apiserver-rollout Low
[Conformance][Suite:openshift/kube-apiserver/rollout][Jira:"kube-apiserver"][sig-kube-apiserver] kube-apiserver should roll out new revisions without disruption [apigroup:config.openshift.io][apigroup:operator.openshift.io]
This test has passed 69.23% of 13 runs on jobs ['periodic-ci-openshift-release-master-nightly-4.18-e2e-aws-ovn-kube-apiserver-rollout'] in the last 14 days.

@openshift-ci-robot

openshift-ci-robot commented Nov 13, 2024

@jubittajohn: This pull request references ETCD-674 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.18.0" version, but no target version was set.

In response to this:


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@jubittajohn
Contributor Author

/test e2e-aws-ovn-etcd-scaling

@jubittajohn
Contributor Author

/test e2e-aws-ovn-etcd-scaling
/test e2e-gcp-ovn-etcd-scaling
/test e2e-azure-ovn-etcd-scaling
/test e2e-vsphere-ovn-etcd-scaling

@jubittajohn
Contributor Author

/test e2e-azure-ovn-etcd-scaling

1 similar comment

@openshift-trt

openshift-trt bot commented Nov 26, 2024

Job Failure Risk Analysis for sha: 1f87a9e

Job Name Failure Risk
pull-ci-openshift-origin-master-okd-scos-e2e-aws-ovn High
[sig-arch] Only known images used by tests
This test has passed 100.00% of 18 runs on jobs ['periodic-ci-openshift-release-master-ci-4.19-e2e-aws-ovn'] in the last 14 days.
pull-ci-openshift-origin-master-e2e-aws-ovn-etcd-scaling High
[bz-Cloud Compute] clusteroperator/control-plane-machine-set should not change condition/Available
This test has passed 100.00% of 3 runs on jobs ['periodic-ci-openshift-release-master-nightly-4.19-e2e-aws-ovn-etcd-scaling' 'periodic-ci-openshift-release-master-nightly-4.18-e2e-aws-ovn-etcd-scaling'] in the last 14 days.
pull-ci-openshift-origin-master-e2e-azure-ovn-etcd-scaling IncompleteTests
Tests for this run (106) are below the historical average (984): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-master-e2e-aws-ovn-single-node-serial IncompleteTests
Tests for this run (26) are below the historical average (1350): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-master-e2e-metal-ipi-ovn-kube-apiserver-rollout Low
[sig-arch][Late] operators should not create watch channels very often [apigroup:apiserver.openshift.io] [Suite:openshift/conformance/parallel]
This test has passed 76.19% of 21 runs on jobs ['periodic-ci-openshift-release-master-nightly-4.19-e2e-metal-ipi-ovn-kube-apiserver-rollout' 'periodic-ci-openshift-release-master-nightly-4.18-e2e-metal-ipi-ovn-kube-apiserver-rollout'] in the last 14 days.

Open Bugs
Component Readiness: operators should not create watch channels very often

@jubittajohn jubittajohn force-pushed the vertical-scaling-kubelet-stopped branch from 1f87a9e to 4001331 on January 6, 2025 14:51
@openshift-ci openshift-ci bot removed the lgtm Indicates that a PR is ready to be merged. label Jan 6, 2025
@openshift-ci
Contributor

openshift-ci bot commented Jan 6, 2025

New changes are detected. LGTM label has been removed.

@jubittajohn
Contributor Author

/test e2e-aws-ovn-etcd-scaling
/test e2e-gcp-ovn-etcd-scaling
/test e2e-azure-ovn-etcd-scaling
/test e2e-vsphere-ovn-etcd-scaling

@openshift-ci
Contributor

openshift-ci bot commented Jan 6, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: jubittajohn, tjungblu
Once this PR has been reviewed and has the lgtm label, please assign hasbro17 for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@jubittajohn
Contributor Author

/retest-required

@openshift-trt

openshift-trt bot commented Jan 6, 2025

Job Failure Risk Analysis for sha: 4001331

Job Name Failure Risk
pull-ci-openshift-origin-master-e2e-aws-ovn-etcd-scaling High
[bz-Cloud Compute] clusteroperator/control-plane-machine-set should not change condition/Available
This test has passed 100.00% of 14 runs on jobs [periodic-ci-openshift-release-master-nightly-4.19-e2e-aws-ovn-etcd-scaling periodic-ci-openshift-release-master-nightly-4.18-e2e-aws-ovn-etcd-scaling] in the last 14 days.
pull-ci-openshift-origin-master-e2e-aws-ovn-kube-apiserver-rollout Low
[Conformance][Suite:openshift/kube-apiserver/rollout][Jira:"kube-apiserver"][sig-kube-apiserver] kube-apiserver should roll out new revisions without disruption [apigroup:config.openshift.io][apigroup:operator.openshift.io]
This test has passed 60.00% of 5 runs on release 4.19 [Architecture:amd64 FeatureSet:default Installer:ipi Network:ovn NetworkStack:ipv4 Platform:aws SecurityMode:default Topology:ha Upgrade:none] in the last week.

@openshift-trt

openshift-trt bot commented Feb 13, 2025

Job Failure Risk Analysis for sha: 4001331

Job Name Failure Risk
pull-ci-openshift-origin-master-e2e-aws-ovn-etcd-scaling High
[bz-Cloud Compute] clusteroperator/control-plane-machine-set should not change condition/Available
This test has passed 100.00% of 1 runs on jobs [periodic-ci-openshift-release-master-nightly-4.19-e2e-aws-ovn-etcd-scaling periodic-ci-openshift-release-master-nightly-4.18-e2e-aws-ovn-etcd-scaling] in the last 14 days.
pull-ci-openshift-origin-master-e2e-aws-ovn-kube-apiserver-rollout Low
[Conformance][Suite:openshift/kube-apiserver/rollout][Jira:"kube-apiserver"][sig-kube-apiserver] kube-apiserver should roll out new revisions without disruption [apigroup:config.openshift.io][apigroup:operator.openshift.io]
This test has passed 50.00% of 6 runs on release 4.19 [Architecture:amd64 FeatureSet:default Installer:ipi Network:ovn NetworkStack:ipv4 Platform:aws SecurityMode:default Topology:ha Upgrade:none] in the last week.

1 similar comment

… node and when an unhealthy member is present

Signed-off-by: jubittajohn <[email protected]>
@jubittajohn jubittajohn force-pushed the vertical-scaling-kubelet-stopped branch from 4001331 to f9506e0 on February 19, 2025 20:19
@jubittajohn
Contributor Author

/test e2e-aws-ovn-etcd-scaling
/test e2e-gcp-ovn-etcd-scaling
/test e2e-azure-ovn-etcd-scaling
/test e2e-vsphere-ovn-etcd-scaling

@openshift-trt

openshift-trt bot commented Feb 20, 2025

Job Failure Risk Analysis for sha: f9506e0

Job Name Failure Risk
pull-ci-openshift-origin-master-e2e-aws-ovn-kube-apiserver-rollout Low
[Conformance][Suite:openshift/kube-apiserver/rollout][Jira:"kube-apiserver"][sig-kube-apiserver] kube-apiserver should roll out new revisions without disruption [apigroup:config.openshift.io][apigroup:operator.openshift.io]
This test has passed 50.00% of 6 runs on release 4.19 [Architecture:amd64 FeatureSet:default Installer:ipi Network:ovn NetworkStack:ipv4 Platform:aws SecurityMode:default Topology:ha Upgrade:none] in the last week.

@openshift-trt

openshift-trt bot commented Feb 20, 2025

Job Failure Risk Analysis for sha: f9506e0

Job Name Failure Risk
pull-ci-openshift-origin-master-e2e-gcp-ovn-etcd-scaling High
[sig-node] static pods should start after being created
This test has passed 100.00% of 1 runs on jobs [periodic-ci-openshift-release-master-nightly-4.19-e2e-gcp-ovn-etcd-scaling periodic-ci-openshift-release-master-nightly-4.18-e2e-gcp-ovn-etcd-scaling] in the last 14 days.
---
[sig-architecture] platform pods in ns/openshift-etcd should not exit an excessive amount of times
This test has passed 100.00% of 1 runs on jobs [periodic-ci-openshift-release-master-nightly-4.19-e2e-gcp-ovn-etcd-scaling periodic-ci-openshift-release-master-nightly-4.18-e2e-gcp-ovn-etcd-scaling] in the last 14 days.
---
[sig-etcd] etcd leader changes are not excessive [Late] [Suite:openshift/conformance/parallel]
This test has passed 100.00% of 1 runs on jobs [periodic-ci-openshift-release-master-nightly-4.19-e2e-gcp-ovn-etcd-scaling periodic-ci-openshift-release-master-nightly-4.18-e2e-gcp-ovn-etcd-scaling] in the last 14 days.
---
[sig-node] node-lifecycle detects unexpected not ready node
This test has passed 100.00% of 1 runs on jobs [periodic-ci-openshift-release-master-nightly-4.19-e2e-gcp-ovn-etcd-scaling periodic-ci-openshift-release-master-nightly-4.18-e2e-gcp-ovn-etcd-scaling] in the last 14 days.
---
Showing 4 of 6 test results
pull-ci-openshift-origin-master-e2e-aws-ovn-kube-apiserver-rollout Low
[Conformance][Suite:openshift/kube-apiserver/rollout][Jira:"kube-apiserver"][sig-kube-apiserver] kube-apiserver should roll out new revisions without disruption [apigroup:config.openshift.io][apigroup:operator.openshift.io]
This test has passed 50.00% of 6 runs on release 4.19 [Architecture:amd64 FeatureSet:default Installer:ipi Network:ovn NetworkStack:ipv4 Platform:aws SecurityMode:default Topology:ha Upgrade:none] in the last week.

1 similar comment

@jubittajohn
Contributor Author

/test e2e-aws-ovn-etcd-scaling

@jubittajohn
Contributor Author

/test e2e-azure-ovn-etcd-scaling

@openshift-trt

openshift-trt bot commented Feb 25, 2025

Job Failure Risk Analysis for sha: f9506e0

Job Name Failure Risk
pull-ci-openshift-origin-master-e2e-gcp-ovn-etcd-scaling High
[sig-architecture] platform pods in ns/openshift-etcd should not exit an excessive amount of times
This test has passed 100.00% of 2 runs on jobs [periodic-ci-openshift-release-master-nightly-4.19-e2e-gcp-ovn-etcd-scaling periodic-ci-openshift-release-master-nightly-4.18-e2e-gcp-ovn-etcd-scaling] in the last 14 days.
---
[sig-etcd] etcd leader changes are not excessive [Late] [Suite:openshift/conformance/parallel]
This test has passed 100.00% of 2 runs on jobs [periodic-ci-openshift-release-master-nightly-4.19-e2e-gcp-ovn-etcd-scaling periodic-ci-openshift-release-master-nightly-4.18-e2e-gcp-ovn-etcd-scaling] in the last 14 days.
---
[bz-Cloud Compute] clusteroperator/control-plane-machine-set should not change condition/Available
This test has passed 100.00% of 2 runs on jobs [periodic-ci-openshift-release-master-nightly-4.19-e2e-gcp-ovn-etcd-scaling periodic-ci-openshift-release-master-nightly-4.18-e2e-gcp-ovn-etcd-scaling] in the last 14 days.
pull-ci-openshift-origin-master-e2e-aws-ovn-kube-apiserver-rollout Low
[Conformance][Suite:openshift/kube-apiserver/rollout][Jira:"kube-apiserver"][sig-kube-apiserver] kube-apiserver should roll out new revisions without disruption [apigroup:config.openshift.io][apigroup:operator.openshift.io]
This test has passed 57.14% of 7 runs on release 4.19 [Architecture:amd64 FeatureSet:default Installer:ipi Network:ovn NetworkStack:ipv4 Platform:aws SecurityMode:default Topology:ha Upgrade:none] in the last week.

1 similar comment

@dusk125
Contributor

dusk125 commented Jul 24, 2025

/test e2e-aws-ovn-etcd-scaling

@openshift-trt

openshift-trt bot commented Aug 26, 2025

Risk analysis has seen new tests most likely introduced by this PR.
Please ensure that new tests meet guidelines for naming and stability.

New Test Risks for sha: f9506e0

  • pull-ci-openshift-origin-main-e2e-aws-ovn-etcd-scaling — Medium: "[sig-etcd][Feature:EtcdVerticalScaling][Suite:openshift/etcd/scaling][Serial] etcd is able to vertically scale down when a member is unhealthy [apigroup:machine.openshift.io]" is a new test, and was only seen in one job.
  • pull-ci-openshift-origin-main-e2e-aws-ovn-etcd-scaling — Medium: "[sig-etcd][Feature:EtcdVerticalScaling][Suite:openshift/etcd/scaling][Serial] etcd is able to vertically scale up and down when kubelet is not running on a node[apigroup:machine.openshift.io]" is a new test, and was only seen in one job.

New tests seen in this PR at sha: f9506e0

  • "[sig-etcd][Feature:EtcdVerticalScaling][Suite:openshift/etcd/scaling][Serial] etcd is able to vertically scale down when a member is unhealthy [apigroup:machine.openshift.io]" [Total: 1, Pass: 1, Fail: 0, Flake: 0]
  • "[sig-etcd][Feature:EtcdVerticalScaling][Suite:openshift/etcd/scaling][Serial] etcd is able to vertically scale up and down when kubelet is not running on a node[apigroup:machine.openshift.io]" [Total: 1, Pass: 1, Fail: 0, Flake: 0]

@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Oct 3, 2025
@openshift-merge-robot
Contributor

PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@openshift-trt

openshift-trt bot commented Oct 3, 2025

Risk analysis has seen new tests most likely introduced by this PR.
Please ensure that new tests meet guidelines for naming and stability.

New Test Risks for sha: f9506e0

  • pull-ci-openshift-origin-main-e2e-aws-ovn-etcd-scaling — Medium: "[sig-etcd][Feature:EtcdVerticalScaling][Suite:openshift/etcd/scaling][Serial] etcd is able to vertically scale down when a member is unhealthy [apigroup:machine.openshift.io]" is a new test, and was only seen in one job.
  • pull-ci-openshift-origin-main-e2e-aws-ovn-etcd-scaling — Medium: "[sig-etcd][Feature:EtcdVerticalScaling][Suite:openshift/etcd/scaling][Serial] etcd is able to vertically scale up and down when kubelet is not running on a node[apigroup:machine.openshift.io]" is a new test, and was only seen in one job.

New tests seen in this PR at sha: f9506e0

  • "[sig-etcd][Feature:EtcdVerticalScaling][Suite:openshift/etcd/scaling][Serial] etcd is able to vertically scale down when a member is unhealthy [apigroup:machine.openshift.io]" [Total: 1, Pass: 1, Fail: 0, Flake: 0]
  • "[sig-etcd][Feature:EtcdVerticalScaling][Suite:openshift/etcd/scaling][Serial] etcd is able to vertically scale up and down when kubelet is not running on a node[apigroup:machine.openshift.io]" [Total: 1, Pass: 1, Fail: 0, Flake: 0]

@openshift-ci
Contributor

openshift-ci bot commented Nov 18, 2025

@jubittajohn: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/e2e-gcp-ovn-etcd-scaling | f9506e0 | link | false | /test e2e-gcp-ovn-etcd-scaling |
| ci/prow/e2e-openstack-ovn | f9506e0 | link | false | /test e2e-openstack-ovn |
| ci/prow/e2e-vsphere-ovn-etcd-scaling | f9506e0 | link | false | /test e2e-vsphere-ovn-etcd-scaling |
| ci/prow/e2e-aws-ovn-kube-apiserver-rollout | f9506e0 | link | false | /test e2e-aws-ovn-kube-apiserver-rollout |
| ci/prow/e2e-metal-ipi-ovn | f9506e0 | link | false | /test e2e-metal-ipi-ovn |
| ci/prow/e2e-azure-ovn-etcd-scaling | f9506e0 | link | false | /test e2e-azure-ovn-etcd-scaling |
| ci/prow/e2e-aws-ovn-edge-zones | f9506e0 | link | true | /test e2e-aws-ovn-edge-zones |
| ci/prow/e2e-vsphere-ovn-upi | f9506e0 | link | true | /test e2e-vsphere-ovn-upi |
| ci/prow/e2e-aws-ovn-serial-1of2 | f9506e0 | link | true | /test e2e-aws-ovn-serial-1of2 |
| ci/prow/e2e-aws-ovn-etcd-scaling | f9506e0 | link | false | /test e2e-aws-ovn-etcd-scaling |
| ci/prow/e2e-aws-csi | f9506e0 | link | true | /test e2e-aws-csi |
| ci/prow/e2e-gcp-csi | f9506e0 | link | true | /test e2e-gcp-csi |
| ci/prow/go-verify-deps | f9506e0 | link | true | /test go-verify-deps |
| ci/prow/e2e-aws-ovn-microshift | f9506e0 | link | true | /test e2e-aws-ovn-microshift |
| ci/prow/e2e-aws-ovn-microshift-serial | f9506e0 | link | true | /test e2e-aws-ovn-microshift-serial |
| ci/prow/e2e-metal-ipi-ovn-ipv6 | f9506e0 | link | true | /test e2e-metal-ipi-ovn-ipv6 |

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.


Labels

  • do-not-merge/hold — Indicates that a PR should not merge because someone has issued a /hold command.
  • jira/valid-reference — Indicates that this PR references a valid Jira ticket of any type.
  • needs-rebase — Indicates a PR cannot be merged because it has merge conflicts with HEAD.
