@openshift-cherrypick-robot

This is an automated cherry-pick of #3212

/assign yuqi-zhang

In 4.11 we moved the drain operation to a centralized controller.
This drain controller doesn't drain the MCC today.

In theory this means the MCC pod can finish its work and then terminate
gracefully. In practice this is problematic: the pod is never scheduled
off the node, so the API still reports it as running when the master node
shuts down, and it won't be reachable until the master node reboots.

On some metal setups the reboot can take up to 30 minutes. That would
leave us without an MCC for 30 minutes, which is very problematic because
other pools waiting for drains/updates are stalled as well. We should
instead just drain the controller pod. The next one will immediately
pick up where this one left off, so there shouldn't be any conflicts.
In testing, the drained pod restarted existing operations in ~10 seconds,
which is much faster.
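
A minimal sketch of the behavior change described above, not the actual machine-config-operator code. It models a drain as a list of pod filters, where a pod is evicted only if every filter accepts it. The names `Pod`, `skip_mcc_pod`, and `pods_to_evict`, the pod names, and the `k8s-app` label value are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Pod:
    """Hypothetical stand-in for a pod scheduled on the node being drained."""
    name: str
    labels: Dict[str, str] = field(default_factory=dict)


# A filter returns True when the pod should be evicted during the drain.
PodFilter = Callable[[Pod], bool]

# Assumed label value used to recognise the MCC pod in this sketch.
MCC_APP_LABEL = "machine-config-controller"


def skip_mcc_pod(pod: Pod) -> bool:
    """Filter modelling the old behavior: never evict the MCC pod."""
    return pod.labels.get("k8s-app") != MCC_APP_LABEL


def pods_to_evict(pods: List[Pod], filters: List[PodFilter]) -> List[str]:
    """Return the names of pods that pass every filter, in input order."""
    return [p.name for p in pods if all(f(p) for f in filters)]


node_pods = [
    Pod("machine-config-controller-abc", {"k8s-app": MCC_APP_LABEL}),
    Pod("some-workload-xyz", {"app": "workload"}),
]

# With the skip filter, the MCC pod stays on the node through the reboot.
before = pods_to_evict(node_pods, [skip_mcc_pod])
# Without it, the MCC pod is evicted and can be rescheduled elsewhere.
after = pods_to_evict(node_pods, [])

print(before)  # ['some-workload-xyz']
print(after)   # ['machine-config-controller-abc', 'some-workload-xyz']
```

Dropping the filter, rather than special-casing the MCC pod some other way, matches the stated goal: the controller pod is treated like any other pod on the node, so a replacement can start on another node instead of waiting for the reboot.
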
@openshift-ci

openshift-ci bot commented Jul 6, 2022

@openshift-cherrypick-robot: Bugzilla bug 2103786 has been cloned as Bugzilla bug 2104687. Retitling PR to link against new bug.
/retitle [release-4.11] Bug 2104687: drain controller: don't skip the MCC pod drain

Details

In response to this:

[release-4.11] Bug 2103786: drain controller: don't skip the MCC pod drain

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@yuqi-zhang

Not a blocker for 4.11, but a definite nice-to-have as soon as we can

@openshift-ci openshift-ci bot changed the title from "[release-4.11] Bug 2103786: drain controller: don't skip the MCC pod drain" to "[release-4.11] Bug 2104687: drain controller: don't skip the MCC pod drain" Jul 6, 2022
@openshift-ci openshift-ci bot added the bugzilla/severity-high and bugzilla/valid-bug labels Jul 6, 2022
@openshift-ci

openshift-ci bot commented Jul 6, 2022

@openshift-cherrypick-robot: This pull request references Bugzilla bug 2104687, which is valid. The bug has been moved to the POST state. The bug has been updated to refer to the pull request using the external bug tracker.

6 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target release (4.11.0) matches configured target release for branch (4.11.0)
  • bug is in the state NEW, which is one of the valid states (NEW, ASSIGNED, ON_DEV, POST, POST)
  • dependent bug Bugzilla bug 2103786 is in the state MODIFIED, which is one of the valid states (MODIFIED, ON_QA, VERIFIED)
  • dependent Bugzilla bug 2103786 targets the "4.12.0" release, which is one of the valid target releases: 4.12.0
  • bug has dependents

Requesting review from QA contact:
/cc @rioliu-rh

Details

In response to this:

[release-4.11] Bug 2104687: drain controller: don't skip the MCC pod drain


@openshift-ci openshift-ci bot requested review from cgwalters, jkyros and rioliu-rh July 6, 2022 21:36
@openshift-ci

openshift-ci bot commented Jul 7, 2022

@openshift-cherrypick-robot: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/e2e-aws-upgrade-single-node | 4fc6163 | link | false | /test e2e-aws-upgrade-single-node |
| ci/prow/e2e-vsphere-upgrade | 4fc6163 | link | false | /test e2e-vsphere-upgrade |
| ci/prow/e2e-aws-disruptive | 4fc6163 | link | false | /test e2e-aws-disruptive |
| ci/prow/e2e-aws-serial | 4fc6163 | link | false | /test e2e-aws-serial |
| ci/prow/e2e-metal-ipi | 4fc6163 | link | false | /test e2e-metal-ipi |

Full PR test history. Your PR dashboard.


@rioliu-rh

/label cherry-pick-approved

@openshift-ci openshift-ci bot added the cherry-pick-approved label Jul 7, 2022
@sinnykumari

clean backport
/lgtm
/approve
/label backport-risk-assessed

@openshift-ci openshift-ci bot added the backport-risk-assessed label Aug 11, 2022
@openshift-ci openshift-ci bot added the lgtm label Aug 11, 2022
@openshift-ci

openshift-ci bot commented Aug 11, 2022

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: openshift-cherrypick-robot, sinnykumari

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved label Aug 11, 2022
@openshift-ci-robot

/retest-required

Remaining retests: 2 against base HEAD e722bb7 and 8 for PR HEAD 4fc6163 in total

@openshift-ci-robot

/retest-required

Remaining retests: 1 against base HEAD e722bb7 and 7 for PR HEAD 4fc6163 in total

@openshift-merge-robot openshift-merge-robot merged commit 5a634af into openshift:release-4.11 Aug 11, 2022
@openshift-ci

openshift-ci bot commented Aug 11, 2022

@openshift-cherrypick-robot: All pull requests linked via external trackers have merged:

Bugzilla bug 2104687 has been moved to the MODIFIED state.

Details

In response to this:

[release-4.11] Bug 2104687: drain controller: don't skip the MCC pod drain



Labels

  • approved: Indicates a PR has been approved by an approver from all required OWNERS files.
  • backport-risk-assessed: Indicates a PR to a release branch has been evaluated and considered safe to accept.
  • bugzilla/severity-high: Referenced Bugzilla bug's severity is high for the branch this PR is targeting.
  • bugzilla/valid-bug: Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting.
  • cherry-pick-approved: Indicates a cherry-pick PR into a release branch has been approved by the release branch manager.
  • lgtm: Indicates that a PR is ready to be merged.


6 participants