[release-4.11] OCPBUGS-5882: Set upgradeability check throttling period to 2m #885
Previously, the throttling reused the `minimumUpdateCheckInterval` value, which is derived from the full CVO minimum sync period. This value is set between 2m and 4m at CVO startup. This period is unnecessarily long and bad for UX: things happen with a delay, and our own test case expects upgradeability to be propagated in 3 minutes at worst. Hardcode the throttling to 2m (the lower bound of the previous behavior) to prevent flapping on flurries but allow changes to propagate deterministically faster. We will still get a bit of non-determinism from sync periods and requeueing, so this change should not cause any periodic API hammering.
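A minimal sketch of the throttling idea described above, not the actual CVO code. The names `upgradeableCheckPeriod`, `checkDue`, and `throttledChecker` are hypothetical and exist only for illustration; the only value taken from this PR is the fixed 2m period.

```go
package main

import (
	"fmt"
	"time"
)

// upgradeableCheckPeriod is the fixed throttling period proposed in this PR.
// The constant name is hypothetical; only the 2m value comes from the PR text.
const upgradeableCheckPeriod = 2 * time.Minute

// checkDue reports whether enough time has elapsed since the last check
// for a new upgradeability check to be allowed.
func checkDue(elapsed, period time.Duration) bool {
	return elapsed >= period
}

// throttledChecker remembers when the last check ran (hypothetical type).
type throttledChecker struct {
	last   time.Time
	period time.Duration
}

// maybeCheck runs check only if no check has run yet or the throttling
// period has passed since the previous run. It reports whether check ran.
func (t *throttledChecker) maybeCheck(now time.Time, check func()) bool {
	if !t.last.IsZero() && !checkDue(now.Sub(t.last), t.period) {
		return false
	}
	check()
	t.last = now
	return true
}

func main() {
	c := &throttledChecker{period: upgradeableCheckPeriod}
	start := time.Date(2023, 1, 1, 0, 0, 0, 0, time.UTC)
	offsets := []time.Duration{0, 30 * time.Second, 2 * time.Minute, 3 * time.Minute}
	for _, offset := range offsets {
		ran := c.maybeCheck(start.Add(offset), func() {})
		fmt.Printf("t=%v ran=%v\n", offset, ran)
	}
}
```

With a fixed period, a burst of triggers inside the 2m window results in a single check, while the first trigger after the window is handled right away instead of waiting for a randomized 2m to 4m interval.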
|
@petr-muller: No Bugzilla bug is referenced in the title of this pull request.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
|
/jira cherrypick OCPBUGS-5879 |
|
@petr-muller: Jira Issue OCPBUGS-5879 has been cloned as Jira Issue OCPBUGS-5882. Retitling PR to link against new bug. |
|
@petr-muller: This pull request references Jira Issue OCPBUGS-5882, which is invalid:
The bug has been updated to refer to the pull request using the external bug tracker. |
|
@petr-muller: all tests passed! |
|
/jira refresh |
|
@petr-muller: This pull request references Jira Issue OCPBUGS-5882, which is invalid:
The bug has been updated to refer to the pull request using the external bug tracker. |
|
OCPBUGS-5879 is verified |
|
@evakhoni: This pull request references Jira Issue OCPBUGS-5882, which is valid. 6 validation(s) were run on this bug
Requesting review from QA contact: |
|
@petr-muller any changes expected to this PR? I can pre-merge verify if definitely not. |
|
@evakhoni nope, no changes expected
/payload-aggregate periodic-ci-openshift-release-master-ci-4.11-upgrade-from-stable-4.10-e2e-azure-upgrade 10 |
|
@petr-muller: trigger 1 job(s) for the /payload-(job|aggregate) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/f64c0b50-9aaa-11ed-9aca-6ed2795e5dac-0 |
|
pre-merge verified in comment-21590007 |
|
@bandrade any idea who is available to set backport-risk-assessed on this one? tnx! |
|
/hold
I'd like to have a better look into some of the results in #885 (comment) |
|
The PR works as expected but does not fully eliminate the flakes in the adminack test; there are still unlucky timing cases. Upgradeability checks may still be done only once per 4 minutes (worst case), and the test only waits for three. This does not mean the CVO fix is broken or unnecessary: it still improves the chance of a prompt reaction, but the actual worst-case expectation needs to be a 4-minute delay, and the test needs to be adjusted for that.
/hold cancel |
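The timing argument can be illustrated with a simplified model. This is an assumption made for illustration, not a description of how the CVO actually schedules work: checks are attempted only on sync ticks spaced `syncPeriod` apart, and a check runs when at least `throttle` has elapsed since the previous one. The helper `worstCaseGap` and the constant names are hypothetical; the 2m, 3m, and 4m values come from the discussion above.

```go
package main

import (
	"fmt"
	"time"
)

const (
	throttle = 2 * time.Minute // fixed throttling period from this PR
	minSync  = 2 * time.Minute // lower bound of the sync period
	maxSync  = 4 * time.Minute // upper bound of the sync period
	testWait = 3 * time.Minute // how long the test waits
)

// worstCaseGap returns the longest time between two consecutive checks in
// the simplified model: a check ran at time zero, further checks are
// attempted on every sync tick, and a check runs only if at least
// throttlePeriod has elapsed since the previous check.
func worstCaseGap(syncPeriod, throttlePeriod time.Duration, ticks int) time.Duration {
	var last, worst time.Duration
	for i := 1; i <= ticks; i++ {
		now := time.Duration(i) * syncPeriod
		gap := now - last
		if gap >= throttlePeriod {
			if gap > worst {
				worst = gap
			}
			last = now
		}
	}
	return worst
}

func main() {
	for _, sync := range []time.Duration{minSync, 3 * time.Minute, maxSync} {
		gap := worstCaseGap(sync, throttle, 10)
		fmt.Printf("sync=%v worst gap=%v within test wait=%v\n", sync, gap, gap <= testWait)
	}
}
```

In this model the 2m throttle never delays a check on its own, because every sync period is at least 2m. The gap between checks is therefore bounded by the sync period, which is why a 4m sync period can still exceed a 3-minute test wait.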
|
@evakhoni |
|
/label backport-risk-assessed |
LalatenduMohanty
left a comment
/lgtm
|
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: LalatenduMohanty, petr-muller |
bandrade
left a comment
/label cherry-pick-approved
|
@bandrade: Can not set label cherry-pick-approved: Must be member in one of these teams: [] |
|
@evakhoni unfortunately I can't help here, the qe-approvals members are https://github.com/openshift/release/blob/721527e9122aea33aa225514d5f59a3ef3523291/core-services/prow/02_config/openshift/cluster-version-operator/_pluginconfig.yaml#L7-L9 |
well, Jia would not be back until summer, so we have @jianlinliu as the only allowed_user for now. |
|
@evakhoni perhaps we should add you to that list? I'm happy to merge such PR. |
I think it's something we need to discuss with my team first (although this PR looks perfectly fine to me; I am speaking generally about being a cherry-pick approver). |
|
/label cherry-pick-approved |
|
@evakhoni it is okay for me to add you as a qe-approvals member |
|
@petr-muller: All pull requests linked via external trackers have merged: Jira Issue OCPBUGS-5882 has been moved to the MODIFIED state. |
This is a partial backport of #882. The `release-4.11` branch does not contain the fast-mode-with-failing-precondition code from #808, so the refactoring commit from #882 is neither applicable nor needed.