[release-4.20] OCPBUGS-65777: Enforce OCP 4.20 and earlier cluster to have AutoSizingReserved disabled by default #5387
|
Skipping CI for Draft Pull Request. |
Force-pushed from 2507d91 to e8e4d53
|
/test all |
Force-pushed from e8e4d53 to c9f9e79
|
@ngopalak-redhat: This pull request references OCPNODE-3718 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the epic to target the "4.20.z" version, but no target version was set.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
/test unit |
|
@ngopalak-redhat: This pull request references OCPNODE-3722 which is a valid jira issue. |
|
/test bootstrap-unit |
Force-pushed from a958166 to b12addd
|
/test all |
|
@haircommander @sairameshv Can you please review? |
|
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: ngopalak-redhat, sairameshv, umohnani8. The full list of commands accepted by this bot can be found here. The pull request process is described here. |
|
/skip |
|
@haircommander: This pull request references OCPNODE-3722 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target only the "4.20.z" version, but multiple target versions were set. |
|
/retest-required |
|
/jira refresh |
|
@haircommander: This pull request references OCPNODE-3722 which is a valid jira issue. |
|
/jira refresh |
|
@haircommander: This pull request references OCPNODE-3722 which is a valid jira issue. |
|
/retitle [release-4.20] OCPBUGS-65777: Enforce OCP 4.20 and earlier cluster to have AutoSizingReserved disabled by default |
|
@haircommander: This pull request references OCPNODE-3722 which is a valid jira issue. |
|
@ngopalak-redhat: This pull request references Jira Issue OCPBUGS-65777, which is invalid. |
|
/jira refresh |
|
@haircommander: This pull request references Jira Issue OCPBUGS-65777, which is valid. 7 validation(s) were run on this bug. |
|
/retest-required |
|
@ngopalak-redhat: The following test failed, say
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
|
/verified later @asahay19 |
|
@ngopalak-redhat: This PR has been marked to be verified later. |
Merged commit 0a8e123 into openshift:release-4.20
|
@ngopalak-redhat: Jira Issue OCPBUGS-65777: Some pull requests linked via external trackers have merged: The following pull request, linked via external tracker, has not merged: All associated pull requests must be merged or unlinked from the Jira bug in order for it to move to the next state. Once unlinked, request a bug refresh with Jira Issue OCPBUGS-65777 has not been moved to the MODIFIED state. This PR is marked as verified-later. Jira issue(s) in the title of this PR will require post-merge verification. After testing, it must be manually moved to the |
Fixes: #OCPNODE-3722
- What I did
This patch introduces the `50-master-auto-sizing-disabled` MachineConfig to OpenShift 4.20 clusters, setting the `NODE_SIZING_ENABLED` flag to false by default on master and worker nodes.
This change is required because auto sizing becomes enabled by default for clusters created with 4.21 and above.
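As a rough illustration of what such a MachineConfig could look like, the sketch below builds a MachineConfig-shaped document that writes `NODE_SIZING_ENABLED=false` to a file on the node. The file path, Ignition version, file mode, and the helper name `build_machine_config` are assumptions made for this example and are not taken from the PR diff; the real file may also carry other settings.

```python
import json
from urllib.parse import quote

# Assumptions for illustration only: none of these values come from the PR diff.
ENV_FILE_PATH = "/etc/node-sizing-enabled.env"  # assumed location of the flag
IGNITION_VERSION = "3.2.0"                      # assumed Ignition spec version


def build_machine_config(name: str, role: str, enabled: bool) -> dict:
    """Return a MachineConfig-shaped dict that writes NODE_SIZING_ENABLED."""
    contents = f"NODE_SIZING_ENABLED={'true' if enabled else 'false'}\n"
    return {
        "apiVersion": "machineconfiguration.openshift.io/v1",
        "kind": "MachineConfig",
        "metadata": {
            "name": name,
            # The role label selects the MachineConfigPool the config applies to.
            "labels": {"machineconfiguration.openshift.io/role": role},
        },
        "spec": {
            "config": {
                "ignition": {"version": IGNITION_VERSION},
                "storage": {
                    "files": [
                        {
                            "path": ENV_FILE_PATH,
                            "mode": 0o644,
                            "overwrite": True,
                            # Ignition accepts file contents as a data URL.
                            "contents": {"source": "data:," + quote(contents)},
                        }
                    ]
                },
            }
        },
    }


master_mc = build_machine_config("50-master-auto-sizing-disabled", "master", False)
print(json.dumps(master_mc, indent=2))
```

Printing the result as JSON makes it easy to compare against the MachineConfig that actually appears on a patched cluster.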
Summary of changes
Enforce Default Autosizing: Ensures that clusters created in 4.20 will retain the pre-4.21 behavior of having auto node sizing disabled by default.
Upgrade Pre-requisite: This patch is a mandatory requirement for upgrading 4.20 clusters to 4.21. Changes to Cincinnati (cincinnati-graph-data#8277, "Set minimum version of 4.20 required to upgrade to 4.21") will enforce that this patch is present before the upgrade to 4.21 starts.
User Override (Priority): The MachineConfig uses the prefix 01- to ensure it sets the initial default. If a user has already created a KubeletConfig that explicitly enables autoSizingReserved (as per the KubeletConfig documentation), that explicit user configuration takes precedence over this default and is retained when upgrading to 4.21.
Reference: This change addresses the shift in default behavior introduced in OpenShift 4.21, where NODE_SIZING_ENABLED is set to true for all new clusters: #5390
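The precedence rule described in the list above can be sketched as a last-writer-wins merge over MachineConfig names sorted lexically. This is a simplified model, not the Machine Config Operator's actual merge code. It assumes that a user's KubeletConfig is rendered into a MachineConfig whose name sorts after the default (shown here as `99-master-generated-kubelet`) and that both write the same file; the path and both names are illustrative.

```python
# Hypothetical names and path, used only to illustrate ordering.
ENV_FILE = "/etc/node-sizing-enabled.env"


def effective_files(machine_configs: dict) -> dict:
    """Merge {name: {path: contents}} in sorted name order; later names win."""
    merged = {}
    for name in sorted(machine_configs):
        merged.update(machine_configs[name])
    return merged


# Cluster with only the default MachineConfig from this patch.
default_only = {
    "50-master-auto-sizing-disabled": {ENV_FILE: "NODE_SIZING_ENABLED=false\n"},
}

# Same cluster after a user explicitly enables auto sizing via a KubeletConfig.
with_user_override = {
    **default_only,
    "99-master-generated-kubelet": {ENV_FILE: "NODE_SIZING_ENABLED=true\n"},
}

print(effective_files(default_only)[ENV_FILE].strip())
print(effective_files(with_user_override)[ENV_FILE].strip())
```

Because the default sorts first, it only takes effect when no later configuration writes the same file, which matches the "initial default" behaviour the summary describes.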
Additional Notes for Developers
The approach taken in this PR is patterned after the change implemented in #4715, which was used to modify the default container runtime.
Rejected Alternatives
We explored several alternative solutions, but they were not feasible:
In-Place Upgrade Handling: We found that direct handling during the 4.21 upgrade was unreliable. After multiple upgrade cycles, there was no consistent mechanism to identify clusters originally provisioned before 4.21.
Changing the Default File: Switching the default configuration file (e.g., away from /etc/node-sizing-enabled) was overly complex, requiring us to manually manage legacy configuration paths for existing clusters.
Installer-Created KubeletConfig: Since OpenShift clusters do not contain a default KubeletConfig resource, one option was to have the installer create it. This was rejected because Hypershift deployments may bypass the standard OCP installer.
Adding a Default KubeletConfig Resource: This approach was dismissed because OpenShift allows only a single KubeletConfig per cluster. Introducing a default resource risks a user's explicit KubeletConfig unintentionally overriding the system default, leading to confusion.
- How to verify it
Verified the patch on a 4.20 cluster: Created a cluster using ClusterBot, applied the patch via `oc adm upgrade`, confirmed the new MachineConfig was created, and ensured auto node sizing was disabled.
Direct Patch Verification: Created a cluster using ClusterBot with the patch applied and confirmed auto node sizing was disabled.
User Override Test: Created a KubeletConfig to explicitly enable auto sizing and verified that the setting was correctly enabled (overriding the default).
Upgrade Path Validation: Successfully upgraded the patched cluster to 4.21 (using the above referenced 4.21 PR changes). Confirmed that auto node sizing remained disabled for upgraded clusters that had not been explicitly configured otherwise.
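For the "auto node sizing was disabled" checks above, a small helper like the following could be used to interpret the contents of the node's environment file once they have been read from a node. It is a sketch under assumptions: the file is taken to be a simple `KEY=value` format, and a missing `NODE_SIZING_ENABLED` key is treated as disabled. The function name and sample values are hypothetical.

```python
def node_sizing_enabled(env_text: str) -> bool:
    """Return True only if NODE_SIZING_ENABLED is set to true in env_text."""
    for raw_line in env_text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep and key.strip() == "NODE_SIZING_ENABLED":
            return value.strip().strip('"').lower() == "true"
    # Assumption: an absent key means auto sizing is not enabled.
    return False


# Hypothetical file contents for a node with the default applied.
patched_default = "NODE_SIZING_ENABLED=false\nSYSTEM_RESERVED_MEMORY=1Gi\n"
# Hypothetical file contents after a user override.
user_override = "NODE_SIZING_ENABLED=true\n"

print(node_sizing_enabled(patched_default))
print(node_sizing_enabled(user_override))
```

Running the same check before and after the 4.21 upgrade gives a quick way to confirm that the setting was retained on each node.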
- Description for the changelog
Introduces the auto sizing MachineConfig, ensuring the feature remains disabled by default during upgrade.