@ngopalak-redhat ngopalak-redhat commented Nov 3, 2025

Fixes: #OCPNODE-3722

- What I did
This patch introduces the 50-master-auto-sizing-disabled MachineConfig to OpenShift 4.20 clusters, setting the NODE_SIZING_ENABLED flag to false by default on master and worker nodes.
This change is required because auto sizing will be enabled by default for clusters created with 4.21 and later.

Summary of changes

  • Enforce Default Autosizing: Ensures that clusters created in 4.20 will retain the pre-4.21 behavior of having auto node sizing disabled by default.

  • Upgrade Prerequisite: This patch is mandatory for upgrading 4.20 clusters to 4.21. A change to Cincinnati (cincinnati-graph-data#8277, "Set minimum version of 4.20 required to upgrade to 4.21") will ensure that this patch is present before an upgrade to 4.21 can start.

  • User Override (Priority): The MachineConfig uses the prefix 01- to ensure it sets the initial default. If a user has already created a KubeletConfig to explicitly enable autoSizing (as per the KubeletConfig documentation), that explicit user configuration will take precedence (override this default) and will be retained when upgrading to 4.21.
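
The precedence described in the User Override bullet can be sketched as a name-ordered merge. This is a minimal illustration, not the Machine Config Operator's code: it assumes MachineConfigs are merged in lexical order of their names and that the last one to write a given file path wins. The `merge_files` helper, the `99-master-generated-kubelet` name, and the file contents are assumptions made for the example; the file path is the one mentioned in this description.

```python
from typing import Dict, List, Tuple

# Each entry is (MachineConfig name, {file path: file contents}).
MachineConfig = Tuple[str, Dict[str, str]]


def merge_files(configs: List[MachineConfig]) -> Dict[str, str]:
    """Merge files in lexical name order; the last writer of a path wins."""
    merged: Dict[str, str] = {}
    for _name, files in sorted(configs, key=lambda mc: mc[0]):
        merged.update(files)
    return merged


SIZING_FILE = "/etc/node-sizing-enabled"  # path as written in the description

default_only = [
    ("50-master-auto-sizing-disabled", {SIZING_FILE: "NODE_SIZING_ENABLED=false\n"}),
]
with_user_override = default_only + [
    # Hypothetical name for the MachineConfig rendered from a user's KubeletConfig.
    ("99-master-generated-kubelet", {SIZING_FILE: "NODE_SIZING_ENABLED=true\n"}),
]

print(merge_files(default_only)[SIZING_FILE].strip())
print(merge_files(with_user_override)[SIZING_FILE].strip())
```

Under these assumptions, any low-numbered prefix on the default sorts before a `99-` config, so the explicit user setting is the one that survives the merge.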

Reference: This change addresses the shift in default behavior introduced in OpenShift 4.21, where NODE_SIZING_ENABLED is set to true for all new clusters: #5390

Additional Notes for Developers
The approach taken in this PR is patterned after the change implemented in #4715, which was used to modify the default container runtime.

Rejected Alternatives
We explored several alternative solutions, but they were not feasible:

  • In-Place Upgrade Handling: We found that direct handling during the 4.21 upgrade was unreliable. After multiple upgrade cycles, there was no consistent mechanism to identify clusters originally provisioned before 4.21.

  • Changing the Default File: Switching the default configuration file (e.g., away from /etc/node-sizing-enabled) was overly complex, requiring us to manually manage legacy configuration paths for existing clusters.

  • Installer-Created KubeletConfig: Since OpenShift clusters do not contain a default KubeletConfig resource, one option was to have the installer create it. This was rejected because Hypershift deployments may bypass the standard OCP installer.

  • Adding a Default KubeletConfig Resource: This approach was dismissed because OpenShift allows only a single KubeletConfig per cluster. Introducing a default resource risks a user's explicit KubeletConfig unintentionally overriding the system default, leading to confusion.

- How to verify it

  • Verified the patch on a 4.20 cluster: Created a cluster using ClusterBot, applied the patch via oc adm upgrade, confirmed the new MachineConfig was created, and ensured auto node sizing was disabled.

  • Direct Patch Verification: Created a cluster using ClusterBot with the patch applied and confirmed auto node sizing was disabled.

  • User Override Test: Created a KubeletConfig to explicitly enable auto sizing and verified that the setting was correctly enabled (overriding the default).

  • Upgrade Path Validation: Successfully upgraded the patched cluster to 4.21 (using the above referenced 4.21 PR changes). Confirmed that auto node sizing remained disabled for upgraded clusters that had not been explicitly configured otherwise.
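
The "auto node sizing was disabled" checks above come down to reading the flag on a node. A minimal sketch, assuming the flag is stored as a KEY=VALUE line in a file whose contents have already been fetched from the node; the helper name and the sample contents are hypothetical.

```python
def node_sizing_enabled(env_text: str) -> bool:
    """Return True if the last NODE_SIZING_ENABLED assignment is 'true'."""
    enabled = False  # a missing flag is treated as disabled (assumption of this sketch)
    for raw in env_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, sep, value = line.partition("=")
        if sep and key.strip() == "NODE_SIZING_ENABLED":
            enabled = value.strip().strip('"').lower() == "true"
    return enabled


# Hypothetical file contents for a patched 4.20 node and for a user override.
patched_default = "# managed by MachineConfig\nNODE_SIZING_ENABLED=false\n"
user_override = "NODE_SIZING_ENABLED=true\n"

print(node_sizing_enabled(patched_default))
print(node_sizing_enabled(user_override))
```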

- Description for the changelog

Introduces the auto sizing MachineConfig so that the feature remains disabled by default during upgrade.

openshift-ci bot commented Nov 3, 2025

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress label (indicates that a PR should not merge because it is a work in progress) Nov 3, 2025
@ngopalak-redhat ngopalak-redhat force-pushed the ngopalak/release-4.20-patch-autoconfig branch 2 times, most recently from 2507d91 to e8e4d53, November 7, 2025 04:08
@ngopalak-redhat ngopalak-redhat changed the title from "WIP: Patch 4.20 to ensure that KubeletConfig with AutoSizingReserved set t…" to "Enforce OCP 4.20 and earlier cluster to have AutoSizingReserved disabled by default" Nov 7, 2025
@ngopalak-redhat (Contributor, Author)

/test all

@ngopalak-redhat ngopalak-redhat force-pushed the ngopalak/release-4.20-patch-autoconfig branch from e8e4d53 to c9f9e79, November 7, 2025 05:11
@ngopalak-redhat ngopalak-redhat changed the title from "Enforce OCP 4.20 and earlier cluster to have AutoSizingReserved disabled by default" to "OCPNODE-3718: Enforce OCP 4.20 and earlier cluster to have AutoSizingReserved disabled by default" Nov 7, 2025
@openshift-ci-robot openshift-ci-robot added the jira/valid-reference label (indicates that this PR references a valid Jira ticket of any type) Nov 7, 2025

openshift-ci-robot commented Nov 7, 2025

@ngopalak-redhat: This pull request references OCPNODE-3718 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the epic to target the "4.20.z" version, but no target version was set.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@ngopalak-redhat (Contributor, Author)

/test unit

openshift-ci-robot commented Nov 7, 2025

@ngopalak-redhat: This pull request references OCPNODE-3718 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the epic to target the "4.20.z" version, but no target version was set.

@ngopalak-redhat ngopalak-redhat changed the title from "OCPNODE-3718: Enforce OCP 4.20 and earlier cluster to have AutoSizingReserved disabled by default" to "OCPNODE-3722: Enforce OCP 4.20 and earlier cluster to have AutoSizingReserved disabled by default" Nov 7, 2025

openshift-ci-robot commented Nov 7, 2025

@ngopalak-redhat: This pull request references OCPNODE-3722 which is a valid jira issue.

openshift-ci-robot commented Nov 7, 2025

@ngopalak-redhat: This pull request references OCPNODE-3722 which is a valid jira issue.

@ngopalak-redhat ngopalak-redhat changed the title from "OCPNODE-3722: Enforce OCP 4.20 and earlier cluster to have AutoSizingReserved disabled by default" to "[release-4.20] OCPNODE-3722: Enforce OCP 4.20 and earlier cluster to have AutoSizingReserved disabled by default" Nov 7, 2025
@ngopalak-redhat (Contributor, Author)

/test bootstrap-unit

@ngopalak-redhat (Contributor, Author)

/test all

openshift-ci-robot commented Nov 7, 2025

@ngopalak-redhat: This pull request references OCPNODE-3722 which is a valid jira issue.

@ngopalak-redhat ngopalak-redhat marked this pull request as ready for review November 7, 2025 12:46
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress label (indicates that a PR should not merge because it is a work in progress) Nov 7, 2025
@ngopalak-redhat (Contributor, Author)

@haircommander @sairameshv Can you please review?

openshift-ci bot commented Nov 19, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ngopalak-redhat, sairameshv, umohnani8

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved label (indicates a PR has been approved by an approver from all required OWNERS files) Nov 19, 2025
@haircommander (Member)

/skip
/jira refresh

openshift-ci-robot commented Nov 19, 2025

@haircommander: This pull request references OCPNODE-3722 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target only the "4.20.z" version, but multiple target versions were set.

@haircommander (Member)

/retest-required

@haircommander (Member)

/jira refresh

openshift-ci-robot commented Nov 19, 2025

@haircommander: This pull request references OCPNODE-3722 which is a valid jira issue.

@haircommander (Member)

/jira refresh

openshift-ci-robot commented Nov 19, 2025

@haircommander: This pull request references OCPNODE-3722 which is a valid jira issue.

@haircommander (Member)

/retitle [release-4.20] OCPBUGS-65777: Enforce OCP 4.20 and earlier cluster to have AutoSizingReserved disabled by default
/jira refresh

@openshift-ci openshift-ci bot changed the title from "[release-4.20] OCPNODE-3722: Enforce OCP 4.20 and earlier cluster to have AutoSizingReserved disabled by default" to "[release-4.20] OCPBUGS-65777: Enforce OCP 4.20 and earlier cluster to have AutoSizingReserved disabled by default" Nov 19, 2025

openshift-ci-robot commented Nov 19, 2025

@haircommander: This pull request references OCPNODE-3722 which is a valid jira issue.

@openshift-ci-robot openshift-ci-robot added the jira/invalid-bug label (indicates that a referenced Jira bug is invalid for the branch this PR is targeting) Nov 19, 2025
@openshift-ci-robot (Contributor)

@ngopalak-redhat: This pull request references Jira Issue OCPBUGS-65777, which is invalid:

  • release note text must be set and not match the template OR release note type must be set to "Release Note Not Required". For more information you can reference the OpenShift Bug Process.

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

Details

In response to this:

Fixes: #OCPNODE-3722

- What I did
This patch introduces the 50-master-auto-sizing-disabled MachineConfig to OpenShift 4.20 clusters, setting the NODE_SIZING_ENABLED flag to false by default on master and worker nodes.
This change is required as we are making auto sizing enabled by default for cluster created using 4.21 and above.

Summary of changes

  • Enforce Default Autosizing: Ensures that clusters created in 4.20 will retain the pre-4.21 behavior of having auto node sizing disabled by default.

  • Upgrade Pre-requisite: This patch is a mandatory requirement for upgrading 4.20 clusters to 4.21. Changes to Cincinnati (Set minimum version of 4.20 required to upgrade to 4.21 cincinnati-graph-data#8277) will enforce that this patch must be present before the upgrade path to 4.21 is started.

  • User Override (Priority): The MachineConfig uses the prefix 01- to ensure it sets the initial default. If a user has already created a KubeletConfig to explicitly enable autoSizing (as per the KubeletConfig documentation), that explicit user configuration will take precedence (override this default) and will be retained when upgrading to 4.21.

Reference: This change addresses the shift in default behavior introduced in OpenShift 4.21, where NODE_SIZING_ENABLED is set to true for all new clusters: #5390

Additional Notes for Developers
The approach taken in this PR is patterned after the change implemented in #4715, which was used to modify the default container runtime.

Rejected Alternatives
We explored several alternative solutions, but they were not feasible:

  • In-Place Upgrade Handling: We found that direct handling during the 4.21 upgrade was unreliable. After multiple upgrade cycles, there was no consistent mechanism to identify clusters originally provisioned before 4.21.

  • Changing the Default File: Switching the default configuration file (e.g., away from /etc/node-sizing-enabled) was overly complex, requiring us to manually manage legacy configuration paths for existing clusters.

  • Installer-Created KubeletConfig: Since OpenShift clusters do not contain a default KubeletConfig resource, one option was to have the installer create it. This was rejected because Hypershift deployments may bypass the standard OCP installer.

  • Adding a Default KubeletConfig Resource: This approach was dismissed because OpenShift allows only a single KubeletConfig per cluster. Introducing a default resource risks a user's explicit KubeletConfig unintentionally overriding the system default, leading to confusion.

- How to verify it

  • Verified the patch on a 4.20 cluster: Created a cluster using ClusterBot, applied the patch via oc adm upgrade, confirmed the new MachineConfig was created, and ensured auto node sizing was disabled.

  • Direct Patch Verification: Created a cluster using ClusterBot with the patch applied and confirmed auto node sizing was disabled.

  • User Override Test: Created a KubeletConfig to explicitly enable auto sizing and verified that the setting was correctly enabled (overriding the default).

  • Upgrade Path Validation: Successfully upgraded the patched cluster to 4.21 (using the above referenced 4.21 PR changes). Confirmed that auto node sizing remained disabled for upgraded clusters that had not been explicitly configured otherwise.

- Description for the changelog

Introduces the auto node sizing MachineConfig, ensuring the feature remains disabled by default when a 4.20 cluster is upgraded.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@haircommander
Member

/jira refresh

@openshift-ci-robot added jira/valid-bug and removed jira/invalid-bug labels Nov 19, 2025
@openshift-ci-robot
Contributor

@haircommander: This pull request references Jira Issue OCPBUGS-65777, which is valid.

7 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.20.z) matches configured target version for branch (4.20.z)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)
  • release note type set to "Release Note Not Required"
  • dependent bug Jira Issue OCPBUGS-65778 is in the state Closed (Done), which is one of the valid states (VERIFIED, RELEASE PENDING, CLOSED (ERRATA), CLOSED (CURRENT RELEASE), CLOSED (DONE), CLOSED (DONE-ERRATA))
  • dependent Jira Issue OCPBUGS-65778 targets the "4.21.0" version, which is one of the valid target versions: 4.21.0
  • bug has dependents

@ngopalak-redhat
Contributor Author

/retest-required

@isabella-janssen
Member

/retest-required

@openshift-ci
Contributor

openshift-ci bot commented Nov 19, 2025

@ngopalak-redhat: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/bootstrap-unit | 8f0b0a8 | link | false | /test bootstrap-unit |

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@ngopalak-redhat
Contributor Author

/verified later @asahay19

@openshift-ci-robot added verified-later and verified labels Nov 20, 2025
@openshift-ci-robot
Contributor

@ngopalak-redhat: This PR has been marked to be verified later by @asahay19.

@openshift-merge-bot openshift-merge-bot bot merged commit 0a8e123 into openshift:release-4.20 Nov 20, 2025
15 checks passed
@openshift-ci-robot
Contributor

@ngopalak-redhat: Jira Issue OCPBUGS-65777: Some pull requests linked via external trackers have merged:

The following pull request, linked via external tracker, has not merged:

All associated pull requests must be merged or unlinked from the Jira bug in order for it to move to the next state. Once unlinked, request a bug refresh with /jira refresh.

Jira Issue OCPBUGS-65777 has not been moved to the MODIFIED state.

This PR is marked as verified-later. Jira issue(s) in the title of this PR will require post-merge verification. After testing, it must be manually moved to the VERIFIED state.


Labels

  • approved: Indicates a PR has been approved by an approver from all required OWNERS files.
  • backport-risk-assessed: Indicates a PR to a release branch has been evaluated and considered safe to accept.
  • jira/valid-bug: Indicates that a referenced Jira bug is valid for the branch this PR is targeting.
  • jira/valid-reference: Indicates that this PR references a valid Jira ticket of any type.
  • lgtm: Indicates that a PR is ready to be merged.
  • verified: Signifies that the PR passed pre-merge verification criteria.
  • verified-later
