Skip to content

OCPBUGS-50603: Create dedicated identity for KAS bootstrap container#6827

Open
smrtrfszm wants to merge 2 commits intoopenshift:mainfrom
smrtrfszm:smrtrfszm/fix-kas-bootstrap
Open

OCPBUGS-50603: Create dedicated identity for KAS bootstrap container#6827
smrtrfszm wants to merge 2 commits intoopenshift:mainfrom
smrtrfszm:smrtrfszm/fix-kas-bootstrap

Conversation

@smrtrfszm
Copy link

@smrtrfszm smrtrfszm commented Sep 15, 2025

What this PR does / why we need it:

Currently it's not possible to change the feature gates for a CustomNoUpgrade feature set cluster, because there's a Validating Admission Policy that only allows the change if the username is system:hosted-cluster-config and the KAS bootstrap container tries to change it with system:admin. Also it uses the CreateOrUpdate function with an empty callback function, which causes any update to fail silently.

This PR creates a dedicated client certificate and kubeconfig for the KAS bootstrap container and adds this user to the Validating Admission Policy. The kubeconfig uses localhost as the server URL so the container can connect to the kube-apiserver within the same pod without going through the service, which is not yet available at that point. Secondly, it replaces the CreateOrUpdate with server-side apply, so updates will work as expected and every resource type won't need to be special cased with a callback function.

Which issue(s) this PR fixes (optional, use fixes #<issue_number>(, fixes #<issue_number>, ...) format, where issue_number might be a GitHub issue, or a Jira story:
Fixes #https://issues.redhat.com/browse/OCPBUGS-50603

Checklist

  • Subject and description added to both, commit and PR.
  • Relevant issues have been referenced.
  • This change includes docs.
  • This change includes unit tests.

@openshift-ci-robot openshift-ci-robot added jira/severity-important Referenced Jira bug's severity is important for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. labels Sep 15, 2025
@openshift-ci-robot
Copy link

@smrtrfszm: This pull request references Jira Issue OCPBUGS-50603, which is invalid:

  • expected the bug to target either version "4.21." or "openshift-4.21.", but it targets "4.19" instead

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.

Details

In response to this:

What this PR does / why we need it:

Currently it's not possible to change the feature gates for a CustomNoUpgrade feature set cluster, because there's a Validating Admission Policy that only allows the change if the username is system:hosted-cluster-config and the KAS bootstrap container tries to change it with system:admin. Also it uses the CreateOrUpdate function with an empty callback function, which causes any update to fail silently.

This PR creates a new kubeconfig that has the certs from the hcco-kubeconfig, but the KAS url is localhost, as the bootstrap container cannot access the KAS any other way. When to bootstrap script tries the apply the FeatureGate custom resource, it will use this kubeconfig.

Secondly, it replaces the CreateOrUpdate with server-side apply, so updates will work as expected.

Which issue(s) this PR fixes (optional, use fixes #<issue_number>(, fixes #<issue_number>, ...) format, where issue_number might be a GitHub issue, or a Jira story:
Fixes #https://issues.redhat.com/browse/OCPBUGS-50603

Checklist

  • Subject and description added to both, commit and PR.
  • Relevant issues have been referenced.
  • This change includes docs.
  • This change includes unit tests.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Sep 15, 2025

Walkthrough

This change introduces a new kas-bootstrap-container identity with dedicated RBAC bindings, PKI signers, certificates, and kubeconfig configuration. It extends the control plane operator and kas-bootstrap components to provision the bootstrap container with authentication credentials and appropriate cluster permissions.

Changes

Cohort / File(s) Summary
PKI Infrastructure
control-plane-operator/.../manifests/pki.go, control-plane-operator/.../pki/ca.go, control-plane-operator/.../pki/kas.go
Added new exported helper functions for KAS bootstrap container signer and client certificate secrets: KASBootstrapContainerSigner(), KASBootstrapContainerClientCertSecret(), ReconcileKASBootstrapContainerSigner(), and ReconcileKASBootstrapContainerClientCertSecret() following existing PKI patterns.
KAS PKI Setup Integration
control-plane-operator/.../kas_pki_setup.go
Integrated new bootstrap container signer workflow into HostedControlPlane KAS client signers setup, reconciling the new signer and appending it to the total client CA bundle.
KAS Component Kubeconfig
control-plane-operator/.../v2/kas/component.go, control-plane-operator/.../v2/kas/kubeconfig.go
Added new manifest adapter adaptKASBootstrapContainerKubeconfigSecret() to generate kubeconfig for the kas-bootstrap-container using the new client certificate secret.
Asset Templates
control-plane-operator/.../v2/assets/kube-apiserver/deployment.yaml, control-plane-operator/.../v2/assets/kube-apiserver/kas-bootstrap-container-kubeconfig.yaml
Updated deployment template to add KUBECONFIG environment variable, volume mounts, and new ClusterRoleBinding manifest generation for kas-bootstrap-container; added new secret template for bootstrap container kubeconfig.
KAS Bootstrap Logic
kas-bootstrap/kas_boostrap.go, kas-bootstrap/kas_boostrap_test.go
Introduced Apply interface for server-side apply semantics, admin client initialization via KUBECONFIG\_ADMIN, and manifest application workflow that applies dedicated bootstrap container rolebinding with admin identity before other manifests. Refactored test suite to validate new Apply-based flow and manifest handling.
Bootstrap Container RBAC
kas-bootstrap/testdata/00_kas-bootstrap-container-rolebinding.yaml
Added ClusterRoleBinding manifest binding system:kas-bootstrap-container user to cluster-admin role.
Configuration Constants
support/config/types.go
Added new exported constant KASBootstrapContainerUser with value "kas-bootstrap-container".
Admission Policy Whitelist
control-plane-operator/.../admissionpolicies.go
Extended user whitelist to include KAS bootstrap container user in ValidatingAdmissionPolicy authorization checks.
Test Fixtures - ControlPlaneComponent
control-plane-operator/.../testdata/kube-apiserver/*/zz_fixture_TestControlPlaneComponents_kube_apiserver_controlplanecomponent.yaml
Updated test fixtures across all cloud variants (standard, TechPreviewNoUpgrade, AROSwift, GCP, IBMCloud) to include new kas-bootstrap-container-kubeconfig Secret resource in ControlPlaneComponent status.
Test Fixtures - Deployment
control-plane-operator/.../testdata/kube-apiserver/*/zz_fixture_TestControlPlaneComponents_kube_apiserver_deployment.yaml
Updated deployment test fixtures across all variants to reflect config-hash changes, new KUBECONFIG environment variable, volume mounts for bootstrap container kubeconfig, and generated ClusterRoleBinding manifest.
Test Fixtures - Kubeconfig Secret
control-plane-operator/.../testdata/kube-apiserver/*/zz_fixture_TestControlPlaneComponents_kas_bootstrap_container_kubeconfig_secret.yaml
Added new secret fixture files across all cloud variants defining the kas-bootstrap-container-kubeconfig Secret with base64-encoded kubeconfig data and HostedControlPlane ownership.
HostedControlPlane Controller Test
control-plane-operator/.../hostedcontrolplane_controller_test.go
Added initialization of kasBootstrapContainerCertSecret to test fixture objects, integrating the new bootstrap container certificate secret into reconciliation test setup.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and specifically describes the main change: creating a dedicated identity (client certificate and kubeconfig) for the KAS bootstrap container, which aligns with the core objective and the substantial code changes throughout the PR.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Tip

Issue Planner is now in beta. Read the docs and try it out! Share your feedback on Discord.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@openshift-ci openshift-ci bot added do-not-merge/needs-area area/control-plane-operator Indicates the PR includes changes for the control plane operator - in an OCP release needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. and removed do-not-merge/needs-area labels Sep 15, 2025
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Sep 15, 2025

Hi @smrtrfszm. Thanks for your PR.

I'm waiting for a openshift member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@openshift-ci openshift-ci bot requested review from csrwng and sjenning September 15, 2025 14:27
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
control-plane-operator/controllers/hostedcontrolplane/v2/assets/kube-apiserver/deployment.yaml (1)

115-135: Add HCCO kubeconfig env + mount to kas-bootstrap container (critical)

kas-bootstrap reads KUBECONFIG_HCCO (kas-bootstrap/kas_boostrap.go) but the bootstrap container in control-plane-operator/controllers/hostedcontrolplane/v2/assets/kube-apiserver/deployment.yaml does not set it or mount the hcco-localhost-kubeconfig secret — add the env var, volumeMount and volume as shown below.

       - name: KUBECONFIG
         value: /var/secrets/localhost-kubeconfig/kubeconfig
+      - name: KUBECONFIG_HCCO
+        value: /var/secrets/hcco-localhost-kubeconfig/kubeconfig
...
       volumeMounts:
         - mountPath: /work
           name: bootstrap-manifests
         - mountPath: /var/secrets/localhost-kubeconfig
           name: localhost-kubeconfig
+        - mountPath: /var/secrets/hcco-localhost-kubeconfig
+          name: hcco-localhost-kubeconfig
...
       volumes:
         - emptyDir: {}
           name: bootstrap-manifests
         - name: localhost-kubeconfig
           secret:
             defaultMode: 0640
             secretName: localhost-kubeconfig
+        - name: hcco-localhost-kubeconfig
+          secret:
+            defaultMode: 0640
+            secretName: hcco-localhost-kubeconfig
control-plane-operator/controllers/hostedcontrolplane/testdata/kube-apiserver/TechPreviewNoUpgrade/zz_fixture_TestControlPlaneComponents_kube_apiserver_deployment.yaml (1)

183-207: Wire HCCO kubeconfig into bootstrap for TPNU path too.

kas-bootstrap reads KUBECONFIG_HCCO (kas-bootstrap/kas_boostrap.go:59); this fixture only sets KUBECONFIG and mounts localhost-kubeconfig — add the KUBECONFIG_HCCO env var, mount, and volume for hcco-localhost-kubeconfig. File: control-plane-operator/controllers/hostedcontrolplane/testdata/kube-apiserver/TechPreviewNoUpgrade/zz_fixture_TestControlPlaneComponents_kube_apiserver_deployment.yaml — env block around lines 187-190, volumeMounts around 200-206, volumes around 479-562.

Apply this diff:

       - name: KUBECONFIG
         value: /var/secrets/localhost-kubeconfig/kubeconfig
+      - name: KUBECONFIG_HCCO
+        value: /var/secrets/hcco-localhost-kubeconfig/kubeconfig
...
       volumeMounts:
         - mountPath: /work
           name: bootstrap-manifests
         - mountPath: /var/secrets/localhost-kubeconfig
           name: localhost-kubeconfig
+        - mountPath: /var/secrets/hcco-localhost-kubeconfig
+          name: hcco-localhost-kubeconfig
...
       volumes:
         - name: localhost-kubeconfig
           secret:
             defaultMode: 0640
             secretName: localhost-kubeconfig
+        - name: hcco-localhost-kubeconfig
+          secret:
+            defaultMode: 0640
+            secretName: hcco-localhost-kubeconfig
control-plane-operator/controllers/hostedcontrolplane/testdata/kube-apiserver/zz_fixture_TestControlPlaneComponents_kube_apiserver_deployment.yaml (1)

183-207: Add HCCO kubeconfig mount + KUBECONFIG_HCCO env to bootstrap container (critical)

kas-bootstrap reads KUBECONFIG_HCCO (kas-bootstrap/kas_boostrap.go:59); this fixture's bootstrap container neither mounts the hcco-localhost-kubeconfig Secret nor sets KUBECONFIG_HCCO, so the HCCO client cannot be used and FeatureGate updates will fall back to the default client.

File: control-plane-operator/controllers/hostedcontrolplane/testdata/kube-apiserver/zz_fixture_TestControlPlaneComponents_kube_apiserver_deployment.yaml Lines: 183-207 (also duplicate at 477-583)

Apply this diff to wire it:

       - name: KUBECONFIG
         value: /var/secrets/localhost-kubeconfig/kubeconfig
+      - name: KUBECONFIG_HCCO
+        value: /var/secrets/hcco-localhost-kubeconfig/kubeconfig
...
       volumeMounts:
         - mountPath: /work
           name: bootstrap-manifests
         - mountPath: /var/secrets/localhost-kubeconfig
           name: localhost-kubeconfig
+        - mountPath: /var/secrets/hcco-localhost-kubeconfig
+          name: hcco-localhost-kubeconfig
...
       volumes:
         - emptyDir: {}
           name: bootstrap-manifests
         - name: localhost-kubeconfig
           secret:
             defaultMode: 0640
             secretName: localhost-kubeconfig
+        - name: hcco-localhost-kubeconfig
+          secret:
+            defaultMode: 0640
+            secretName: hcco-localhost-kubeconfig
🧹 Nitpick comments (4)
control-plane-operator/controllers/hostedcontrolplane/testdata/kube-apiserver/zz_fixture_TestControlPlaneComponents_hcco_localhost_kubeconfig_secret.yaml (1)

1-20: Secret fixture contains fake material; mark as test-only to quiet scanners.

Gitleaks flags this. Add a comment/annotation indicating test fixture and fake data, or exclude testdata path in CI scanning rules.

Example annotation:

 metadata:
   creationTimestamp: null
+  annotations:
+    security.openshift.io/test-fixture: "true"
control-plane-operator/controllers/hostedcontrolplane/testdata/kube-apiserver/TechPreviewNoUpgrade/zz_fixture_TestControlPlaneComponents_hcco_localhost_kubeconfig_secret.yaml (1)

1-20: Same: add test-only marker or scanner exclusion.

Mirror the annotation/exclusion here to avoid false-positive secret leaks in CI.

kas-bootstrap/kas_boostrap_test.go (1)

384-386: Consider simulating server-side apply more accurately in tests.

The fake Apply client uses Create instead of Patch with Apply. While this works for testing, consider using Patch with client.Apply to better simulate the actual server-side apply behavior once controller-runtime is upgraded.

 func (c *fakeApplyClient) Apply(ctx context.Context, obj *unstructured.Unstructured, opts ...client.PatchOption) error {
-	return c.client.Create(ctx, obj)
+	// TODO: Use Patch with client.Apply once controller-runtime is upgraded to >= 0.22
+	// For now, Create simulates the apply behavior in tests
+	return c.client.Create(ctx, obj)
 }
control-plane-operator/controllers/hostedcontrolplane/v2/kas/kubeconfig.go (1)

99-112: Consider refactoring to reduce code duplication.

The adaptHCCOLocalhostKubeconfigSecret function is nearly identical to adaptLocalhostKubeconfigSecret except for the client certificate secret used. Consider extracting a common helper function to reduce duplication.

+func adaptLocalhostKubeconfigSecretWithClientCert(cpContext component.WorkloadContext, secret *corev1.Secret, clientCertSecret *corev1.Secret) error {
+	apiServerPort := util.KASPodPort(cpContext.HCP)
+	localhostURL := fmt.Sprintf("https://localhost:%d", apiServerPort)
+	kubeconfig, err := GenerateKubeConfig(cpContext, clientCertSecret, localhostURL)
+	if err != nil {
+		return fmt.Errorf("failed to generate kubeconfig: %w", err)
+	}
+
+	if secret.Data == nil {
+		secret.Data = map[string][]byte{}
+	}
+	secret.Data[KubeconfigKey] = kubeconfig
+	return nil
+}
+
 func adaptLocalhostKubeconfigSecret(cpContext component.WorkloadContext, secret *corev1.Secret) error {
-	apiServerPort := util.KASPodPort(cpContext.HCP)
-	localhostURL := fmt.Sprintf("https://localhost:%d", apiServerPort)
-	kubeconfig, err := GenerateKubeConfig(cpContext, manifests.SystemAdminClientCertSecret(cpContext.HCP.Namespace), localhostURL)
-	if err != nil {
-		return fmt.Errorf("failed to generate kubeconfig: %w", err)
-	}
-
-	if secret.Data == nil {
-		secret.Data = map[string][]byte{}
-	}
-	secret.Data[KubeconfigKey] = kubeconfig
-	return nil
+	return adaptLocalhostKubeconfigSecretWithClientCert(cpContext, secret, manifests.SystemAdminClientCertSecret(cpContext.HCP.Namespace))
 }

 func adaptHCCOLocalhostKubeconfigSecret(cpContext component.WorkloadContext, secret *corev1.Secret) error {
-	apiServerPort := util.KASPodPort(cpContext.HCP)
-	localhostURL := fmt.Sprintf("https://localhost:%d", apiServerPort)
-	kubeconfig, err := GenerateKubeConfig(cpContext, manifests.HCCOClientCertSecret(cpContext.HCP.Namespace), localhostURL)
-	if err != nil {
-		return fmt.Errorf("failed to generate kubeconfig: %w", err)
-	}
-
-	if secret.Data == nil {
-		secret.Data = map[string][]byte{}
-	}
-	secret.Data[KubeconfigKey] = kubeconfig
-	return nil
+	return adaptLocalhostKubeconfigSecretWithClientCert(cpContext, secret, manifests.HCCOClientCertSecret(cpContext.HCP.Namespace))
 }
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Cache: Disabled due to data retention organization setting

Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting

📥 Commits

Reviewing files that changed from the base of the PR and between 53a3c61 and a51dbe9.

📒 Files selected for processing (12)
  • control-plane-operator/controllers/hostedcontrolplane/testdata/kube-apiserver/TechPreviewNoUpgrade/zz_fixture_TestControlPlaneComponents_hcco_localhost_kubeconfig_secret.yaml (1 hunks)
  • control-plane-operator/controllers/hostedcontrolplane/testdata/kube-apiserver/TechPreviewNoUpgrade/zz_fixture_TestControlPlaneComponents_kube_apiserver_controlplanecomponent.yaml (1 hunks)
  • control-plane-operator/controllers/hostedcontrolplane/testdata/kube-apiserver/TechPreviewNoUpgrade/zz_fixture_TestControlPlaneComponents_kube_apiserver_deployment.yaml (1 hunks)
  • control-plane-operator/controllers/hostedcontrolplane/testdata/kube-apiserver/zz_fixture_TestControlPlaneComponents_hcco_localhost_kubeconfig_secret.yaml (1 hunks)
  • control-plane-operator/controllers/hostedcontrolplane/testdata/kube-apiserver/zz_fixture_TestControlPlaneComponents_kube_apiserver_controlplanecomponent.yaml (1 hunks)
  • control-plane-operator/controllers/hostedcontrolplane/testdata/kube-apiserver/zz_fixture_TestControlPlaneComponents_kube_apiserver_deployment.yaml (1 hunks)
  • control-plane-operator/controllers/hostedcontrolplane/v2/assets/kube-apiserver/deployment.yaml (1 hunks)
  • control-plane-operator/controllers/hostedcontrolplane/v2/assets/kube-apiserver/hcco-localhost-kubeconfig.yaml (1 hunks)
  • control-plane-operator/controllers/hostedcontrolplane/v2/kas/component.go (1 hunks)
  • control-plane-operator/controllers/hostedcontrolplane/v2/kas/kubeconfig.go (1 hunks)
  • kas-bootstrap/kas_boostrap.go (4 hunks)
  • kas-bootstrap/kas_boostrap_test.go (5 hunks)
🧰 Additional context used
🪛 Gitleaks (8.27.2)
control-plane-operator/controllers/hostedcontrolplane/testdata/kube-apiserver/TechPreviewNoUpgrade/zz_fixture_TestControlPlaneComponents_hcco_localhost_kubeconfig_secret.yaml

[high] 2-4: Possible Kubernetes Secret detected, posing a risk of leaking credentials/tokens from your deployments

(kubernetes-secret-yaml)

🔇 Additional comments (14)
control-plane-operator/controllers/hostedcontrolplane/testdata/kube-apiserver/zz_fixture_TestControlPlaneComponents_kube_apiserver_deployment.yaml (1)

357-371: Filename prefix change looks good; ensures RBAC precedes FeatureGate.

Using 98_ for the CRB and 99_ for FeatureGate provides deterministic ordering. LGTM.

control-plane-operator/controllers/hostedcontrolplane/testdata/kube-apiserver/TechPreviewNoUpgrade/zz_fixture_TestControlPlaneComponents_kube_apiserver_deployment.yaml (1)

357-371: Consistent 98_ CRB ordering for TechPreviewNoUpgrade.

Same rationale as non-TPNU path. Looks good.

control-plane-operator/controllers/hostedcontrolplane/testdata/kube-apiserver/zz_fixture_TestControlPlaneComponents_kube_apiserver_controlplanecomponent.yaml (1)

50-52: New resource reference is correct and in expected position.

Adding hcco-localhost-kubeconfig after hcco-kubeconfig matches generation flow.

control-plane-operator/controllers/hostedcontrolplane/testdata/kube-apiserver/TechPreviewNoUpgrade/zz_fixture_TestControlPlaneComponents_kube_apiserver_controlplanecomponent.yaml (1)

50-52: TPNU resource list updated consistently.

Entry for hcco-localhost-kubeconfig is correct.

control-plane-operator/controllers/hostedcontrolplane/v2/assets/kube-apiserver/deployment.yaml (1)

244-258: Good: CRB file renamed to 98_ to enforce ordering.

Matches test fixtures and intended RBAC-before-FeatureGate sequencing.

control-plane-operator/controllers/hostedcontrolplane/v2/assets/kube-apiserver/hcco-localhost-kubeconfig.yaml (1)

1-10: Asset stub looks correct for controller fill-in.

Empty data with label “kubeconfig: local” is appropriate; controller populates bytes.

kas-bootstrap/kas_boostrap.go (4)

59-67: LGTM! Proper HCCO kubeconfig setup for authentication.

The addition of HCCO client configuration properly addresses the authentication issue where the Validating Admission Policy only permits changes from username system:hosted-cluster-config.


201-257: Well-implemented client routing for FeatureGate resources.

The implementation correctly:

  1. Routes FeatureGate resources to the HCCO client for proper authentication
  2. Uses server-side apply with appropriate field ownership tracking
  3. Converts objects to unstructured format for apply operations

186-191: Add/link a tracker issue to upgrade controller-runtime to >= v0.22 and remove this temporary Apply workaround.

go.mod shows sigs.k8s.io/controller-runtime v0.20.4 with a replace -> v0.19.7, so the workaround in kas-bootstrap/kas_boostrap.go:186-191 remains necessary; add a tracking issue/PR (or link an existing one) with an ETA and remove this interface when upgraded.


59-59: Validate KUBECONFIG_HCCO at runtime and ensure deployments set it.

kas-bootstrap/kas_boostrap.go:59 reads os.Getenv("KUBECONFIG_HCCO") without validation — add an explicit empty-check (fail fast or provide a safe default) and add KUBECONFIG_HCCO to the relevant deployment/CI manifests (repo search found no other occurrences).

kas-bootstrap/kas_boostrap_test.go (2)

375-394: Good test implementation with fake Apply client.

The test properly simulates the dual-client setup with separate fake clients for the default and HCCO paths, which aligns well with the production implementation.


439-445: Excellent test coverage for client isolation.

The test correctly verifies that:

  1. FeatureGate is created via the HCCO client
  2. The same resource is not present on the primary client
    This ensures proper isolation between the two client paths.
control-plane-operator/controllers/hostedcontrolplane/v2/kas/component.go (1)

57-60: LGTM! Proper placement of HCCO localhost kubeconfig adapter.

The new adapter is correctly positioned after the local kubeconfig adapter and uses the appropriate adaptation function.

control-plane-operator/controllers/hostedcontrolplane/v2/kas/kubeconfig.go (1)

99-112: HCCO client cert secret is reconciled — no action required.

manifests.HCCOClientCertSecret is defined in control-plane-operator/controllers/hostedcontrolplane/manifests/pki.go and is reconciled in control-plane-operator/controllers/hostedcontrolplane/kas_pki_setup.go (reconcileSub(..., pki.ReconcileHCCOClientCertSecret)); GenerateKubeConfig will fail if the secret is missing at runtime.

@jeffnowicki
Copy link

/jira refresh

@openshift-ci-robot openshift-ci-robot added jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. and removed jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. labels Sep 15, 2025
@openshift-ci-robot
Copy link

@jeffnowicki: This pull request references Jira Issue OCPBUGS-50603, which is valid. The bug has been moved to the POST state.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.21.0) matches configured target version for branch (4.21.0)
  • bug is in the state New, which is one of the valid states (NEW, ASSIGNED, POST)

No GitHub users were found matching the public email listed for the QA contact in Jira (xiuwang+1@redhat.com), skipping review request.

Details

In response to this:

/jira refresh

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@jeffnowicki
Copy link

/ok-to-test

@openshift-ci openshift-ci bot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Sep 15, 2025
@smrtrfszm smrtrfszm force-pushed the smrtrfszm/fix-kas-bootstrap branch from a51dbe9 to 962cf62 Compare September 15, 2025 15:04
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
kas-bootstrap/kas_boostrap.go (2)

209-255: Deterministic apply order required (ensure 98_ applies before 99_feature-gate.yaml).*

filepath.Walk does not guarantee lexical order; if 99_feature-gate.yaml is visited before 98_hcco-rolebinding.yaml, the FeatureGate apply will fail and prevent the RBAC from ever being created, looping indefinitely. Collect, sort, then apply.

Apply this diff:

- // Walk the filesPath and apply the resources.
- err := filepath.Walk(filesPath, func(path string, info os.FileInfo, err error) error {
-   if err != nil {
-     return fmt.Errorf("error accessing path %s: %w", path, err)
-   }
-   if info.IsDir() {
-     logger.Info("Skipping dir", "path", path)
-     return nil
-   }
-   if filepath.Ext(path) != ".yaml" {
-     logger.Info("Skipping non-yaml file", "path", path)
-     return nil
-   }
-   logger.Info("Processing file", "path", path)
-   content, err := os.ReadFile(path)
-   if err != nil {
-     return fmt.Errorf("failed to read file %s: %w", path, err)
-   }
-   obj, _, err := configCodecs.UniversalDeserializer().Decode(content, nil, nil)
-   if err != nil {
-     return fmt.Errorf("failed to decode file %s: %w", path, err)
-   }
-   unstructuredObj, err := runtime.DefaultUnstructuredConverter.ToUnstructured(obj)
-   if err != nil {
-     return fmt.Errorf("failed to convert to unstructured: %w", err)
-   }
-   clientToUse := c
-   if _, ok := obj.(*configv1.FeatureGate); ok {
-     logger.Info("Using HCCO kubeconfig")
-     clientToUse = hccoClient
-   }
-   fieldOwner := filepath.Base(os.Args[0])
-   if err := clientToUse.Apply(ctx, &unstructured.Unstructured{Object: unstructuredObj}, client.FieldOwner(fieldOwner), client.ForceOwnership); err != nil {
-     return fmt.Errorf("failed to apply file %s: %w", path, err)
-   }
-   return nil
- })
- return err
+ // Collect candidate files, sort, then apply in lexical order.
+ var files []string
+ err := filepath.WalkDir(filesPath, func(path string, d fs.DirEntry, err error) error {
+   if err != nil {
+     return fmt.Errorf("error accessing path %s: %w", path, err)
+   }
+   if d.IsDir() {
+     return nil
+   }
+   ext := filepath.Ext(path)
+   if ext == ".yaml" || ext == ".yml" {
+     files = append(files, path)
+   }
+   return nil
+ })
+ if err != nil {
+   return err
+ }
+ sort.Strings(files)
+ for _, path := range files {
+   logger.Info("Processing file", "path", path)
+   content, err := os.ReadFile(path)
+   if err != nil {
+     return fmt.Errorf("failed to read file %s: %w", path, err)
+   }
+   obj, _, err := configCodecs.UniversalDeserializer().Decode(content, nil, nil)
+   if err != nil {
+     return fmt.Errorf("failed to decode file %s: %w", path, err)
+   }
+   unstructuredObj, err := runtime.DefaultUnstructuredConverter.ToUnstructured(obj)
+   if err != nil {
+     return fmt.Errorf("failed to convert to unstructured: %w", err)
+   }
+   clientToUse := c
+   if _, ok := obj.(*configv1.FeatureGate); ok {
+     logger.Info("Using HCCO kubeconfig")
+     clientToUse = hccoClient
+   }
+   if err := clientToUse.Apply(ctx, &unstructured.Unstructured{Object: unstructuredObj}, client.FieldOwner(fieldManagerName), client.ForceOwnership); err != nil {
+     return fmt.Errorf("failed to apply file %s: %w", path, err)
+   }
+ }
+ return nil

Add missing imports:

// imports (add)
import (
  "io/fs"
  "sort"
)

151-158: Guard against empty renderedFeatureGate.Status.FeatureGates.

Indexing position 0 panics when the rendered manifest lacks status entries (several fixtures show status: null). Guard and short‑circuit.

Apply this diff:

- desiredFeatureGates := renderedFeatureGate.Status.FeatureGates
- currentVersion := renderedFeatureGate.Status.FeatureGates[0].Version
+ desiredFeatureGates := renderedFeatureGate.Status.FeatureGates
+ if len(renderedFeatureGate.Status.FeatureGates) == 0 {
+   logger.Info("Rendered FeatureGate has no status.featureGates; skipping status reconcile")
+   return nil
+ }
+ currentVersion := renderedFeatureGate.Status.FeatureGates[0].Version
🧹 Nitpick comments (9)
kas-bootstrap/kas_boostrap.go (3)

248-251: Stabilize SSA field manager name.

Using os.Args[0] can vary across invocations; prefer a constant to keep ownership stable.

Apply this diff:

- fieldOwner := filepath.Base(os.Args[0])
- if err := clientToUse.Apply(ctx, &unstructured.Unstructured{Object: unstructuredObj}, client.FieldOwner(fieldOwner), client.ForceOwnership); err != nil {
+ if err := clientToUse.Apply(ctx, &unstructured.Unstructured{Object: unstructuredObj}, client.FieldOwner(fieldManagerName), client.ForceOwnership); err != nil {

And add near the top-level declarations:

const fieldManagerName = "kas-bootstrap"

232-235: Multi‑document YAML not supported (only first doc decoded).

If any manifest contains multiple docs (---), only the first is applied. Consider streaming decode via util/yaml.NewYAMLOrJSONDecoder and applying each doc.

Would you like a follow‑up patch to stream-decode multi-doc manifests?


186-189: Temporary SSA workaround: add TODO for upgrade tracking.

Good stopgap. Add a TODO with an issue link to remove Apply shim once controller-runtime ≥ 0.22 is adopted.

kas-bootstrap/kas_boostrap_test.go (3)

384-387: Test Apply uses Create; acceptable here but note semantic differences.

Create is not idempotent like SSA; fine for single-pass tests. Add a brief comment to avoid future confusion.


442-446: Assert NotFound explicitly for clarity.

Strengthen the expectation to check apierrors.IsNotFound.

Apply this diff:

- err = c.Get(t.Context(), client.ObjectKey{Name: "cluster"}, featureGate2)
- g.Expect(err).To(HaveOccurred())
+ err = c.Get(t.Context(), client.ObjectKey{Name: "cluster"}, featureGate2)
+ g.Expect(apierrors.IsNotFound(err)).To(BeTrue())

Add import:

import apierrors "k8s.io/apimachinery/pkg/api/errors"

396-453: Add a unit test to enforce deterministic apply order.

Introduce a test that simulates unsorted enumeration (FeatureGate before CRB) and proves ordering logic applies 98_* before 99_*, preventing a permanent loop.

I can add a test that collects files, shuffles order, and asserts success only when sorted apply is used. Want me to push that?

control-plane-operator/controllers/hostedcontrolplane/v2/assets/kube-apiserver/deployment.yaml (1)

137-139: Harden secret mount with readOnly: true.

Mount the kubeconfig secret read-only.

Apply this diff:

 -        - mountPath: /var/secrets/hcco-localhost-kubeconfig
 -          name: hcco-localhost-kubeconfig
+        - mountPath: /var/secrets/hcco-localhost-kubeconfig
+          name: hcco-localhost-kubeconfig
+          readOnly: true
control-plane-operator/controllers/hostedcontrolplane/testdata/kube-apiserver/zz_fixture_TestControlPlaneComponents_kube_apiserver_deployment.yaml (1)

207-209: Align fixture with readOnly mount.

Update to reflect readOnly: true for the secret mount.

Apply this diff:

-        - mountPath: /var/secrets/hcco-localhost-kubeconfig
-          name: hcco-localhost-kubeconfig
+        - mountPath: /var/secrets/hcco-localhost-kubeconfig
+          name: hcco-localhost-kubeconfig
+          readOnly: true
control-plane-operator/controllers/hostedcontrolplane/testdata/kube-apiserver/TechPreviewNoUpgrade/zz_fixture_TestControlPlaneComponents_kube_apiserver_deployment.yaml (1)

207-209: Align TechPreview fixture with readOnly mount.

Apply this diff:

-        - mountPath: /var/secrets/hcco-localhost-kubeconfig
-          name: hcco-localhost-kubeconfig
+        - mountPath: /var/secrets/hcco-localhost-kubeconfig
+          name: hcco-localhost-kubeconfig
+          readOnly: true
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Cache: Disabled due to data retention organization setting

Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting

📥 Commits

Reviewing files that changed from the base of the PR and between a51dbe9 and 962cf62.

📒 Files selected for processing (5)
  • control-plane-operator/controllers/hostedcontrolplane/testdata/kube-apiserver/TechPreviewNoUpgrade/zz_fixture_TestControlPlaneComponents_kube_apiserver_deployment.yaml (5 hunks)
  • control-plane-operator/controllers/hostedcontrolplane/testdata/kube-apiserver/zz_fixture_TestControlPlaneComponents_kube_apiserver_deployment.yaml (5 hunks)
  • control-plane-operator/controllers/hostedcontrolplane/v2/assets/kube-apiserver/deployment.yaml (4 hunks)
  • kas-bootstrap/kas_boostrap.go (4 hunks)
  • kas-bootstrap/kas_boostrap_test.go (5 hunks)
🔇 Additional comments (9)
control-plane-operator/controllers/hostedcontrolplane/v2/assets/kube-apiserver/deployment.yaml (3)

123-125: LGTM: KUBECONFIG_HCCO wired to secret path.

Please confirm the hcco-localhost-kubeconfig Secret data points to https://localhost:6443 and uses the HCCO client certs.


248-250: Renamed to 98_hcco-rolebinding.yaml — good.

Couple this with deterministic sorting in kas-bootstrap to guarantee RBAC precedes FeatureGate.


340-343: LGTM: hcco-localhost-kubeconfig volume added.

control-plane-operator/controllers/hostedcontrolplane/testdata/kube-apiserver/zz_fixture_TestControlPlaneComponents_kube_apiserver_deployment.yaml (3)

190-192: Fixture updated with KUBECONFIG_HCCO — good.


361-376: Fixture rename to 98_hcco-rolebinding.yaml — good.


487-491: Fixture volume for hcco-localhost-kubeconfig — good.

control-plane-operator/controllers/hostedcontrolplane/testdata/kube-apiserver/TechPreviewNoUpgrade/zz_fixture_TestControlPlaneComponents_kube_apiserver_deployment.yaml (3)

190-192: TechPreview fixture updated with KUBECONFIG_HCCO — good.


361-376: TechPreview fixture rename to 98_hcco-rolebinding.yaml — good.


490-493: TechPreview fixture volume for hcco-localhost-kubeconfig — good.

Comment on lines 59 to 66
hccoCFG, err := clientcmd.BuildConfigFromFlags("", os.Getenv("KUBECONFIG_HCCO"))
if err != nil {
return fmt.Errorf("failed to get HCCO config: %w", err)
}
hccoClient, err := client.New(hccoCFG, client.Options{Scheme: configScheme})
if err != nil {
return fmt.Errorf("failed to create client with HCCO kubeconfig: %w", err)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Prevent unintended fallback to KUBECONFIG; require KUBECONFIG_HCCO explicitly.

If KUBECONFIG_HCCO is unset/empty, BuildConfigFromFlags may load the default KUBECONFIG, causing the HCCO client to use system:admin and fail Admission on FeatureGate forever. Enforce presence and readability of KUBECONFIG_HCCO.

Apply this diff:

- hccoCFG, err := clientcmd.BuildConfigFromFlags("", os.Getenv("KUBECONFIG_HCCO"))
- if err != nil {
- 	return fmt.Errorf("failed to get HCCO config: %w", err)
- }
+ hccoPath, ok := os.LookupEnv("KUBECONFIG_HCCO")
+ if !ok || strings.TrimSpace(hccoPath) == "" {
+ 	return fmt.Errorf("KUBECONFIG_HCCO not set; required to apply FeatureGate as system:hosted-cluster-config")
+ }
+ if _, err := os.Stat(hccoPath); err != nil {
+ 	return fmt.Errorf("HCCO kubeconfig path %q not accessible: %w", hccoPath, err)
+ }
+ hccoCFG, err := clientcmd.BuildConfigFromFlags("", hccoPath)
+ if err != nil {
+ 	return fmt.Errorf("failed to get HCCO config from %q: %w", hccoPath, err)
+ }

Add missing imports:

// imports (add)
import (
  "strings"
)
🤖 Prompt for AI Agents
In kas-bootstrap/kas_boostrap.go around lines 59-66, enforce that
KUBECONFIG_HCCO is explicitly provided and readable (do not allow
clientcmd.BuildConfigFromFlags to fall back to default KUBECONFIG); check
os.Getenv("KUBECONFIG_HCCO") for non-empty (using strings.TrimSpace) and verify
the path is accessible (os.Stat or ioutil.ReadFile) before calling
clientcmd.BuildConfigFromFlags with that path, returning a clear error if
missing/unreadable, and add the requested import for "strings".

@cwbotbot
Copy link

cwbotbot commented Sep 15, 2025

Test Results

e2e-aws

e2e-aks

@TwoDCube
Copy link
Member

/ok-to-test

@TwoDCube
Copy link
Member

/retest

@openshift-ci-robot
Copy link

@smrtrfszm: This pull request references Jira Issue OCPBUGS-50603, which is valid.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.21.0) matches configured target version for branch (4.21.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)

No GitHub users were found matching the public email listed for the QA contact in Jira (xiuwang+1@redhat.com), skipping review request.

Details

In response to this:

What this PR does / why we need it:

Currently it's not possible to change the feature gates for a CustomNoUpgrade feature set cluster, because there's a Validating Admission Policy that only allows the change if the username is system:hosted-cluster-config and the KAS bootstrap container tries to change it with system:admin. Also it uses the CreateOrUpdate function with an empty callback function, which causes any update to fail silently.

This PR creates a new kubeconfig that has the certs from the hcco-kubeconfig, but the KAS url is localhost, as the bootstrap container cannot access the KAS any other way, while the cluster is deploying. When to bootstrap script tries the apply the FeatureGate custom resource, it will use this kubeconfig.

Secondly, it replaces the CreateOrUpdate with server-side apply, so updates will work as expected.

Which issue(s) this PR fixes (optional, use fixes #<issue_number>(, fixes #<issue_number>, ...) format, where issue_number might be a GitHub issue, or a Jira story:
Fixes #https://issues.redhat.com/browse/OCPBUGS-50603

Checklist

  • Subject and description added to both, commit and PR.
  • Relevant issues have been referenced.
  • This change includes docs.
  • This change includes unit tests.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

env:
- name: KUBECONFIG
value: /var/secrets/localhost-kubeconfig/kubeconfig
- name: KUBECONFIG_HCCO
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any particular reason why we don't always use HCCO identity for bootstrapping? I could very easily see us hitting this problem for other resources in the future and it seems wise for us to ensure that bootstrapping is not blocked by the validating admission policy.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because this same program gives the permissions to the hcco identity. We would have to apply the 98_hcco-rolebinding.yaml somewhere else.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have Red Hat engineers had a chance to look at this PR and provide their opinion?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My preference would be to consider the rolebinding for the hcco a special thing (and even exclude it from the list of manifests). So invert what this PR does... use the old kubeconfig only for the hcco rolebinding (and only create it if not needed). The bootstrap program could just create/reconcile it inline. Once the HCCO has the power to do anything, then apply all bootstrap manifests using the hcco kubeconfig. That way future bootstrap manifests that we may disallow via VAP can still be applied.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@csrwng I am a bit confused that you approved the PR yet would prefer things to be done differently. Can you please explain?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

per @sjenning, let's create a separate identity that is allowed to make the same updates as the HCCO. That way we can keep track of who makes the updates.

Copy link
Contributor

@csrwng csrwng Oct 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please explain?

I removed the approved label b/c I thought of a better way to handle this ^^.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've updated the PR with the new identity.

@csrwng
Copy link
Contributor

csrwng commented Oct 23, 2025

/retest-required

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 23, 2025
@csrwng csrwng removed the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 23, 2025
@rtheis
Copy link
Contributor

rtheis commented Oct 23, 2025

/ok-to-test

Add a dedicated client certificate and kubeconfig for the KAS bootstrap
container. The kubeconfig uses localhost as the server URL so the
bootstrap container can connect to the kube-apiserver within the same
pod without going through the service.

This is needed because the bootstrap container must apply manifests
before the kube-apiserver is fully ready, and the service endpoint is
not yet available at that point.
@smrtrfszm smrtrfszm force-pushed the smrtrfszm/fix-kas-bootstrap branch from 962cf62 to 62d8fca Compare February 14, 2026 02:53
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Feb 14, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: smrtrfszm
Once this PR has been reviewed and has the lgtm label, please assign devguyio for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the area/hypershift-operator Indicates the PR includes changes for the hypershift operator and API - outside an OCP release label Feb 14, 2026
@openshift-ci-robot openshift-ci-robot added jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. and removed jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. labels Feb 14, 2026
@openshift-ci-robot
Copy link

@smrtrfszm: This pull request references Jira Issue OCPBUGS-50603, which is invalid:

  • expected the bug to target either version "4.22." or "openshift-4.22.", but it targets "4.21" instead

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

Details

In response to this:

What this PR does / why we need it:

Currently it's not possible to change the feature gates for a CustomNoUpgrade feature set cluster, because there's a Validating Admission Policy that only allows the change if the username is system:hosted-cluster-config and the KAS bootstrap container tries to change it with system:admin. Also it uses the CreateOrUpdate function with an empty callback function, which causes any update to fail silently.

This PR creates a dedicated client certificate and kubeconfig for the KAS bootstrap container. The kubeconfig uses localhost as the server URL so the container can connect to the kube-apiserver within the same pod without going through the service, which is not yet available at that point. Secondly, it replaces the CreateOrUpdate with server-side apply, so updates will work as expected and every resource type won't need to be special cased with a callback function.

Which issue(s) this PR fixes (optional, use fixes #<issue_number>(, fixes #<issue_number>, ...) format, where issue_number might be a GitHub issue, or a Jira story:
Fixes #https://issues.redhat.com/browse/OCPBUGS-50603

Checklist

  • Subject and description added to both, commit and PR.
  • Relevant issues have been referenced.
  • This change includes docs.
  • This change includes unit tests.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@smrtrfszm smrtrfszm changed the title OCPBUGS-50603: Use HCCO kubeconfig to apply the FeatureGate CR OCPBUGS-50603: Create dedicated identity for KAS bootstrap container Feb 14, 2026
@TwoDCube
Copy link
Member

/jira refresh

@openshift-ci-robot openshift-ci-robot added the jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. label Feb 14, 2026
@openshift-ci-robot
Copy link

@TwoDCube: This pull request references Jira Issue OCPBUGS-50603, which is valid.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.22.0) matches configured target version for branch (4.22.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact:
/cc @xiuwang

The bug has been updated to refer to the pull request using the external bug tracker.

Details

In response to this:

/jira refresh

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci-robot openshift-ci-robot removed the jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. label Feb 14, 2026
@openshift-ci openshift-ci bot requested a review from xiuwang February 14, 2026 05:59
Replace CreateOrUpdate with server-side apply for applying bootstrap
manifests. CreateOrUpdate cannot apply resources generically as all
resource types need to be special cased with a callback function.

Introduce a two-client architecture: an admin (localhost) client applies
the kas-bootstrap-container ClusterRoleBinding first to grant
cluster-admin, then a second client using the kas-bootstrap-container
identity applies the remaining manifests.

The FeatureGate CR has a ValidatingAdmissionPolicy that restricts spec
changes to the system:hosted-cluster-config user, so
reconcileFeatureGate
now uses the kas-bootstrap-container client and this user is also added
to the ValidatingAdmissionPolicy so it can apply the FeatureGate spec.
@smrtrfszm smrtrfszm force-pushed the smrtrfszm/fix-kas-bootstrap branch from 62d8fca to 40fb921 Compare February 14, 2026 11:16
@TwoDCube
Copy link
Member

/ok-to-test

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Feb 14, 2026

@smrtrfszm: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/okd-scos-e2e-aws-ovn 962cf62 link false /test okd-scos-e2e-aws-ovn
ci/prow/e2e-aks-4-21 962cf62 link true /test e2e-aks-4-21

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@TwoDCube
Copy link
Member

/retest

@TwoDCube
Copy link
Member

/test all

@TwoDCube
Copy link
Member

/test e2e-aws

@TwoDCube
Copy link
Member

/test e2e-aks
/test e2e-aws-upgrade-hypershift-operator
/test e2e-kubevirt-aws-ovn-reduced
/test e2e-v2-aws

@TwoDCube
Copy link
Member

@coderabbitai full review

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Feb 16, 2026

✅ Actions performed

Full review triggered.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@kas-bootstrap/kas_boostrap_test.go`:
- Around line 376-391: Update the workaround comment above
fakeApplyClient/newFakeApplyClient to accurately reflect that the fake client's
rejection of ApplyPatchType still exists in controller-runtime v0.22.4; change
the version note from "once controller-runtime is upgraded to >= 0.22" to
reference the correct future version (e.g., v0.23) or a specific patch that
resolves the fake client's "apply patches are not supported in the fake client"
error, and optionally include that error message to clarify why the Create-based
Apply shim is required until that version.

Comment on lines +376 to +391
// fakeApplyClient wraps a fake client.Client and implements the Apply interface
// by using Create instead of Patch with client.Apply, because the current
// controller-runtime version has a bug that prevents server-side apply with fake.Client.
// Once controller-runtime is upgraded to >= 0.22, this can be replaced.
type fakeApplyClient struct {
client.Client
}

func newFakeApplyClient() *fakeApplyClient {
c := fake.NewClientBuilder().WithScheme(configScheme).Build()
return &fakeApplyClient{Client: c}
}

func (c *fakeApplyClient) Apply(ctx context.Context, obj *unstructured.Unstructured, opts ...client.PatchOption) error {
return c.Client.Create(ctx, obj)
}
Copy link
Contributor

@coderabbitai coderabbitai bot Feb 16, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check the actual controller-runtime version in go.mod
echo "=== Controller-runtime version ==="
rg 'sigs.k8s.io/controller-runtime' go.mod -A 1

echo ""
echo "=== Checking for fake.Client Apply method ==="
# Look for Apply-related methods in fake client
fd 'fake' -t f --max-depth 3 | head -20

Repository: openshift/hypershift

Length of output: 412


🏁 Script executed:

#!/bin/bash
# Check support/testutil/fake.go to see if there's custom fake client code
echo "=== support/testutil/fake.go ==="
wc -l support/testutil/fake.go
head -50 support/testutil/fake.go

echo ""
echo "=== kas_boostrap_test.go context around lines 376-391 ==="
sed -n '370,400p' kas-bootstrap/kas_boostrap_test.go

echo ""
echo "=== Search for Apply method in test file ==="
rg 'func.*Apply' kas-bootstrap/kas_boostrap_test.go -A 3

Repository: openshift/hypershift

Length of output: 3951


🏁 Script executed:

#!/bin/bash
# Search for any comments or issues about fake.Client Apply bug
echo "=== Search for fake.Client Apply in codebase ==="
rg 'fake\.Client.*Apply|Apply.*fake' --type go -B 2 -A 2 | head -40

echo ""
echo "=== Check if there are any TODO/FIXME comments about this ==="
rg 'TODO|FIXME' kas-bootstrap/kas_boostrap_test.go

echo ""
echo "=== Look for when this file was last modified ==="
git log --oneline --all -- kas-bootstrap/kas_boostrap_test.go | head -10

echo ""
echo "=== Check git blame for the fakeApplyClient comment ==="
git blame kas-bootstrap/kas_boostrap_test.go | sed -n '376,379p'

Repository: openshift/hypershift

Length of output: 5518


🏁 Script executed:

#!/bin/bash
# Search for any issues or discussions about fake.Client.Apply in controller-runtime
echo "=== Check go.mod for any comments about the replace directive ==="
sed -n '1,50p' go.mod | grep -A 5 -B 5 'controller-runtime'

echo ""
echo "=== Search for any other references to this bug in comments ==="
rg 'fake.*Client|controller-runtime.*bug' --type go -i | head -20

echo ""
echo "=== Check the commit message for more context ==="
git show --no-patch 40fb92107 | head -30

Repository: openshift/hypershift

Length of output: 3858


🏁 Script executed:

#!/bin/bash
# Look deeper into the fake client implementation to understand the Apply limitations
echo "=== Check fake.Client.Apply implementation in vendor ==="
fd 'fake' vendor/sigs.k8s.io/controller-runtime -type f -name '*.go' | head -10

echo ""
echo "=== Search for Apply method in fake client ==="
rg 'func.*Apply' vendor/sigs.k8s.io/controller-runtime/pkg/client/fake/ -A 2

echo ""
echo "=== Check doc.go for known issues ==="
cat vendor/sigs.k8s.io/controller-runtime/pkg/client/fake/doc.go

Repository: openshift/hypershift

Length of output: 2019


🏁 Script executed:

#!/bin/bash
# List files in fake package
echo "=== Files in fake package ==="
find vendor/sigs.k8s.io/controller-runtime/pkg/client/fake -type f -name '*.go'

echo ""
echo "=== Check client.go for Apply ==="
grep -n 'Apply' vendor/sigs.k8s.io/controller-runtime/pkg/client/fake/client.go | head -20

echo ""
echo "=== Check if there's a Patch method which Apply depends on ==="
grep -n 'func.*Patch' vendor/sigs.k8s.io/controller-runtime/pkg/client/fake/client.go -A 5 | head -30

Repository: openshift/hypershift

Length of output: 2062


🏁 Script executed:

#!/bin/bash
# Check if Apply method exists in the fake.Client interface at all
echo "=== Search for Apply in entire fake client file ==="
wc -l vendor/sigs.k8s.io/controller-runtime/pkg/client/fake/client.go

echo ""
echo "=== Check lines around 1000-1050 to see Apply implementation ==="
sed -n '1000,1050p' vendor/sigs.k8s.io/controller-runtime/pkg/client/fake/client.go

echo ""
echo "=== Check if Apply is mentioned anywhere in fake client ==="
grep -i 'apply' vendor/sigs.k8s.io/controller-runtime/pkg/client/fake/client.go | grep -i 'func.*apply' -i

echo ""
echo "=== Look at client.Client interface definition in controller-runtime ==="
find vendor/sigs.k8s.io/controller-runtime/pkg/client -name '*.go' -exec grep -l 'type Client interface' {} \;
</parameter>

Repository: openshift/hypershift

Length of output: 2048


🏁 Script executed:

#!/bin/bash
# Check the interfaces.go for Apply method
echo "=== Search for Apply in client interfaces ==="
grep -n 'Apply' vendor/sigs.k8s.io/controller-runtime/pkg/client/interfaces.go | head -20

echo ""
echo "=== Check the specific Apply method signature ==="
sed -n '1,150p' vendor/sigs.k8s.io/controller-runtime/pkg/client/interfaces.go | grep -A 10 'Apply'

echo ""
echo "=== Now check the error message more carefully ==="
grep -B 5 -A 5 'apply patches are not supported' vendor/sigs.k8s.io/controller-runtime/pkg/client/fake/client.go

Repository: openshift/hypershift

Length of output: 602


🏁 Script executed:

#!/bin/bash
# Check GitHub issue to understand the status
echo "=== Look for any mentions of the GitHub issue in code ==="
rg 'kubernetes/kubernetes/issues/115598' --type go

echo ""
echo "=== Check what version of controller-runtime this error message was added in ==="
git log --all --grep='apply patches' -- vendor/ 2>/dev/null | head -20 || echo "Git log not available for vendor/"

echo ""
echo "=== Check if there's a newer v0.22.x version that might have fixed this ==="
rg 'v0\.22' go.mod

echo ""
echo "=== Check the actual Patch method signature to understand the workaround better ==="
grep -n 'func (c \*fakeClient) Patch' vendor/sigs.k8s.io/controller-runtime/pkg/client/fake/client.go -A 20

Repository: openshift/hypershift

Length of output: 1434


🏁 Script executed:

#!/bin/bash
# Final verification: Check the actual Apply method implementation in interfaces.go
echo "=== Find Apply method in interfaces ==="
grep -n 'Apply' vendor/sigs.k8s.io/controller-runtime/pkg/client/interfaces.go

echo ""
echo "=== Look at the Client interface structure ==="
sed -n '1,200p' vendor/sigs.k8s.io/controller-runtime/pkg/client/interfaces.go | tail -50

Repository: openshift/hypershift

Length of output: 2280


Update workaround comment — bug still exists in v0.22.4, reference needs correction.

The comment states the fakeApplyClient workaround can be removed "once controller-runtime is upgraded to >= 0.22", but the project already uses v0.22.4. The fake client in v0.22.4 still explicitly rejects ApplyPatchType with the error message: "apply patches are not supported in the fake client." Update the comment to reference the actual version where this limitation was or will be resolved (e.g., "v0.23" or a specific patch).

🤖 Prompt for AI Agents
In `@kas-bootstrap/kas_boostrap_test.go` around lines 376 - 391, Update the
workaround comment above fakeApplyClient/newFakeApplyClient to accurately
reflect that the fake client's rejection of ApplyPatchType still exists in
controller-runtime v0.22.4; change the version note from "once
controller-runtime is upgraded to >= 0.22" to reference the correct future
version (e.g., v0.23) or a specific patch that resolves the fake client's "apply
patches are not supported in the fake client" error, and optionally include that
error message to clarify why the Create-based Apply shim is required until that
version.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@coderabbitai have you considered that the controller-runtime is actually being replaced with a different version in go.mod?

replace sigs.k8s.io/controller-runtime => sigs.k8s.io/controller-runtime v0.19.7

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Verify the replace directive in go.mod
echo "=== Checking for replace directive for controller-runtime ==="
grep -n 'replace.*controller-runtime' go.mod

echo ""
echo "=== Full context of controller-runtime in go.mod ==="
grep -B 2 -A 2 'controller-runtime' go.mod

Length of output: 681


@TwoDCube You're absolutely right, and I apologize for missing that. I should have checked for replace directives in go.mod.

The replace directive at line 323:

replace sigs.k8s.io/controller-runtime => sigs.k8s.io/controller-runtime v0.19.7

overrides the v0.22.4 requirement, meaning the actual version in use is v0.19.7. The comment in the code stating "once controller-runtime is upgraded to >= 0.22" is therefore accurate and appropriate.

My original review comment suggesting the comment needed updating was incorrect. The workaround is indeed still necessary with v0.19.7, and the comment correctly indicates the version target for when it can be removed. You can disregard my earlier suggestion.

Thank you for catching this!


🧬 Code Graph Analysis Results

[kas-bootstrap/kas_boostrap.go:211-213]
Apply interface {
Apply(ctx context.Context, obj *unstructured.Unstructured, opts ...client.PatchOption) error
}


📝 Learnings were identified and not saved (knowledge base disabled). Enable

@TwoDCube
Copy link
Member

As discussed, the new identity is definitely the right direction for new releases. However, this would be a breaking change for older releases so this can't be backported as-is. The backports should impersonate the HCCO to fix the issue instead of requiring everyone who disabled pki reconciliation to create these new identities.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/control-plane-operator Indicates the PR includes changes for the control plane operator - in an OCP release area/hypershift-operator Indicates the PR includes changes for the hypershift operator and API - outside an OCP release jira/severity-important Referenced Jira bug's severity is important for the branch this PR is targeting. jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. ok-to-test Indicates a non-member PR verified by an org member that is safe to test.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants