Contributor

@eliorerz eliorerz commented Feb 18, 2025

This is a partial implementation of PR #2550. The final code will (probably) extend the ClusterDeployment CR and add support for MachinePools.

This PR adds support for provisioning OpenShift clusters on Nutanix using the OpenShift Installer's IPI installation method within Hive.

Key changes include:

  • Integration with the OpenShift Installer IPI workflow for Nutanix.
  • Implementation of necessary controllers, validations, and configuration options.
  • Automatically sets the install-config Nutanix platform credentials from the nutanix-creds secret (see the pasteInProviderCredentials method).

Implementation:

  • ClusterDeployment
apiVersion: hive.openshift.io/v1
kind: ClusterDeployment
metadata:
  name: cluster-nutanix
  namespace: hive
spec:
  clusterName: cluster-nutanix
  baseDomain: example.com
  platform:
    nutanix:
      credentialsSecretRef:
        name: nutanix-creds
      prismCentral:
        address: cluster-nutanix.prism-central.nutanix.com
        port: 9440
  provisioning:
    installConfigSecretRef:
      name: cluster-nutanix-install-config
    imageSetRef:
      name: cluster-nutanix-image-set
  pullSecretRef:
    name: pull-secret

Secrets

  • nutanix-creds.yaml
apiVersion: v1
data:
  # Values under data must be base64-encoded.
  password: <password>
  username: <username>
kind: Secret
metadata:
  name: nutanix-creds
  namespace: hive

  • install-config.yaml
apiVersion: v1
baseDomain: example.com
compute:
- name: worker
controlPlane:
  name: master
metadata:
  name: cluster-nutanix
platform:
  nutanix:
    apiVIPs:
      - 10.0.0.123
    ingressVIPs:
      - 10.0.0.124
    prismCentral:
      endpoint:
        address: cluster-nutanix.prism-central.nutanix.com
        port: 9440
    prismElements: 
      - endpoint:
          address: cluster-nutanix.prism-element.nutanix.com
          port: 9440
        uuid: 0005de05-75a3-dacb-ba00-2c5da2ac4c1a
        name: "NAME"
    subnetUUIDs:
      -  0005de05-75a3-dacb-ba00-123456789012
    failureDomains:
      - name: "LD Name"
        subnetUUIDs:
          -  0005de05-75a3-dacb-ba00-123456789012
        prismElements: 
          - endpoint:
              address: cluster-nutanix.prism-element.nutanix.com
              port: 9440
            uuid: 0005de05-75a3-dacb-ba00-2c5da2ac4c1a
            name: "NAME"
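
The credential injection mentioned in the key changes can be pictured roughly as follows. This is only a minimal sketch, not Hive's actual pasteInProviderCredentials: the helper name injectNutanixCreds is hypothetical, the install-config is modelled as an already-decoded map, and placing username and password under platform.nutanix.prismCentral is an assumption about the install-config layout.

```go
package main

import (
	"errors"
	"fmt"
)

// injectNutanixCreds is a hypothetical helper (not Hive's real code).
// It copies a username and password into the assumed location
// platform.nutanix.prismCentral of a decoded install-config.
func injectNutanixCreds(installConfig map[string]any, username, password string) error {
	platform, ok := installConfig["platform"].(map[string]any)
	if !ok {
		return errors.New("install-config has no platform section")
	}
	nutanix, ok := platform["nutanix"].(map[string]any)
	if !ok {
		return errors.New("install-config has no platform.nutanix section")
	}
	prismCentral, ok := nutanix["prismCentral"].(map[string]any)
	if !ok {
		return errors.New("install-config has no platform.nutanix.prismCentral section")
	}
	// Overwrite whatever the user supplied so the secret is the single source.
	prismCentral["username"] = username
	prismCentral["password"] = password
	return nil
}

func main() {
	installConfig := map[string]any{
		"platform": map[string]any{
			"nutanix": map[string]any{
				"prismCentral": map[string]any{
					"endpoint": map[string]any{
						"address": "cluster-nutanix.prism-central.nutanix.com",
						"port":    9440,
					},
				},
			},
		},
	}
	if err := injectNutanixCreds(installConfig, "admin", "example-password"); err != nil {
		fmt.Println("error:", err)
		return
	}
	pc := installConfig["platform"].(map[string]any)["nutanix"].(map[string]any)["prismCentral"].(map[string]any)
	fmt.Println("username set to:", pc["username"])
}
```

Failing early when a section is missing keeps a malformed install-config from being silently passed on to the installer.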

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Feb 18, 2025

openshift-ci-robot commented Feb 18, 2025

@eliorerz: This pull request references HIVE-2777 which is a valid jira issue.

Details

In response to this:

/cc @eliorerz

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Feb 18, 2025

openshift-ci bot commented Feb 18, 2025

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all


openshift-ci bot commented Feb 18, 2025

@eliorerz: GitHub didn't allow me to request PR reviews from the following users: eliorerz.

Note that only openshift members and repo collaborators can review this PR, and authors cannot review their own PRs.

Details

In response to this:

/cc @eliorerz

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@eliorerz eliorerz force-pushed the HIVE-2777-Implement-Hive-Nutanix-Provisioning branch 3 times, most recently from dc89df3 to c68f669 Compare February 19, 2025 12:22
@eliorerz eliorerz marked this pull request as ready for review February 19, 2025 12:47
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Feb 19, 2025
@eliorerz
Contributor Author

/cc @2uasimojo


codecov bot commented Feb 19, 2025

Codecov Report

Attention: Patch coverage is 55.94714% with 300 lines in your changes missing coverage. Please review.

Project coverage is 50.27%. Comparing base (be4e5d0) to head (2e1979e).
Report is 23 commits behind head on master.

Files with missing lines | Patch % | Lines
contrib/pkg/createcluster/nutanix.go | 0.00% | 94 Missing ⚠️
contrib/pkg/deprovision/nutanix.go | 0.00% | 52 Missing ⚠️
pkg/installmanager/installmanager.go | 47.29% | 32 Missing and 7 partials ⚠️
pkg/install/generate.go | 0.00% | 28 Missing ⚠️
pkg/controller/machinepool/nutanixactuator.go | 82.25% | 20 Missing and 2 partials ⚠️
contrib/pkg/createcluster/create.go | 0.00% | 15 Missing ⚠️
...g/controller/machinepool/machinepool_controller.go | 0.00% | 14 Missing ⚠️
contrib/pkg/utils/nutanix/nutanix.go | 0.00% | 10 Missing ⚠️
contrib/pkg/utils/generic.go | 61.90% | 4 Missing and 4 partials ⚠️
.../clusterdeployment/clusterdeployment_controller.go | 22.22% | 7 Missing ⚠️
... and 4 more
Additional details and impacted files

@@            Coverage Diff             @@
##           master    #2573      +/-   ##
==========================================
+ Coverage   50.17%   50.27%   +0.10%     
==========================================
  Files         281      287       +6     
  Lines       33323    33949     +626     
==========================================
+ Hits        16719    17069     +350     
- Misses      15264    15531     +267     
- Partials     1340     1349       +9     

Files with missing lines | Coverage Δ
pkg/clusterresource/nutanix.go | 100.00% <100.00%> (ø)
pkg/constants/constants.go | 100.00% <ø> (ø)
...oller/clusterdeployment/installconfigvalidation.go | 100.00% <100.00%> (ø)
...g/controller/clusterpool/clusterpool_controller.go | 58.04% <ø> (ø)
.../v1/clusterdeployment_validating_admission_hook.go | 85.88% <100.00%> (+0.25%) ⬆️
...shift/hive/apis/hive/v1/clusterdeployment_types.go | 0.00% <ø> (ø)
...hift/hive/apis/hive/v1/clusterdeprovision_types.go | 0.00% <ø> (ø)
...m/openshift/hive/apis/hive/v1/machinepool_types.go | 0.00% <ø> (ø)
contrib/pkg/deprovision/deprovision.go | 0.00% <0.00%> (ø)
pkg/controller/utils/credentials.go | 0.00% <0.00%> (ø)
... and 12 more

... and 2 files with indirect coverage changes

@2uasimojo
Member

Going to get started on this.

One question about the install-config: I noticed the prismElements and subnetUUIDs are duplicated at the top level and under failureDomains. We talked yesterday about the ClusterDeployment.Spec.Platform.Nutanix schema not containing this redundancy. I assume it's still supported via install-config, but it's not necessary, right? IOW your PoC still works if you omit the top-level copy?


eliorerz commented Feb 19, 2025

Thanks. Regarding your question: the problem is that PrismElements seems to be mandatory in the install-config, while failureDomains is optional. So even if you define failureDomains, you still have to set at least PrismElements[0].UUID.
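
The constraint described in this reply could be expressed as a check along these lines. It is a sketch under stated assumptions: the struct types below are simplified, hypothetical stand-ins for the install-config Nutanix platform, and the rule itself is taken from the comment above ("seems to be mandatory"), not verified against the installer.

```go
package main

import (
	"errors"
	"fmt"
)

// Simplified, hypothetical stand-ins for the install-config types.
type PrismElement struct {
	Name string
	UUID string
}

type FailureDomain struct {
	Name          string
	PrismElements []PrismElement
	SubnetUUIDs   []string
}

type NutanixPlatform struct {
	PrismElements  []PrismElement
	SubnetUUIDs    []string
	FailureDomains []FailureDomain
}

// validatePrismElements enforces the rule from the discussion: the top-level
// prismElements[0].uuid is required whether or not failureDomains are defined.
func validatePrismElements(p NutanixPlatform) error {
	if len(p.PrismElements) == 0 || p.PrismElements[0].UUID == "" {
		return errors.New("platform.nutanix.prismElements[0].uuid must be set, even when failureDomains are defined")
	}
	return nil
}

func main() {
	onlyFailureDomains := NutanixPlatform{
		FailureDomains: []FailureDomain{{
			Name:          "fd-1",
			PrismElements: []PrismElement{{Name: "NAME", UUID: "0005de05-75a3-dacb-ba00-2c5da2ac4c1a"}},
		}},
	}
	fmt.Println("failure domains only:", validatePrismElements(onlyFailureDomains))

	withTopLevel := onlyFailureDomains
	withTopLevel.PrismElements = []PrismElement{{Name: "NAME", UUID: "0005de05-75a3-dacb-ba00-2c5da2ac4c1a"}}
	fmt.Println("with top-level element:", validatePrismElements(withTopLevel))
}
```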

@2uasimojo
Member

/test e2e e2e-pool

Weird flakery probably due to some upstream bug that seems to be fixed now.

Member

@2uasimojo 2uasimojo left a comment

This is great! With a couple of minor tweaks, I think it's ready to go. (Except that, as we discussed, we're going to need to clone the appropriate install-config fields into cd.Spec.Platform.Nutanix in anticipation of MachinePools before we can actually "release" :( )

Reminders:

  • We're going to need doc updates.
  • Let's not forget to look into ClusterPools (will probably need to use Inventory). Note that the work required for ClusterPools overlaps quite a bit with that needed for hiveutil create-cluster, although the latter is optional as we've discussed.

)

// ConfigureCreds loads secrets designated by the environment variables CLUSTERDEPLOYMENT_NAMESPACE,
// CREDS_SECRET_NAME, and CERTS_SECRET_NAME and configures Nutanix credential environment variables
Member

Not using certs currently. Does the PC connection require/support certs? Probably want to check upstream whether we need to include that support right away. (In which case we'll want the certs in the CD as well -- see e.g. vsphere.)

Contributor Author

@eliorerz eliorerz Feb 20, 2025

I added it in my previous PR but I removed it for now. I'm still waiting for an answer on that.

Member

I think this is the last pending issue on this PR. It seems like we don't need it to move forward, given that you've been able to deploy successfully as written. So, assuming you haven't yet gotten your answer, we can change this comment to a TODO: certs? and defer to a subsequent PR.

Member

Still need to either add certs options or make the comment agree with the functionality.

@eliorerz eliorerz force-pushed the HIVE-2777-Implement-Hive-Nutanix-Provisioning branch from c68f669 to c092981 Compare February 20, 2025 16:47
@eliorerz eliorerz force-pushed the HIVE-2777-Implement-Hive-Nutanix-Provisioning branch from c092981 to c17e725 Compare February 24, 2025 13:04
@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 25, 2025
@eliorerz eliorerz force-pushed the HIVE-2777-Implement-Hive-Nutanix-Provisioning branch from bd300f9 to 3908c64 Compare February 25, 2025 21:24
@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 25, 2025
@eliorerz eliorerz closed this Feb 27, 2025
@eliorerz eliorerz force-pushed the HIVE-2777-Implement-Hive-Nutanix-Provisioning branch from b9b80e8 to 66218d8 Compare February 27, 2025 21:00
@eliorerz eliorerz force-pushed the HIVE-2777-Implement-Hive-Nutanix-Provisioning branch from 5ed8108 to 4eeb0f8 Compare April 29, 2025 16:11
eliorerz added 10 commits April 29, 2025 21:30
HIVE-2779: Add Nutanix hiveutil support.

# Conflicts:
#	apis/go.mod
Inject certificate from platform.certificatesSecretRef to install-config AdditionalTrustBundle (if not exist)
Add a setUnsupportedConfigurationCondition helper to bubble user-facing errors in the Nutanix actuator.
When data disk discovery or RHCOS image retrieval fails, the MachinePool status is updated instead of returning an error.
@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 29, 2025
Member

@2uasimojo 2uasimojo left a comment

This is ready to go... after the rebase 😇

Comment on lines +87 to +90
if updateErr := a.setUnsupportedConfigurationCondition(pool, logger, "InvalidDataDisk", "data source must specify a UUID"); updateErr != nil {
	return nil, false, updateErr
}
return nil, false, nil
Member

nit, this could be pared down to

Suggested change
- if updateErr := a.setUnsupportedConfigurationCondition(pool, logger, "InvalidDataDisk", "data source must specify a UUID"); updateErr != nil {
- 	return nil, false, updateErr
- }
- return nil, false, nil
+ return nil, false, a.setUnsupportedConfigurationCondition(pool, logger, "InvalidDataDisk", "data source must specify a UUID")

Similar below. But not really worth fixing.
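
To see why the two forms in this suggestion behave the same, here is a small self-contained sketch. The function setCondition is a hypothetical stand-in for a.setUnsupportedConfigurationCondition, assumed to return an error only when the status update fails. The other return types are placeholders.

```go
package main

import (
	"errors"
	"fmt"
)

// setCondition stands in for a.setUnsupportedConfigurationCondition
// (hypothetical): it returns an error only when the status update fails.
func setCondition(failUpdate bool) error {
	if failUpdate {
		return errors.New("status update failed")
	}
	return nil
}

// verbose mirrors the original four-line form from the diff.
func verbose(failUpdate bool) ([]string, bool, error) {
	if updateErr := setCondition(failUpdate); updateErr != nil {
		return nil, false, updateErr
	}
	return nil, false, nil
}

// compact mirrors the suggested one-line form.
func compact(failUpdate bool) ([]string, bool, error) {
	return nil, false, setCondition(failUpdate)
}

func main() {
	for _, fail := range []bool{false, true} {
		_, _, errVerbose := verbose(fail)
		_, _, errCompact := compact(fail)
		fmt.Printf("failUpdate=%v verbose=%v compact=%v\n", fail, errVerbose, errCompact)
	}
}
```

Both forms return a nil error when the update succeeds and the update error otherwise, so the shorter form is purely a readability choice.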

@eliorerz eliorerz force-pushed the HIVE-2777-Implement-Hive-Nutanix-Provisioning branch from 4eeb0f8 to bb2de74 Compare April 30, 2025 05:28
@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 30, 2025
@2uasimojo
Member

/override ci/prow/e2e-vsphere

Known not working yet (see #2541).

/lgtm

Thanks for the patient hard work on this @eliorerz!

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Apr 30, 2025

openshift-ci bot commented Apr 30, 2025

@2uasimojo: Overrode contexts on behalf of 2uasimojo: ci/prow/e2e-vsphere



openshift-ci bot commented Apr 30, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: 2uasimojo, eliorerz

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 30, 2025
@openshift-ci-robot

/retest-required

Remaining retests: 0 against base HEAD aaacfde and 2 for PR HEAD 2e1979e in total

@openshift-ci-robot

/retest-required

Remaining retests: 0 against base HEAD 24d8ccb and 2 for PR HEAD 2e1979e in total


openshift-ci bot commented May 1, 2025

@eliorerz: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name | Commit | Details | Required | Rerun command
ci/prow/e2e-azure-419 | 4eeb0f8 | link | true | /test e2e-azure-419


@2uasimojo
Member

@2uasimojo: Overrode contexts on behalf of 2uasimojo: ci/prow/e2e-vsphere

Yeah? Well... do it again.

/override ci/prow/e2e-vsphere


openshift-ci bot commented May 1, 2025

@2uasimojo: Overrode contexts on behalf of 2uasimojo: ci/prow/e2e-vsphere


@openshift-merge-bot openshift-merge-bot bot merged commit d03ed3f into openshift:master May 1, 2025
13 checks passed
@eliorerz eliorerz deleted the HIVE-2777-Implement-Hive-Nutanix-Provisioning branch May 14, 2025 09:45
Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lgtm Indicates that a PR is ready to be merged.

5 participants