Adding technical review doc #223
Signed-off-by: Jian Qiu <[email protected]>
# General Technical Review - Open Cluster Management / Sandbox
- **Project:** Open Cluster Management
- **Project Version:** v1.0.0
- **Website:** https://open-cluster-management.io/
- **Date Updated:** 2025-08-07
- **Template Version:** v1.0
- **Description:** A lightweight and extensible multi-cluster Kubernetes management tool
## Day 0 - Planning Phase
* Describe the target persona or user(s) for the project?
The target users of the project are those who have multiple Kubernetes clusters and want to manage them easily, and also those who provide Kubernetes cluster management platforms.
* Explain the primary use case for the project. What additional use cases are supported by the project?
1. Administrators are able to manage and monitor multiple Kubernetes clusters in a centralized control plane.
2. Users are able to deploy their workloads across multiple clusters.
3. Users are able to define cluster selection criteria to deploy different workloads.
4. Users are able to easily extend the control plane by adding more management functionality across multiple clusters.
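Use case 3 above (cluster selection criteria) is typically expressed with OCM's Placement API. A minimal sketch; the `environment: production` label and the resource names are illustrative, not OCM requirements:

```shell
# Sketch: select up to 2 clusters labeled environment=production.
# The label and names are examples, not OCM requirements.
cat > placement-example.yaml <<'EOF'
apiVersion: cluster.open-cluster-management.io/v1beta1
kind: Placement
metadata:
  name: prod-placement
  namespace: default
spec:
  numberOfClusters: 2
  predicates:
    - requiredClusterSelector:
        labelSelector:
          matchLabels:
            environment: production
EOF
# On a hub cluster this would be applied with: kubectl apply -f placement-example.yaml
grep -q 'kind: Placement' placement-example.yaml && echo "placement manifest written"
```

The resulting PlacementDecision lists the selected clusters, which workload-delivery mechanisms such as ManifestWork-based addons can then act on.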
* Explain which use cases have been identified as unsupported by the project.
Provisioning and lifecycle management of Kubernetes clusters are not in the scope of this project.
* Describe the intended types of organizations who would benefit from adopting this project (e.g. financial services, any software manufacturer, organizations providing platform engineering services).
- Entities with many Kubernetes clusters, e.g. financial institutions and internet companies.
- Vendors that provide platform engineering services.
* Please describe any completed end user research and link to any reports.
- AppsCode held a webinar, "Managing Many Clusters using Open Cluster Management", on 15 June 2023: https://appscode.com/blog/post/monthly-review-june-2023/#managing-many-clusters-using-open-cluster-management
- Alibaba published a document on their user experience with Kubefed and why they moved to OCM on Sep 22, 2021: https://cloudnativenow.com/features/the-next-kubernetes-frontier-multicluster-management/
### Usability
ArgoCD by deploying an agent addon to managed clusters, enabling automated GitOps-based application synchronization and management.
- [Kueue](https://github.com/open-cluster-management-io/addon-contrib/tree/main/kueue-addon): OCM integrates Kueue by installing a scheduler addon on managed clusters, providing unified batch workload scheduling and resource management across clusters.
- [Fluid](https://github.com/open-cluster-management-io/addon-contrib/tree/main/fluid-addon): OCM integrates Fluid by deploying its runtime via an addon, enabling distributed data caching and acceleration capabilities in managed clusters.
- [Open-Telemetry](https://github.com/open-cluster-management-io/addon-contrib/tree/main/open-telemetry-addon): OCM integrates Open-Telemetry by deploying its operator through an addon, allowing centralized observability and telemetry data collection.
- Hub-spoke architecture: the hub component is lightweight, letting users set instructions for the spokes via the CRD mechanism, while the spoke agent acts on the hub's instructions.
- Extensible: another design principle of the project is to keep the core components as simple and lightweight as possible, but with extension points to easily add customized functionality.
* Outline or link to the project's architecture requirements. Describe how they differ for Proof of Concept, Development, Test and Production environments, as applicable.
The OCM architecture consists of a hub-spoke model documented at https://open-cluster-management.io/docs/concepts/architecture/.
For different environments:
- **Proof of Concept**: Single hub cluster with 1-3 spoke clusters, minimal resource allocation (2 CPU, 4GB RAM per hub component)
- **Development**: Similar to PoC but with additional development addons and potentially multiple hub clusters for testing
- **Test**: Multi-hub setup with various addon configurations to test different scenarios and upgrade paths
- **Production**: High availability hub clusters with proper resource allocation, backup/restore procedures, and monitoring across potentially hundreds of managed clusters
* Define any specific service dependencies the project relies on in the cluster.
* Describe any compliance requirements addressed by the project.
OCM has a policy add-on component that can integrate with Open Policy Agent or Kyverno to manage compliance policies across multiple clusters. Details are documented at https://open-cluster-management.io/docs/getting-started/integration/policy-controllers/
* Describe the project's High Availability requirements.
The OCM control plane is based on the Kubernetes control plane. The controllers of the OCM hub and the agents on the spoke clusters can run with multiple replicas using leader election.
In addition, there is a requirement that OCM hub clusters can be recovered upon disaster, which needs API
OCM has controllers on the hub cluster and agents on spoke clusters; each has its own CPU/memory requirements. The CPU/memory of the hub and agent can be set in the ClusterManager/Klusterlet APIs or using clusteradm. OCM requires the spoke cluster to be able to reach the API server of the hub cluster directly or via an HTTP proxy.
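The proxy path can be satisfied with the standard kubeconfig `proxy-url` cluster field in the agent's bootstrap kubeconfig. A minimal sketch; the hub and proxy addresses below are placeholders:

```shell
# Sketch: bootstrap kubeconfig where the spoke reaches the hub API server
# through an HTTP proxy (all addresses are placeholders).
cat > bootstrap-hub-kubeconfig <<'EOF'
apiVersion: v1
kind: Config
clusters:
  - name: hub
    cluster:
      server: https://hub.example.com:6443
      proxy-url: http://proxy.example.com:3128
contexts:
  - name: bootstrap
    context:
      cluster: hub
      user: bootstrap-user
current-context: bootstrap
users:
  - name: bootstrap-user
    user: {}
EOF
grep -q 'proxy-url' bootstrap-hub-kubeconfig && echo "proxy configured"
```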
* Describe the project's storage requirements, including its use of ephemeral and/or persistent storage.
- Addon related: ManagedClusterAddOn, ClusterManagementAddOn, AddOnTemplate.
- And the operator APIs (ClusterManager and Klusterlet) to manage components in OCM.
The API design follows the API conventions defined at https://github.com/open-cluster-management-io/api/blob/main/docs/api-conventions.md
* Describe the project defaults.
OCM defaults include secure hub-spoke communication via mTLS, least-privilege RBAC, and automatic certificate rotation with 24-hour expiry. Default resource limits are set conservatively for hub components (500m CPU, 2Gi memory) and can be customized via the ClusterManager/Klusterlet APIs.
* Outline any additional configurations from default to make reasonable use of the project.
For production use, administrators should:
- Configure high availability with multiple replicas for hub components
- Set up proper resource requests/limits based on cluster scale
- Enable addon frameworks for policy management, observability, and application lifecycle management
- Configure backup/restore procedures for disaster recovery scenarios
* Describe any new or changed API types and calls - including to cloud providers - that will result from this project being enabled and used.
OCM introduces several CRDs under the open-cluster-management.io API groups:
- ManagedCluster, ManagedClusterSet for cluster lifecycle
- ManifestWork for resource deployment to managed clusters
- Placement, PlacementDecision for cluster selection and scheduling
- ManagedClusterAddOn, ClusterManagementAddOn for addon lifecycle

OCM does not make direct calls to cloud providers; it operates through standard Kubernetes APIs.
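For illustration, a minimal ManifestWork that delivers one ConfigMap to a managed cluster; the cluster namespace `cluster1` and resource names are hypothetical (note that ManifestWork itself lives in the work.open-cluster-management.io group):

```shell
# Sketch: ManifestWork delivering a ConfigMap to the managed cluster
# registered as "cluster1" (all names are examples).
cat > manifestwork-example.yaml <<'EOF'
apiVersion: work.open-cluster-management.io/v1
kind: ManifestWork
metadata:
  name: example-work
  namespace: cluster1        # the cluster namespace on the hub
spec:
  workload:
    manifests:
      - apiVersion: v1
        kind: ConfigMap
        metadata:
          name: example-config
          namespace: default
        data:
          greeting: hello
EOF
grep -q 'kind: ManifestWork' manifestwork-example.yaml && echo "manifestwork written"
```

Applied on the hub with `kubectl apply -f manifestwork-example.yaml`, the work agent on the spoke reconciles the embedded manifests and reports status back.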
* Describe compatibility of any new or changed APIs with API servers, including the Kubernetes API server.
All OCM APIs are implemented as standard Kubernetes CRDs, ensuring full compatibility with any conformant Kubernetes API server. We maintain compatibility with Kubernetes versions 1.24+ and test against multiple Kubernetes distributions.
* Describe versioning of any new or changed APIs, including how breaking changes are handled.
A new or changed API would introduce an API version upgrade, which would need an API migration taking more than one release. The document https://github.com/open-cluster-management-io/api/blob/main/docs/development.md#api-upgrade-flow describes the general flow we follow for API upgrades.
* Describe the project's release processes, including major, minor and patch releases.
The release process is defined at https://github.com/open-cluster-management-io/community/blob/main/RELEASE.md
### Installation

> **Reviewer:** We could link the installation guide or Getting Started page here as well: https://open-cluster-management.io/docs/getting-started/quick-start/
>
> **Author:** updated
```
helm install
```
* Install cluster manager
```
helm install cluster-manager --version <version> ocm/cluster-manager --namespace=open-cluster-management --create-namespace
```
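For context, the chart-based install is usually preceded by adding the OCM chart repository. A sketch under the assumption that the repo URL matches what the OCM site documents; verify it against the current installation docs, and note the commands require a running hub cluster:

```shell
# Sketch of the chart-based install flow (repo URL assumed from the OCM
# docs; requires a reachable hub cluster).
helm repo add ocm https://open-cluster-management.io/helm-charts
helm repo update
helm search repo ocm   # list the available OCM charts
helm install cluster-manager ocm/cluster-manager \
  --namespace=open-cluster-management --create-namespace
```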
Self-assessment: https://github.com/open-cluster-management-io/ocm/blob/main/SEL

* How do you recommend users alter security defaults in order to "loosen" the security of the project? Please link to any documentation the project has written concerning these use cases.
We do not recommend or document methods to "loosen" the security of the project, as our defaults are designed to be secure. However, OCM is highly configurable; in some cases, users need to grant the Klusterlet agent permissions to manage their resources. We recommend users only grant permissions with least privilege, referencing the permission-setting documentation for the work agent. For purposes like testing, users can grant broader privileges to the Klusterlet agent, which intentionally creates a less secure configuration.
These actions require deliberate and explicit configuration by a cluster administrator. Our documentation focuses on how to configure
* Describe how the project has evaluated which features would be a security risk to users if they are not maintained by the project.
The ManifestWork API feature was designed to dispatch/manage Kubernetes resources on the managed cluster. Since it can dispatch/manage any resources, the work-agent might need wide permissions on the managed clusters. To mitigate the risk, we ensured that:
- The agent on the spoke cluster that applies the manifests has admin permission, instead of cluster-admin, so that it can apply most Kubernetes resources.
- For some specific resources, like some CustomResourceDefinitions, users need to explicitly grant permission to the work agent, referencing the permission-setting documentation for the work agent.
- Users can delegate manifest application to a specific identity on the spoke cluster, further sandboxing the operation; see dynamic identity authorization.
* Cloud Native Threat Modeling
All charts are uploaded to https://artifacthub.io/packages/search?org=open-cluster-management&sort=relevance&page=1, which provides the community with a trusted, versioned, and verifiable source to deploy the OCM components, ensuring they are using official project artifacts.
## Day 1 - Installation and Deployment Phase
### Project Installation and Configuration
* How can this project be enabled or disabled in a live cluster? Please describe any downtime required of the control plane or nodes.
Open Cluster Management is a foundational component, and it cannot be enabled/disabled in a live cluster. Users can only cut off the connection between hub and spoke by following this doc: https://open-cluster-management.io/docs/concepts/cluster-inventory/managedcluster/#cluster-removal. No downtime is required.
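The cluster-removal flow in the linked doc boils down to deleting the spoke's ManagedCluster resource on the hub. A sketch, run against the hub cluster, with `cluster1` as a placeholder name:

```shell
# Sketch: detach a spoke by removing its ManagedCluster from the hub
# (run with a hub kubeconfig; "cluster1" is a placeholder).
kubectl get managedclusters
kubectl delete managedcluster cluster1
```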
* Describe how enabling the project changes any default behavior of the cluster or running workloads.
* How does the project clean up any resources created, including CRDs?
Open Cluster Management cleans up resources when the managed cluster is deleted. The process is described at: https://open-cluster-management.io/docs/getting-started/installation/register-a-cluster/#detach-the-cluster-from-hub
### Rollout, Upgrade and Rollback Planning
* Describe how the project handles rollback procedures.
OCM handles rollbacks the same way as upgrades, but a lower version needs to be specified: https://open-cluster-management.io/docs/getting-started/administration/upgrading/
* How can a rollout or rollback fail? Describe any impact to already running workloads.
If a rollout or rollback fails, the spoke cluster may lose connection with the hub and can no longer be managed. The already running workloads will keep running, but will not be managed by the hub, and their status will not be reported back to the hub.
* Describe any specific metrics that should inform a rollback.
The clusteradm CLI provides commands to check the cluster info after upgrade and rollback: https://open-cluster-management.io/docs/getting-started/administration/upgrading/. The ManagedCluster status will also reflect whether a cluster is available after the upgrade.
* Explain how upgrades and rollbacks were tested and how the upgrade->downgrade->upgrade path was tested.
Users should run `clusteradm upgrade` in a test environment before upgrading the production environment.
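A sketch of that test-first flow; the subcommand and flag are assumed from the clusteradm docs, and the version string is an example that should be taken from the release notes:

```shell
# Sketch: upgrade hub components in the test environment first, then verify
# cluster health before repeating in production (version is an example).
clusteradm upgrade clustermanager --bundle-version=0.16.0
clusteradm get clusters   # confirm managed clusters are still available
```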
* Explain how the project informs users of deprecations and removals of features and APIs.
We will log issues in the community for features/API deprecations and removal plans, add them to the roadmap, and inform users in community meetings and Slack channels as well.
* Explain how the project permits utilization of alpha and beta capabilities as part of a rollout.
The project follows the API upgrade flow https://github.com/open-cluster-management-io/api/blob/main/docs/development.md#api-upgrade-flow to roll out from alpha to beta. Feature gates from alpha to beta follow a standard lifecycle: https://open-cluster-management.io/docs/getting-started/administration/featuregates/
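As a sketch of flipping such a gate via the ClusterManager resource: the field layout follows the feature-gates doc, and the gate name `ManagedClusterAutoApproval` is only an example:

```shell
# Sketch: enable a feature gate in the ClusterManager spec
# (the gate name is an example; see the feature-gates doc for the list).
cat > clustermanager-featuregate.yaml <<'EOF'
apiVersion: operator.open-cluster-management.io/v1
kind: ClusterManager
metadata:
  name: cluster-manager
spec:
  registrationConfiguration:
    featureGates:
      - feature: ManagedClusterAutoApproval
        mode: Enable
EOF
grep -q 'mode: Enable' clustermanager-featuregate.yaml && echo "gate enabled in manifest"
```

On a live hub this patch would be applied with `kubectl apply -f clustermanager-featuregate.yaml`.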
|
> **Reviewer:** I noticed that you skipped the Day 2 questions from the GTR. Is that because you are applying for Incubation? Or are those questions totally not relevant to your level? If you know the answers for the Day 2 questions, I would suggest including them as well (at least those that are relevant), even when you are only reaching the incubation level just now. It can be helpful later.
>
> **Author:** OK, I am trying to add the Day 2 part. Some of it is marked as TBD since it is not clear to me how to answer or it has not been done yet.
>
> **Reviewer:** I think it's mentioned in the template that if it's not applicable, you can put N/A. If it might apply to OCM but you are not sure, we can raise the question in the #maintainers-circle CNCF Slack channel. I think it might help other projects in the future as well.
>
> **Author:** updated. PTAL again
> **Reviewer:** Is OCM tied to any specific market segment? If not, it would be good to mention it here. Maybe we can link the adopters.md file here as an example, as you also mention their use cases there.
>
> **Author:** updated.