Add more contents
Signed-off-by: Jian Qiu <[email protected]>
qiujian16 committed Aug 11, 2025
commit da21c66a0b4f3ac1810bdfb28457e5f7af25a758
123 changes: 77 additions & 46 deletions cncf/GTR.md
@@ -1,11 +1,11 @@
# General Technical Review - [Project Name] / [Level]
# General Technical Review - Open Cluster Management / Sandbox

- **Project:** Open Cluster Management
- **Project Version:** v1.0.0
- **Website:** https://open-cluster-management.io/
- **Date Updated:** 2025-08-07
- **Template Version:** v1.0
- **Description:** A lightweight and extensible multiple kubernetes cluster management tool
- **Description:** A lightweight and extensible multi-cluster Kubernetes management tool

## Day 0 - Planning Phase

@@ -25,29 +25,29 @@
* Describe the target persona or user(s) for the project?

The target users of the project are those who have multiple Kubernetes clusters and want to manage them easily,
and also those who provide kubernetes cluster management platform
and also those who provide Kubernetes cluster management platforms

* Explain the primary use case for the project. What additional use cases are supported by the project?

1. Admin is able to manage and monitor multiple kubernetes cluster in a centralized control plane
1. Administrators are able to manage and monitor multiple Kubernetes clusters in a centralized control plane
2. Users are able to deploy their workload across multiple clusters.
3. Users are able to define a cluster selection criteria to deploy different workloads.
4. Users are able to easily extend the control plane by adding more management functionality across multiple cluster.
4. Users are able to easily extend the control plane by adding more management functionality across multiple clusters.
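Use case 3 above (cluster selection criteria) can be sketched with an OCM Placement resource. This is an illustrative fragment only: the resource name and the `environment: production` label are hypothetical, not taken from the project docs.

```yaml
# Hedged sketch: select up to 2 clusters labeled environment=production
# from the ManagedClusterSets bound to this namespace.
apiVersion: cluster.open-cluster-management.io/v1beta1
kind: Placement
metadata:
  name: prod-placement        # hypothetical name
  namespace: default
spec:
  numberOfClusters: 2
  predicates:
    - requiredClusterSelector:
        labelSelector:
          matchLabels:
            environment: production   # hypothetical label
```

OCM then emits a PlacementDecision listing the selected clusters, which workload-distribution tooling can consume to decide where to deploy.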

* Explain which use cases have been identified as unsupported by the project.

Provisioning/lifecycling the kubernetes cluster is not the scope of this project.
Provisioning/lifecycle management of Kubernetes clusters is not the scope of this project.

* Describe the intended types of organizations who would benefit from adopting this project. (i.e. financial services, any software manufacturer, organizations providing platform engineering services)?

- Entities, e.g. financial institutions, internet companies, with many kubernetes clusters.
- Entities, e.g. financial institutions, internet companies, with many Kubernetes clusters.

Review comment:

Is OCM tied to any specific market segment? If not, it would be good to mention it here. Maybe we can link here adopters.md file as an example, as you also mention their use cases there.

Member Author replied: updated.

- Vendors that provide platform engineering services.

* Please describe any completed end user research and link to any reports.

- AppsCode held a webinar on "Managing Many Clusters using Open Cluster Management" on 15th June 2023.
https://appscode.com/blog/post/monthly-review-june-2023/#managing-many-clusters-using-open-cluster-management
- Alibaba publishes a document on their user experience on kubefed and why move to ocm on Sep 22, 2021.
- Alibaba published a document on their user experience with kubefed and why they moved to OCM on Sep 22, 2021.
https://cloudnativenow.com/features/the-next-kubernetes-frontier-multicluster-management/

### Usability
@@ -70,7 +70,7 @@
ArgoCD by deploying an agent addon to managed clusters, enabling automated GitOps-based application synchronization and management.
- [Kueue](https://github.com/open-cluster-management-io/addon-contrib/tree/main/kueue-addon): OCM integrates Kueue by installing
a scheduler addon on managed clusters, providing unified batch workload scheduling and resource management across clusters.
-[Fluid](https://github.com/open-cluster-management-io/addon-contrib/tree/main/fluid-addon): OCM integrates Fluid by deploying
- [Fluid](https://github.com/open-cluster-management-io/addon-contrib/tree/main/fluid-addon): OCM integrates Fluid by deploying
its runtime via an addon, enabling distributed data caching and acceleration capabilities in managed clusters.
- [Open-Telemetry](https://github.com/open-cluster-management-io/addon-contrib/tree/main/open-telemetry-addon): OCM
integrates Open-Telemetry by deploying its operator through an addon, allowing centralized observability and telemetry data
@@ -83,9 +83,17 @@
- hub-spoke architecture: the hub component is lightweight to let users set instructions to the spoke via CRD mechanism,
and the spoke agent acts based on the hub’s instructions.
- Extensible: another design principle of the projects is to keep the core components as simple and lightweight as possible,
but with extension point to easily adding customized functionality.
but with extension points to easily add customized functionality.

* Outline or link to the project’s architecture requirements? Describe how they differ for Proof of Concept, Development, Test and Production environments, as applicable.
* Outline or link to the project's architecture requirements? Describe how they differ for Proof of Concept, Development, Test and Production environments, as applicable.

The OCM architecture consists of a hub-spoke model documented at https://open-cluster-management.io/docs/concepts/architecture/.

For different environments:
- **Proof of Concept**: Single hub cluster with 1-3 spoke clusters, minimal resource allocation (2 CPU, 4GB RAM per hub component)
- **Development**: Similar to PoC but with additional development addons and potentially multiple hub clusters for testing
- **Test**: Multi-hub setup with various addon configurations to test different scenarios and upgrade paths
- **Production**: High availability hub clusters with proper resource allocation, backup/restore procedures, and monitoring across potentially hundreds of managed clusters

* Define any specific service dependencies the project relies on in the cluster.

@@ -102,12 +110,12 @@

* Describe any compliance requirements addressed by the project.

OCM has a policy add-on component that can integrate with Open policy agent or kyverno to manage the compliance
policy across multiple clusters. Details is documented here https://open-cluster-management.io/docs/getting-started/integration/policy-controllers/
OCM has a policy add-on component that can integrate with Open Policy Agent or Kyverno to manage compliance
policies across multiple clusters. Details are documented here https://open-cluster-management.io/docs/getting-started/integration/policy-controllers/

* Describe the project’s High Availability requirements.

OCM controlplane is based on the kubernetes controlplane. The controller of OCM hub and agent on the spoke cluster
OCM control plane is based on the Kubernetes control plane. The controller of OCM hub and agent on the spoke cluster
can run in multiple replicas with leader election.

In addition, there is also a requirement that OCM hub clusters can be recovered upon disaster, which needs API
@@ -118,7 +126,7 @@

OCM has controllers on the hub cluster, and agents on spoke clusters. Each has its own CPU/memory requirements.
The CPU/memory of the hub and agent can be set in ClusterManager/Klusterlet API or using clusteradm.
OCM requires the spoke cluster is able to reach to the apiserver of the hub cluster directly or via http proxy.
OCM requires the spoke cluster to be able to reach the API server of the hub cluster directly or via HTTP proxy.
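As a sketch of setting agent-side resources via the Klusterlet operator API: the `resourceRequirement` stanza below is recalled from memory and should be verified against the current `operator.open-cluster-management.io` CRD schema before use; the cluster name is hypothetical.

```yaml
apiVersion: operator.open-cluster-management.io/v1
kind: Klusterlet
metadata:
  name: klusterlet
spec:
  clusterName: cluster1                  # hypothetical cluster name
  namespace: open-cluster-management-agent
  resourceRequirement:                   # field name assumed; verify against the CRD
    type: ResourceRequirement
    resourceRequirements:
      requests:
        cpu: 100m
        memory: 128Mi
```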

* Describe the project’s storage requirements, including its use of ephemeral and/or persistent storage.

@@ -134,22 +142,45 @@
- Addon related: managedclusteraddon, clustermanagementaddon, addontemplate.
- And operator API (clustermanager and klusterlet) to manage components in OCM.

The API design follow the API convention defined https://github.com/open-cluster-management-io/api/blob/main/docs/api-conventions.md
The API design follows the API conventions defined at https://github.com/open-cluster-management-io/api/blob/main/docs/api-conventions.md

* Describe the project defaults

OCM defaults include secure hub-spoke communication via mTLS, least-privilege RBAC, and automatic certificate rotation with 24-hour expiry.
Default resource limits are set conservatively for hub components (500m CPU, 2Gi memory) and can be customized via ClusterManager/Klusterlet APIs.

* Outline any additional configurations from default to make reasonable use of the project

For production use, administrators should:
- Configure high availability with multiple replicas for hub components
- Set up proper resource requests/limits based on cluster scale
- Enable addon frameworks for policy management, observability, and application lifecycle management
- Configure backup/restore procedures for disaster recovery scenarios

* Describe the project defaults
* Outline any additional configurations from default to make reasonable use of the project
* Describe any new or changed API types and calls \- including to cloud providers \- that will result from this project
being enabled and used
* Describe compatibility of any new or changed APIs with API servers, including the Kubernetes API server
being enabled and used

OCM introduces several CRDs in the cluster.open-cluster-management.io API group:
- ManagedCluster, ManagedClusterSet for cluster lifecycle
- ManifestWork for resource deployment to managed clusters
- Placement, PlacementDecision for cluster selection and scheduling
- ManagedClusterAddon, ClusterManagementAddon for addon lifecycle
OCM does not make direct calls to cloud providers - it operates through standard Kubernetes APIs.
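For illustration, a minimal ManifestWork is sketched below; the ConfigMap payload and all names are hypothetical. The hub namespace identifies the target managed cluster, and the work agent on that cluster applies the embedded manifests.

```yaml
apiVersion: work.open-cluster-management.io/v1
kind: ManifestWork
metadata:
  name: example-work
  namespace: cluster1        # hub namespace of the target managed cluster
spec:
  workload:
    manifests:
      - apiVersion: v1
        kind: ConfigMap
        metadata:
          name: example-config
          namespace: default
        data:
          greeting: hello
```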

* Describe compatibility of any new or changed APIs with API servers, including the Kubernetes API server

All OCM APIs are implemented as standard Kubernetes CRDs, ensuring full compatibility with any conformant Kubernetes API server.
We maintain compatibility with Kubernetes versions from 1.24+ and test against multiple Kubernetes distributions.

* Describe versioning of any new or changed APIs, including how breaking changes are handled
A new or changed API would introduce an api version upgrade which would need api migration taking more than 1 release.
The document https://github.com/open-cluster-management-io/api/blob/main/docs/development.md#api-upgrade-flow describe
the general flow we follow for API upgrade.

A new or changed API would introduce an API version upgrade which would need API migration taking more than 1 release.
The document https://github.com/open-cluster-management-io/api/blob/main/docs/development.md#api-upgrade-flow describes
the general flow we follow for API upgrades.

* Describe the project’s release processes, including major, minor and patch releases.

Release process is defined here https://github.com/open-cluster-management-io/community/blob/main/RELEASE.md
The release process is defined here https://github.com/open-cluster-management-io/community/blob/main/RELEASE.md

### Installation

Review comment:

We could link the installation guide or Getting Started page here as well or not?: https://open-cluster-management.io/docs/getting-started/quick-start/

Member Author replied: updated


@@ -178,7 +209,7 @@
helm install
```

* Insatll cluster manager
* Install cluster manager

```
helm install cluster-manager --version <version> ocm/cluster-manager --namespace=open-cluster-management --create-namespace
@@ -267,8 +298,8 @@ Self-assessment: https://github.com/open-cluster-management-io/ocm/blob/main/SEL
* How do you recommend users alter security defaults in order to "loosen" the security of the project? Please link to any documentation the project has written concerning these use cases.

We do not recommend or document methods to "loosen" the security of the project, as our defaults are designed to be secure. However,
OCM is highly configurable; in some cases, a user needs to grant Klusterlet agent permissions to manage their resources. We recommend users
only grant permissions with the least privilege, referencing the doc permission setting for work agent, but for purposes like testing, users
OCM is highly configurable; in some cases, users need to grant Klusterlet agent permissions to manage their resources. We recommend users
only grant permissions with the least privilege, referencing the documentation permission setting for work agent. For purposes like testing, users
can grant sufficient privileges to the Klusterlet agent that could intentionally create a less secure configuration.

These actions require deliberate and explicit configuration by a cluster administrator. Our documentation focuses on how to configure
@@ -288,10 +319,10 @@ Self-assessment: https://github.com/open-cluster-management-io/ocm/blob/main/SEL

* Describe how the project has evaluated which features will be a security risk to users if they are not maintained by the project?

The ManifestWork API feature. It was designed to dispatch/manage Kubernetes resources on the managed cluster, since it can dispatch/manage any resources,
the work-agnet might need wide permission for the managed clusters. To mitigate the risk, we ensured that:
- The agent on the spoke cluster to apply the manifests has the admin permission, instead of the cluster-admin, so that it can apply most Kubernetes resources.
- For some specific resources, like some CustomResourceDefinition, users need to explicitly grant the permission to the work agent referencing the doc permission setting for work agent
The ManifestWork API feature was designed to dispatch/manage Kubernetes resources on the managed cluster. Since it can dispatch/manage any resources,
the work-agent might need wide permissions for the managed clusters. To mitigate the risk, we ensured that:
- The agent on the spoke cluster to apply the manifests has admin permission, instead of cluster-admin, so that it can apply most Kubernetes resources.
- For some specific resources, like some CustomResourceDefinitions, users need to explicitly grant permission to the work agent referencing the documentation permission setting for work agent
- Users can delegate manifest application to a specific identity on the spoke cluster, further sandboxing the operation, see dynamic identity authorization.

* Cloud Native Threat Modeling
@@ -340,7 +371,7 @@ Self-assessment: https://github.com/open-cluster-management-io/ocm/blob/main/SEL
All charts are uploaded to https://artifacthub.io/packages/search?org=open-cluster-management&sort=relevance&page=1, which provides the community with a trusted, versioned, and verifiable source to deploy the OCM components,
ensuring they are using official project artifacts.

## Day 1 \- Installation and Deployment Phase
## Day 1 - Installation and Deployment Phase

### Project Installation and Configuration

@@ -353,8 +384,8 @@ Self-assessment: https://github.com/open-cluster-management-io/ocm/blob/main/SEL

* How can this project be enabled or disabled in a live cluster? Please describe any downtime required of the control plane or nodes.

Open Cluster Management is a foundational component, and it can not be enabled/disabled in a live cluster.
User can only cut off the connection of hub and spoke, follow this doc
Open Cluster Management is a foundational component, and it cannot be enabled/disabled in a live cluster.
Users can only cut off the connection between hub and spoke by following this doc
https://open-cluster-management.io/docs/concepts/cluster-inventory/managedcluster/#cluster-removal. No downtime required.

* Describe how enabling the project changes any default behavior of the cluster or running workloads.
@@ -367,8 +398,8 @@ Self-assessment: https://github.com/open-cluster-management-io/ocm/blob/main/SEL

* How does the project clean up any resources created, including CRDs?

Open Cluster Management cleans up the resources when the managed cluster is deleted,
the process is described in: https://open-cluster-management.io/docs/getting-started/installation/register-a-cluster/#detach-the-cluster-from-hub
Open Cluster Management cleans up resources when the managed cluster is deleted.
The process is described at: https://open-cluster-management.io/docs/getting-started/installation/register-a-cluster/#detach-the-cluster-from-hub

### Rollout, Upgrade and Rollback Planning

@@ -378,28 +409,28 @@ Self-assessment: https://github.com/open-cluster-management-io/ocm/blob/main/SEL

* Describe how the project handles rollback procedures.

OCM handles the rollback the same as upgrades, but need to specify a lower version https://open-cluster-management.io/docs/getting-started/administration/upgrading/ .
OCM handles rollbacks the same as upgrades, but needs to specify a lower version: https://open-cluster-management.io/docs/getting-started/administration/upgrading/

* How can a rollout or rollback fail? Describe any impact to already running workloads.

If a rollout or rollback fails, the spoke cluster may lose connection with the hub, and can not be managed anymore. The already running
workloads will keep running, but will not be managed by the hub and status will not be reported back to the hub.
If a rollout or rollback fails, the spoke cluster may lose connection with the hub and cannot be managed anymore. The already running
workloads will keep running, but will not be managed by the hub and their status will not be reported back to the hub.

* Describe any specific metrics that should inform a rollback.

Clusteradm provides command to check the cluster info after upgrade and rollback. https://open-cluster-management.io/docs/getting-started/administration/upgrading/
The managedcluster status will also reflect whether a cluster is available or not after the upgrade.
Clusteradm provides commands to check the cluster info after upgrade and rollback: https://open-cluster-management.io/docs/getting-started/administration/upgrading/
The ManagedCluster status will also reflect whether a cluster is available or not after the upgrade.

* Explain how upgrades and rollbacks were tested and how the upgrade-\>downgrade-\>upgrade path was tested.

The user should run `clusteradm upgrade` on the test environment before upgrading in the product environment.
Users should run `clusteradm upgrade` on the test environment before upgrading in the production environment.

* Explain how the project informs users of deprecations and removals of features and APIs.

We will log issues in the community for the features/API deprecations and removals plan, add it to the roadmap, and inform users in the community meeting and Slack channel as well.
We will log issues in the community for features/API deprecations and removal plans, add them to the roadmap, and inform users in community meetings and Slack channels as well.

* Explain how the project permits utilization of alpha and beta capabilities as part of a rollout.

The project should follow the API upgrade flow https://github.com/open-cluster-management-io/api/blob/main/docs/development.md#api-upgrade-flow to rollout from alpha to beta.
Feature gates alpha to beta follow a standard lifecycle
The project follows the API upgrade flow https://github.com/open-cluster-management-io/api/blob/main/docs/development.md#api-upgrade-flow to rollout from alpha to beta.
Feature gates from alpha to beta follow a standard lifecycle:
https://open-cluster-management.io/docs/getting-started/administration/featuregates/
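As a hedged sketch of how a hub-side gate is flipped: the `registrationConfiguration.featureGates` stanza is recalled from the ClusterManager API and the gate name is only an example; verify both against the feature-gates documentation linked above for your version.

```yaml
apiVersion: operator.open-cluster-management.io/v1
kind: ClusterManager
metadata:
  name: cluster-manager
spec:
  registrationConfiguration:
    featureGates:
      - feature: ManagedClusterAutoApproval   # example gate; confirm it exists in your release
        mode: Enable
```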

Review comment:

I noticed that you skipped Day 2 questions from GTR. Is that because you are applying for Incubation? Or are those questions totally not relevant to your level? If you know the answers for Day 2 questions, I would suggest to include it as well (at least those that are relevant), even when you are only reaching the incubation level just now. It can be helpful later.

Member Author replied: OK, I am trying to add Day 2 part. Some is marked as TBD since it is not clear to me how to answer or it has not been done yet.

Reviewer replied: I think it's mentioned in the template that if it's not applicable, you can put N/A. If it might apply to OCM, but you don't know/not sure, we can raise the question in the #maintainers-circle CNCF slack channel. I think it might help other projects in the future as well.

Member Author replied: updated. PTAL again