metallb: Use CNO as the integration point #721
Will this modify the CNO CRD, or is the expectation that there will be a separate MetalLB CRD that could be independently configured by a user?
I expect the MetalLB CRDs to be used directly for configuring MetalLB. I suppose there's an open question about which of these approaches we take:
I'm leaning toward that last one right now, but am open to other ideas.
Seems reasonable. Part of me is idly thinking about making MetalLB an OLM-managed project, but requesting its deployment from the CNO in applicable cases. That is, however, probably too much complexity for no real benefit.
Yes, this was my original thinking too. At this point though I don't think we have any other instance of an automatically installed OLM-managed operator and I don't really want to block on that. If it becomes possible to do that in the future, we could split it back out of the CNO. Would we ever? ... probably not ... but we could ... |
This change updates the metallb proposal to reflect that we will use the cluster network operator (CNO) to manage the deployment of MetalLB on OpenShift. This provides the flexibility to turn it on by default on bare metal and related platforms, where Services of type=LoadBalancer are a core feature we'd like to provide, and may even require in the future.
I updated the PR to reflect that we do not expect any additional CNO configuration to be required, and that we can automatically deploy MetalLB once the CNO sees its configuration get created.
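As a rough illustration of what "deploy once the CNO sees the configuration" could mean in practice, here is a minimal Go sketch of a loop that polls for a hypothetical MetalLB configuration resource and applies the MetalLB manifests when one appears. The group/version/resource, namespace, and the manifest-applying step are assumptions for illustration, not actual CNO or MetalLB code.

```go
// Hedged sketch: react to the appearance of MetalLB configuration by
// deploying MetalLB. The GVR, namespace, and the "apply manifests" step
// are placeholders; a real operator would use informers, not polling.
package main

import (
	"context"
	"fmt"
	"time"

	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/client-go/dynamic"
	"k8s.io/client-go/rest"
)

// Hypothetical GVR for MetalLB configuration; the real group/version would
// come from whatever CRDs upstream MetalLB settles on.
var metalLBConfigGVR = schema.GroupVersionResource{
	Group:    "metallb.io",
	Version:  "v1alpha1",
	Resource: "addresspools",
}

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	dyn, err := dynamic.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}

	for {
		list, err := dyn.Resource(metalLBConfigGVR).
			Namespace("metallb-system").
			List(context.TODO(), metav1.ListOptions{})
		switch {
		case err != nil && !apierrors.IsNotFound(err):
			fmt.Println("listing MetalLB config failed:", err)
		case err == nil && len(list.Items) > 0:
			// Configuration exists: render and apply the MetalLB manifests
			// (controller Deployment, speaker DaemonSet, RBAC, ...).
			fmt.Println("MetalLB configuration found; deploying manifests")
		}
		time.Sleep(30 * time.Second)
	}
}
```

In the CNO this would presumably hang off the existing manifest rendering and apply machinery rather than being a standalone loop like this.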
I've done some more thinking about this (namely, staring at the wall while the one-year-old contemplates the meaning of life, the universe, and porridge), and I think we should ultimately move towards a separate, upstream MetalLB operator. It's fine if this operator is installed by the CNO -- there's extensive precedent for that (e.g. Prometheus Operator). But we really want the upstream project to own the deployment and upgrade story. Burying that logic in the CNO means we're stuck maintaining that ourselves, forever. It's fine if this doesn't happen at first. But we should make that an explicit goal of this RFE.
It's worth pointing out that the other change we want to make to MetalLB is to integrate it with FRR, which is a BGP agent. This would be another component that would need to be deployed somehow. The FRR deployment should have a different lifecycle from the MetalLB deployment, as we may want to use FRR for use cases other than bare-metal load-balancing. Should this be deployed as part of the CNO? Also, communication between MetalLB and FRR would, potentially, be via a Unix socket, which might complicate things somewhat and doesn't feel very Kubernetes native! :S
It is totally kubernetes-native! That's why pods consist of multiple containers :) Without getting too much in the weeds, I would expect that FRR would support consistent integration points, whatever they may be, if the BGP implementation becomes pluggable.
Socket communication between pods? I can see how it would be ok between containers in a pod. |
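For the in-pod case, the usual pattern is a shared emptyDir volume: two containers in one pod mount the same directory and talk over a Unix socket placed there. Here is a hedged Go sketch of such a pod, with made-up image names and socket path; it is not the actual MetalLB/FRR integration, just an illustration of why this works cleanly inside a pod.

```go
// Hedged sketch: a speaker container and an FRR container co-located in one
// pod, sharing a Unix socket directory via an emptyDir volume. Image names,
// namespace, and the mount path are illustrative assumptions.
package main

import (
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func speakerWithFRRPod() *corev1.Pod {
	socketVolume := corev1.Volume{
		Name:         "frr-sockets",
		VolumeSource: corev1.VolumeSource{EmptyDir: &corev1.EmptyDirVolumeSource{}},
	}
	// Both containers mount the same directory, so a socket created by one
	// is visible to the other.
	socketMount := corev1.VolumeMount{Name: "frr-sockets", MountPath: "/var/run/frr"}

	return &corev1.Pod{
		ObjectMeta: metav1.ObjectMeta{Name: "speaker", Namespace: "metallb-system"},
		Spec: corev1.PodSpec{
			Volumes: []corev1.Volume{socketVolume},
			Containers: []corev1.Container{
				{
					Name:         "speaker",
					Image:        "example.com/metallb/speaker:placeholder",
					VolumeMounts: []corev1.VolumeMount{socketMount},
				},
				{
					Name:         "frr",
					Image:        "example.com/frr:placeholder",
					VolumeMounts: []corev1.VolumeMount{socketMount},
				},
			},
		},
	}
}

func main() {
	// In a real deployment this pod spec would be the DaemonSet's pod template.
	_ = speakerWithFRRPod()
}
```

Between separate pods, the same trick would need something like a hostPath on the node, which is part of why co-locating the containers is the more natural fit.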
@squeed I agree with your desire to move the definition of deployment and upgrade upstream as much as possible. It's also true that we could use a separate operator managed by the CNO and still have the CNO decide when to turn it on by default. Not managing the deployment directly in the CNO is another layer of complexity, as well. It very well could be worth it if we share enough of the logic with an upstream operator. To me, the key is determining how close an upstream and a downstream version would be to each other.

I agree with @markdgray that our management of FRR seems to be the most likely place where we will have OpenShift-specific opinions. One of the reasons we want to use FRR as the BGP backend is so that we can also use FRR on the host to satisfy other BGP use cases. Those other use cases would be out of scope for MetalLB. For upstream, I imagine we'd want to shy away from this complexity; upstream we'll want to make using FRR just as simple as the existing built-in BGP stack. What's also tough here is that we won't be implementing those follow-up FRR-based customizations for a couple more releases, so it's hard to say right now what that looks like.

I do know of one other case where we started with an operator (machine-api-operator) directly managing a component (cluster-api-baremetal, baremetal-operator, ironic), and later split all of that out into a single cluster-baremetal-operator. I imagine if we started directly in the CNO, we could go through the same process later if we determine it's worth it.

With all that said, I'm still leaning toward starting directly in the CNO, and potentially considering moving it out once we sort out how much of it is really reusable given our expected custom usage of FRR. When we reach some consensus, I'll update this PR to reflect the discussion.
@russellb I think all of this depends on how complex we expect the deployment and upgrade lifecycle to be. If upstream can get away with a basic Helm chart, then there's not much value in writing an operator upstream, and the CNO templating "engine" is more than sufficient. Once you move to a model where Deployments or StatefulSets are no longer sufficient (especially around upgrades), then that is, IMHO, the line where an upstream effort should be involved. I would love to write an upstream OVN-K operator -- it would enhance community adoption, and generally get the project thinking about lifecycle. But that's another journey.
@squeed It's not too complex right now. There's a single-replica Deployment (the metallb controller) and a DaemonSet (the speakers).
@squeed Based on your last comment, I think you're OK with starting with the CNO, and then pursuing a separate upstream operator if the complexity grows in the future? Is that a fair summary?
That seems reasonable. However, @msherif1234 is working on an upstream operator that configures MetalLB - but does not presently manage its deployment / lifecycle. Given that we're going through the effort of writing this logic, does it make sense to just deploy MetalLB via the downstream operator now? I could imagine that transitioning ownership of the downstream objects from one operator to another would be quite difficult.
This may just be a terminology issue, but he's implementing new kubernetes controllers in the core MetalLB project for the project's configuration, as an alternative to the existing ConfigMap interface. It's not a separate component, and as you note, it does not include anything for managing the deployment of MetalLB itself. I view the operator part, the controller that manages the deployment of MetalLB, as a separate component. I'm proposing here that we make it the cluster network operator (CNO), since that seems simplest overall.
@russellb Deployment of MetalLB above is basically just deploying the manifests to get the controller "deployment" and the speakers "ds" running, and nothing CRD-related, right? Which is effectively what the script below is doing.
@msherif1234 Yes, that's right. And we need to write code that does the equivalent of that script (and then update it as necessary for upgrades). The location of that code is the subject of this discussion.
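To make "the equivalent of that script" concrete, here is a hedged Go sketch, using client-go, of creating the two MetalLB workloads programmatically. The namespace, labels, and images are placeholders rather than the real manifests, and RBAC, service accounts, and upgrade handling are omitted.

```go
// Hedged sketch: programmatically create MetalLB's controller Deployment and
// speaker DaemonSet, i.e. roughly what the install script does with manifests.
// Namespace, labels, and images are illustrative placeholders.
package main

import (
	"context"

	appsv1 "k8s.io/api/apps/v1"
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

func int32Ptr(i int32) *int32 { return &i }

func deployMetalLB(ctx context.Context, cs kubernetes.Interface) error {
	ns := "metallb-system"
	ctrlLabels := map[string]string{"app": "metallb", "component": "controller"}
	spkLabels := map[string]string{"app": "metallb", "component": "speaker"}

	controller := &appsv1.Deployment{
		ObjectMeta: metav1.ObjectMeta{Name: "controller", Namespace: ns},
		Spec: appsv1.DeploymentSpec{
			Replicas: int32Ptr(1), // single-replica controller
			Selector: &metav1.LabelSelector{MatchLabels: ctrlLabels},
			Template: corev1.PodTemplateSpec{
				ObjectMeta: metav1.ObjectMeta{Labels: ctrlLabels},
				Spec: corev1.PodSpec{Containers: []corev1.Container{{
					Name:  "controller",
					Image: "example.com/metallb/controller:placeholder",
				}}},
			},
		},
	}

	speaker := &appsv1.DaemonSet{
		ObjectMeta: metav1.ObjectMeta{Name: "speaker", Namespace: ns},
		Spec: appsv1.DaemonSetSpec{
			Selector: &metav1.LabelSelector{MatchLabels: spkLabels},
			Template: corev1.PodTemplateSpec{
				ObjectMeta: metav1.ObjectMeta{Labels: spkLabels},
				Spec: corev1.PodSpec{
					HostNetwork: true, // speakers announce addresses from each node
					Containers: []corev1.Container{{
						Name:  "speaker",
						Image: "example.com/metallb/speaker:placeholder",
					}},
				},
			},
		},
	}

	if _, err := cs.AppsV1().Deployments(ns).Create(ctx, controller, metav1.CreateOptions{}); err != nil {
		return err
	}
	_, err := cs.AppsV1().DaemonSets(ns).Create(ctx, speaker, metav1.CreateOptions{})
	return err
}

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	if err := deployMetalLB(context.Background(), kubernetes.NewForConfigOrDie(cfg)); err != nil {
		panic(err)
	}
}
```

The interesting question for this thread is less the code itself than where it lives and who keeps it updated across upgrades.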
After more thought and discussion, it seems nobody (myself included) is completely happy with this approach. Its best quality is that it's less work, but it really doesn't seem to be the ideal technical outcome. I'm going to close this PR for now, and I've posted #772 to supersede this proposal.
/close |
@russellb: Closed this PR.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.