Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 2 additions & 2 deletions cmd/start.go
Original file line number Diff line number Diff line change
Expand Up @@ -232,7 +232,7 @@ func startControllers(ctx *controllerContext) error {
componentNamespace, componentName,
rootOpts.releaseImage,
ctx.InformerFactory.Clusterversion().V1().CVOConfigs(),
ctx.InformerFactory.Operatorstatus().V1().OperatorStatuses(),
ctx.InformerFactory.Operatorstatus().V1().ClusterOperators(),
ctx.APIExtInformerFactory.Apiextensions().V1beta1().CustomResourceDefinitions(),
ctx.KubeInformerFactory.Apps().V1().Deployments(),
ctx.ClientBuilder.RestConfig(),
Expand All @@ -245,7 +245,7 @@ func startControllers(ctx *controllerContext) error {
go autoupdate.New(
componentNamespace, componentName,
ctx.InformerFactory.Clusterversion().V1().CVOConfigs(),
ctx.InformerFactory.Operatorstatus().V1().OperatorStatuses(),
ctx.InformerFactory.Operatorstatus().V1().ClusterOperators(),
ctx.ClientBuilder.ClientOrDie(componentName),
ctx.ClientBuilder.KubeClientOrDie(componentName),
).Run(2, ctx.Stop)
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -15,6 +15,9 @@ spec:
storage: true
# either Namespaced or Cluster
scope: Namespaced
subresources:
Copy link
Contributor

@smarterclayton smarterclayton Oct 15, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The larger implication here is a formalization of a more rigid hierarchy of "truth" for cluster configuration:

  1. top level configuration (tectonic config -> ImageConfig, NetworkConfig, CloudConfig) representing cluster level truths in functional/component terms rather than operator terms
  2. secondary input configurations like secrets or raw config, or lower level component configuration (the pull secret, the cloud provider secret, or the openshift-apiserver config objects)
  3. the CVOConfig -> ClusterVersion object (discussed previously), which is the source of truth for the desired version state of the cluster
  4. specific API objects for specific operators like ClusterIngress, all the network SDN config, network policy, etc

Those are the configuration inputs to a cluster. There is a parallel hierarchy of operators:

  1. Cluster Version Operator -> ensures all "Cluster Operators" (I like this name better in colloquial use the more I use it) are functional and up to date, based on Cluster Version
  2. Second-Level-Operators -> Cluster Operators are responsible for materializing the config hierarchy at a cluster scope
  3. All cluster operators are expected to report operatorstatus/operator.status, which is how an administrator / developer performs quick triage of the state of the cluster

By making this change we are implying that an operator is a controllable object that has state that is reconciled with the cluster, and that the controllable object is the common interface to "cluster operators". I think that having a consistent set of control knobs for operators is the right thing to do (there are zero today, but the first one would be something like "paused"), rather than having to duck type control knobs on config (since the config hierarchy is not operator focused but functional areas focused).

All "cluster operators" should report status. It should be possible to list all cluster operators. It should be possible to build a UI that summarizes the rough relationships between cluster operators (although in most cases that should not be necessary).

It should be the job of the CVO to summarize operator status underneath the state of an upgrade. Ultimately this is the level of truth:

  1. The CVO takes a ClusterVersion and ensures all cluster operators are up to date
  2. The CVO summarizes the result of that operation and ensures ClusterVersion reports it
  3. An admin or user interfaces with ClusterVersion to effect changes and see their outcome.
  4. A developer or support person looks at operator status.
  5. If an operator fails to update, there should be a condition on ClusterVersion which indicates that updates are not progressing.
  6. The CVO should be reporting metrics for consumption by telemetry for input to SRE about the state of the update - some of those metrics will be summarizations of the current state like "upgrade to 3.1.5 in progress". Others will be "failing operator status".

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the CVO is a reconciler, I think it's appropriate to talk about CVOConfig/ClusterVersion having a spec "desired outcome" and a status "actual state". The state the CVO might present to an end user via ClusterVersion would be different than the status passed to OperatorStatus/Operator.status - both are for humans, but one reflects the desired action in cluster version, and the other reflects the desired state the operators themselves are trying to achieve.

Copy link

@ericavonb ericavonb Oct 15, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you familiar with how OLM works? It would be much nicer if there was a united view of operators, and OLM already figured a lot of this out already (like having a UI 🙂).

By making this change we are implying that an operator is a controllable object that has state that is reconciled with the cluster, and that the controllable object is the common interface to "cluster operators".

It's called the "ClusterServiceVersion" in OLM.
The operators even report back the status of their components: https://github.com/operator-framework/operator-lifecycle-manager/blob/d99daea0086cf92976617dd9326d0ed2b7ba85cf/pkg/api/apis/operators/v1alpha1/clusterserviceversion_types.go#L2

I'm not saying OLM solved all of the pieces needed for bootstrapping, but quite a bit of the stuff being discussed here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. These are going to be a deliberate subset of what an OLM operator may support. The conditions on CSV status are very relevant to CV. The spec is not.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The biggest difference is that CSV needs conditions, and it's possible that we'll want to drive a phase for CV or something similar to simplify user input (although I'm leaning towards not having a phase and instead displaying human readable status instead).

# enable spec/status
status: {}
names:
# plural name to be used in the URL: /apis/<group>/<version>/<plural>
plural: cvoconfigs
Expand Down
20 changes: 10 additions & 10 deletions lib/resourceapply/cv.go
Original file line number Diff line number Diff line change
Expand Up @@ -15,13 +15,13 @@ import (
"k8s.io/utils/pointer"
)

func ApplyOperatorStatus(client osclientv1.OperatorStatusesGetter, required *osv1.OperatorStatus) (*osv1.OperatorStatus, bool, error) {
if required.Extension.Raw != nil && required.Extension.Object != nil {
func ApplyOperatorStatus(client osclientv1.ClusterOperatorsGetter, required *osv1.ClusterOperator) (*osv1.ClusterOperator, bool, error) {
if required.Status.Extension.Raw != nil && required.Status.Extension.Object != nil {
return nil, false, fmt.Errorf("both extension.Raw and extension.Object should not be set")
}
existing, err := client.OperatorStatuses(required.Namespace).Get(required.Name, metav1.GetOptions{})
existing, err := client.ClusterOperators(required.Namespace).Get(required.Name, metav1.GetOptions{})
if errors.IsNotFound(err) {
actual, err := client.OperatorStatuses(required.Namespace).Create(required)
actual, err := client.ClusterOperators(required.Namespace).Create(required)
return actual, true, err
}
if err != nil {
Expand All @@ -34,17 +34,17 @@ func ApplyOperatorStatus(client osclientv1.OperatorStatusesGetter, required *osv
return existing, false, nil
}

actual, err := client.OperatorStatuses(required.Namespace).Update(existing)
actual, err := client.ClusterOperators(required.Namespace).Update(existing)
return actual, true, err
}

func ApplyOperatorStatusFromCache(lister oslistersv1.OperatorStatusLister, client osclientv1.OperatorStatusesGetter, required *osv1.OperatorStatus) (*osv1.OperatorStatus, bool, error) {
if required.Extension.Raw != nil && required.Extension.Object != nil {
func ApplyOperatorStatusFromCache(lister oslistersv1.ClusterOperatorLister, client osclientv1.ClusterOperatorsGetter, required *osv1.ClusterOperator) (*osv1.ClusterOperator, bool, error) {
if required.Status.Extension.Raw != nil && required.Status.Extension.Object != nil {
return nil, false, fmt.Errorf("both extension.Raw and extension.Object should not be set")
}
existing, err := lister.OperatorStatuses(required.Namespace).Get(required.Name)
existing, err := lister.ClusterOperators(required.Namespace).Get(required.Name)
if errors.IsNotFound(err) {
actual, err := client.OperatorStatuses(required.Namespace).Create(required)
actual, err := client.ClusterOperators(required.Namespace).Create(required)
return actual, true, err
}
if err != nil {
Expand All @@ -59,7 +59,7 @@ func ApplyOperatorStatusFromCache(lister oslistersv1.OperatorStatusLister, clien
return existing, false, nil
}

actual, err := client.OperatorStatuses(required.Namespace).Update(existing)
actual, err := client.ClusterOperators(required.Namespace).Update(existing)
return actual, true, err
}

Expand Down
81 changes: 73 additions & 8 deletions lib/resourcemerge/os.go
Original file line number Diff line number Diff line change
@@ -1,24 +1,28 @@
package resourcemerge

import (
osv1 "github.com/openshift/cluster-version-operator/pkg/apis/operatorstatus.openshift.io/v1"
"time"

"k8s.io/apimachinery/pkg/api/equality"
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

osv1 "github.com/openshift/cluster-version-operator/pkg/apis/operatorstatus.openshift.io/v1"
)

func EnsureOperatorStatus(modified *bool, existing *osv1.OperatorStatus, required osv1.OperatorStatus) {
func EnsureOperatorStatus(modified *bool, existing *osv1.ClusterOperator, required osv1.ClusterOperator) {
EnsureObjectMeta(modified, &existing.ObjectMeta, required.ObjectMeta)
if !equality.Semantic.DeepEqual(existing.Condition, required.Condition) {
ensureOperatorStatusStatus(modified, &existing.Status, required.Status)
}

func ensureOperatorStatusStatus(modified *bool, existing *osv1.ClusterOperatorStatus, required osv1.ClusterOperatorStatus) {
if !equality.Semantic.DeepEqual(existing.Conditions, required.Conditions) {
*modified = true
existing.Condition = required.Condition
existing.Conditions = required.Conditions
}
if existing.Version != required.Version {
*modified = true
existing.Version = required.Version
}
if !existing.LastUpdate.Equal(&required.LastUpdate) {
*modified = true
existing.LastUpdate = required.LastUpdate
}
if !equality.Semantic.DeepEqual(existing.Extension.Raw, required.Extension.Raw) {
*modified = true
existing.Extension.Raw = required.Extension.Raw
Expand All @@ -28,3 +32,64 @@ func EnsureOperatorStatus(modified *bool, existing *osv1.OperatorStatus, require
existing.Extension.Object = required.Extension.Object
}
}

func SetOperatorStatusCondition(conditions *[]osv1.ClusterOperatorStatusCondition, newCondition osv1.ClusterOperatorStatusCondition) {
if conditions == nil {
conditions = &[]osv1.ClusterOperatorStatusCondition{}
}
existingCondition := FindOperatorStatusCondition(*conditions, newCondition.Type)
if existingCondition == nil {
newCondition.LastTransitionTime = metav1.NewTime(time.Now())
*conditions = append(*conditions, newCondition)
return
}

if existingCondition.Status != newCondition.Status {
existingCondition.Status = newCondition.Status
existingCondition.LastTransitionTime = newCondition.LastTransitionTime
}

existingCondition.Reason = newCondition.Reason
existingCondition.Message = newCondition.Message
}

func RemoveOperatorStatusCondition(conditions *[]osv1.ClusterOperatorStatusCondition, conditionType osv1.ClusterStatusConditionType) {
if conditions == nil {
conditions = &[]osv1.ClusterOperatorStatusCondition{}
}
newConditions := []osv1.ClusterOperatorStatusCondition{}
for _, condition := range *conditions {
if condition.Type != conditionType {
newConditions = append(newConditions, condition)
}
}

*conditions = newConditions
}

func FindOperatorStatusCondition(conditions []osv1.ClusterOperatorStatusCondition, conditionType osv1.ClusterStatusConditionType) *osv1.ClusterOperatorStatusCondition {
for i := range conditions {
if conditions[i].Type == conditionType {
return &conditions[i]
}
}

return nil
}

func IsOperatorStatusConditionTrue(conditions []osv1.ClusterOperatorStatusCondition, conditionType osv1.ClusterStatusConditionType) bool {
return IsOperatorStatusConditionPresentAndEqual(conditions, conditionType, osv1.ConditionTrue)
}

func IsOperatorStatusConditionFalse(conditions []osv1.ClusterOperatorStatusCondition, conditionType osv1.ClusterStatusConditionType) bool {
return IsOperatorStatusConditionPresentAndEqual(conditions, conditionType, osv1.ConditionFalse)
}

func IsOperatorStatusConditionPresentAndEqual(conditions []osv1.ClusterOperatorStatusCondition, conditionType osv1.ClusterStatusConditionType, status osv1.ConditionStatus) bool {
for _, condition := range conditions {
if condition.Type == conditionType {
return condition.Status == status
}
}
return false
}
4 changes: 2 additions & 2 deletions pkg/apis/operatorstatus.openshift.io/v1/register.go
Original file line number Diff line number Diff line change
Expand Up @@ -36,8 +36,8 @@ func init() {
// Adds the list of known types to api.Scheme.
func addKnownTypes(scheme *runtime.Scheme) error {
scheme.AddKnownTypes(SchemeGroupVersion,
&OperatorStatus{},
&OperatorStatusList{},
&ClusterOperator{},
&ClusterOperatorList{},
)

metav1.AddToGroupVersion(scheme, SchemeGroupVersion)
Expand Down
108 changes: 69 additions & 39 deletions pkg/apis/operatorstatus.openshift.io/v1/types.go
Original file line number Diff line number Diff line change
Expand Up @@ -5,72 +5,102 @@ import (
"k8s.io/apimachinery/pkg/runtime"
)

// OperatorStatusList is a list of OperatorStatus resources.
// ClusterOperatorList is a list of OperatorStatus resources.
// +k8s:deepcopy-gen=true
// +k8s:deepcopy-gen:interfaces=k8s.io/apimachinery/pkg/runtime.Object
type OperatorStatusList struct {
type ClusterOperatorList struct {
metav1.TypeMeta `json:",inline"`
metav1.ListMeta `json:"metadata"`

Items []OperatorStatus `json:"items"`
Items []ClusterOperator `json:"items"`
}

// OperatorStatus is the Custom Resource object which holds the current state
// ClusterOperator is the Custom Resource object which holds the current state
// of an operator. This object is used by operators to convey their state to
// the rest of the cluster.
// +genclient
// +k8s:deepcopy-gen=true
// +k8s:deepcopy-gen:interfaces=k8s.io/apimachinery/pkg/runtime.Object
type OperatorStatus struct {
type ClusterOperator struct {
metav1.TypeMeta `json:",inline"`
metav1.ObjectMeta `json:"metadata"`

// Condition describes the state of the operator's reconciliation
// functionality.
Condition OperatorStatusCondition `json:"condition"`
// Spec hold the intent of how this operator should behave.
Spec ClusterOperatorSpec `json:"spec"`

// Version indicates which version of the operator updated the current
// status holds the information about the state of an operator. It is consistent with status information across
// the kube ecosystem.
Status ClusterOperatorStatus `json:"status"`
}

// ClusterOperatorSpec is empty for now, but you could imagine holding information like "pause".
type ClusterOperatorSpec struct {
}

// ClusterOperatorStatus provides information about the status of the operator.
// +k8s:deepcopy-gen=true
type ClusterOperatorStatus struct {
// conditions describes the state of the operator's reconciliation functionality.
// +patchMergeKey=type
// +patchStrategy=merge
Conditions []ClusterOperatorStatusCondition `json:"conditions"`

// version indicates which version of the operator updated the current
// status object.
Version string `json:"version"`

// LasteUpdate is the time of the last update to the current status object.
LastUpdate metav1.Time `json:"lastUpdate"`

// Extension contains any additional status information specific to the
// extension contains any additional status information specific to the
// operator which owns this status object.
Extension runtime.RawExtension `json:"extension"`
Extension runtime.RawExtension `json:"extension,omitempty"`
}

// OperatorStatusCondition represents the state of the operator's
type ConditionStatus string

// These are valid condition statuses. "ConditionTrue" means a resource is in the condition.
// "ConditionFalse" means a resource is not in the condition. "ConditionUnknown" means kubernetes
// can't decide if a resource is in the condition or not. In the future, we could add other
// intermediate conditions, e.g. ConditionDegraded.
const (
ConditionTrue ConditionStatus = "True"
ConditionFalse ConditionStatus = "False"
ConditionUnknown ConditionStatus = "Unknown"
)

// ClusterOperatorStatusCondition represents the state of the operator's
// reconciliation functionality.
type OperatorStatusCondition struct {
// Type specifies the state of the operator's reconciliation functionality.
Type OperatorStatusConditionType `json:"type"`
// +k8s:deepcopy-gen=true
type ClusterOperatorStatusCondition struct {
// type specifies the state of the operator's reconciliation functionality.
Type ClusterStatusConditionType `json:"type"`

// Status of the condition, one of True, False, Unknown.
Status ConditionStatus `json:"status"`

// Message provides any additional information about the current condition.
// LastTransitionTime is the time of the last update to the current status object.
LastTransitionTime metav1.Time `json:"lastTransitionTime"`

// reason is the reason for the condition's last transition. Reasons are CamelCase
Reason string `json:"reason,omitempty"`

// message provides additional information about the current condition.
// This is only to be consumed by humans.
Message string `json:"message"`
Message string `json:"message,omitempty"`
}

// OperatorStatusConditionType is the state of the operator's reconciliation
// functionality.
type OperatorStatusConditionType string
// ClusterStatusConditionType is the state of the operator's reconciliation functionality.
type ClusterStatusConditionType string

const (
// OperatorStatusConditionTypeWaiting indicates that the operator isn't
// running its reconciliation functionality. This may be because a
// dependency or other prerequisite hasn't been satisfied.
OperatorStatusConditionTypeWaiting OperatorStatusConditionType = "Waiting"

// OperatorStatusConditionTypeWorking indicates that the operator is
// actively reconciling its operands.
OperatorStatusConditionTypeWorking OperatorStatusConditionType = "Working"

// OperatorStatusConditionTypeDone indicates that the operator has finished
// reconciling its operands and is waiting for changes.
OperatorStatusConditionTypeDone OperatorStatusConditionType = "Done"

// OperatorStatusConditionTypeDegraded indicates that the operator has
// encountered an error that is preventing it from working properly.
OperatorStatusConditionTypeDegraded OperatorStatusConditionType = "Degraded"
// OperatorAvailable indicates that the binary maintained by the operator (eg: openshift-apiserver for the
// openshift-apiserver-operator), is functional and available in the cluster.
OperatorAvailable ClusterStatusConditionType = "Available"

// OperatorProgressing indicates that the operator is actively making changes to the binary maintained by the
// operator (eg: openshift-apiserver for the openshift-apiserver-operator).
OperatorProgressing ClusterStatusConditionType = "Progressing"

// OperatorFailing indicates that the operator has encountered an error that is preventing it from working properly.
// The binary maintained by the operator (eg: openshift-apiserver for the openshift-apiserver-operator) may still be
// available, but the user intent cannot be fulfilled.
OperatorFailing ClusterStatusConditionType = "Failing"
)
Loading