
Conversation

@smarterclayton
Contributor

@smarterclayton smarterclayton commented Nov 16, 2018

ClusterVersion/ClusterOperator moved to the openshift/api repo, with minor
changes that are vendored back here. This is the reaction PR.

The major differences in what went upstream:

  • ClusterOperator is now cluster scoped
  • ClusterVersion deserialization doesn't check UID or URL deserialization,
    which needs to be handled by the operator anyway (subsequent commit)
  • ClusterVersion URL is not a pointer; the suggestion was to create a new field
    which controls update behavior, such as updateMode: <None|Retrieve|Auto>,
    where Retrieve might be the default (a rough sketch follows below).
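
A rough sketch of what that suggested field might look like (purely hypothetical: the type and constant names below do not exist in openshift/api):

```go
// UpdateMode sketches the suggested update-behavior field; it is not
// part of the real ClusterVersion API.
type UpdateMode string

const (
	// UpdateModeNone would never contact the upstream update server.
	UpdateModeNone UpdateMode = "None"
	// UpdateModeRetrieve would fetch available updates without applying
	// them automatically (the suggested default).
	UpdateModeRetrieve UpdateMode = "Retrieve"
	// UpdateModeAuto would fetch and automatically apply updates.
	UpdateModeAuto UpdateMode = "Auto"
)
```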

The validation logic checks whether the incoming object is valid and, if it isn't, sets the Invalid condition on the object. It then ensures that any invalid "optional" fields in the config spec are cleared - this is somewhat novel, but it ensures we don't accidentally take action to move the version. An invalid object reconciles, but doesn't progress. As part of this change, simplify the initial status sync into something that feels more level driven (ensure we always have conditions populated), which makes the CVO loop a bit easier to read.
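
A minimal sketch of that validate-then-clear flow (validation.ClearInvalidFields appears in this PR's diff; the validator name and the surrounding helper are assumptions, and the CVO's own validation package is assumed to be imported):

```go
import (
	configv1 "github.com/openshift/api/config/v1"
	"k8s.io/apimachinery/pkg/util/validation/field"
)

// normalizeClusterVersion is illustrative only: validate the incoming
// object and, if it has errors, clear the invalid optional fields so
// the rest of the sync loop cannot act on them.  The caller records
// the Invalid condition in status based on errs.
func normalizeClusterVersion(config *configv1.ClusterVersion) (*configv1.ClusterVersion, field.ErrorList) {
	errs := validation.ValidateClusterVersion(config) // assumed validator name
	if len(errs) > 0 {
		// ClearInvalidFields deep-copies before mutating, so the lister
		// cache stays untouched; an invalid object still reconciles,
		// but never progresses.
		config = validation.ClearInvalidFields(config, errs)
	}
	return config, errs
}
```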

@openshift-ci-robot openshift-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Nov 16, 2018
@openshift-ci-robot openshift-ci-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Nov 16, 2018
@abhinavdahiya
Contributor

openshift/library-go#97 might come in handy.

@smarterclayton
Contributor Author

I still need to add a unit test for the invalid case. David and I talked through the conditions and how they're being used here vs. other operators. We agreed that we were roughly in the same spot:

Available means either "I've reached the desired state you requested" or "my operands are healthy". In the CVO right now we set Available to false when we start applying a new version, and set it to true after we've synced a payload successfully at least once. Arguably we don't have to reset Available to false when we start applying a new version, because Progressing=true means the same thing. That's something I want to think about more. Progressing=false can also be sufficient to mean "I've reached the desired state and I am going to maintain that state", which is probably fairly intuitive to a user when they think about upgrades (Progressing is true when initially setting up or when working towards a desired update, but false while reconciling).

We did talk about how ClusterOperator could be taken into account as a prerequisite for upgrading to a new version. If not all cluster operators are Available=true && Progressing=false && Failing=false && Upgradeable!=false, we should only reconcile to whatever the current CVO version is (treat the desired update as if it were nil). That would prevent us from accepting a new update while an existing one is being processed. At the very end of a payload sync, we would then update status.current to equal the current CVO object's payload.

  1. user is at stable state at 4.0.2, all operators green, no desired update
  2. user requests 4.0.3 via desired update
  3. CVO checks that operators are still green before loading the payload - if not all green, reconcile the current version rather than going to a new one
  4. CVO begins update loop
  5. if all updates succeed (including cluster operators going green), update status.current to 4.0.3

We can still init status.current if it's unset, but we shouldn't update current until we've completed a sync (right now the code updates current to the CVO version at steps 3-4 when it updates status). We also need to think about what type of status report to provide at step 3 - are we Progressing=false with an appropriate message?
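
Expressed as code, the proposed gate might look like the following sketch (allOperatorsGreen is a made-up helper, and the Failing/Upgradeable condition types are spelled as raw strings rather than assumed constants):

```go
import configv1 "github.com/openshift/api/config/v1"

// condition returns the condition of the given type, or nil if unset.
func condition(op *configv1.ClusterOperator, t configv1.ClusterStatusConditionType) *configv1.ClusterOperatorStatusCondition {
	for i := range op.Status.Conditions {
		if op.Status.Conditions[i].Type == t {
			return &op.Status.Conditions[i]
		}
	}
	return nil
}

// allOperatorsGreen evaluates the proposed prerequisite:
// Available=true && Progressing=false && Failing=false && Upgradeable!=false.
func allOperatorsGreen(operators []configv1.ClusterOperator) bool {
	for i := range operators {
		op := &operators[i]
		if c := condition(op, configv1.OperatorAvailable); c == nil || c.Status != configv1.ConditionTrue {
			return false
		}
		if c := condition(op, configv1.OperatorProgressing); c != nil && c.Status == configv1.ConditionTrue {
			return false
		}
		if c := condition(op, configv1.ClusterStatusConditionType("Failing")); c != nil && c.Status == configv1.ConditionTrue {
			return false
		}
		// Upgradeable != false: an absent condition still counts as upgradeable.
		if c := condition(op, configv1.ClusterStatusConditionType("Upgradeable")); c != nil && c.Status == configv1.ConditionFalse {
			return false
		}
	}
	return true
}
```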

@openshift-bot openshift-bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 20, 2018
@smarterclayton smarterclayton changed the title WIP: api: Update to objects from openshift/api api: Update to objects from openshift/api Nov 20, 2018
@openshift-ci-robot openshift-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Nov 20, 2018
@openshift-bot openshift-bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 20, 2018
@smarterclayton
Contributor Author

Updated so that this is relatively complete:

  1. CVO uses openshift/api types
  2. Early in the sync loop the CVO verifies the incoming ClusterVersion and flags a condition if the object can't be handled as-is (instead of erroring out on it as before) - that status is reported in the Progressing condition so that oc get clusterversion shows it
  3. Reworked how the sync loop sets its initial status so that all of the conditions are set at once
  4. If the object is invalid, the CVO only reconciles (applies its own payload); it will not update.

@smarterclayton
Contributor Author

(to be clear, I didn't make any of the changes discussed in my wall of text comment, just added the Invalid condition and cleaned up the sync loop)

 )

-func EnsureClusterOperatorStatus(modified *bool, existing *osv1.ClusterOperator, required osv1.ClusterOperator) {
+func EnsureClusterOperatorStatus(modified *bool, existing *configv1.ClusterOperator, required configv1.ClusterOperator) {
Contributor

This file should no longer exist in the CVO. openshift/library-go#97 seems like a better place for these functions. For now it's okay as-is.


-Updates []cvv1.Update
 Condition osv1.ClusterOperatorStatusCondition
+Updates []configv1.Update
Contributor

this configv1.Update is a little ambiguous.

Contributor Author

Good point - I realized last night too that when we start bringing metadata back from Cincinnati (messages and URLs to display pages) we're going to need to split the update type. I was going to suggest ClusterVersionUpdate and AvailableClusterVersionUpdate as the two structs.

Contributor

^^ these sound better.
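
A hypothetical sketch of the split proposed above (neither struct exists at this point; the Cincinnati metadata fields are assumptions):

```go
// ClusterVersionUpdate would identify a payload the CVO can apply.
type ClusterVersionUpdate struct {
	Version string `json:"version"`
	Payload string `json:"payload"`
}

// AvailableClusterVersionUpdate would be an update offer returned by
// the upstream (Cincinnati) server, carrying display metadata as well.
type AvailableClusterVersionUpdate struct {
	ClusterVersionUpdate `json:",inline"`

	// Message and URL stand in for the kind of metadata the comment
	// above anticipates coming back from Cincinnati.
	Message string `json:"message,omitempty"`
	URL     string `json:"url,omitempty"`
}
```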

@abhinavdahiya
Contributor

(to be clear, I didn't make any of the changes discussed in my wall of text comment, just added the Invalid condition and cleaned up the sync loop)

maybe an issue to make sure we don't forget?


// for fields that have meaning that are incomplete, clear them
// prevents us from loading clearly malformed payloads
obj = validation.ClearInvalidFields(obj, errs)
Contributor

Is this mutating the lister cache?

Contributor Author

we deep copy before making a change in that method.
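
Roughly that pattern, as a sketch (the real ClearInvalidFields may clear different fields; the deep-copy-before-mutate shape is the point):

```go
import (
	configv1 "github.com/openshift/api/config/v1"
	"k8s.io/apimachinery/pkg/util/validation/field"
)

// ClearInvalidFields (sketch) returns a copy of config with the spec
// fields that failed validation reset.  It deep-copies first, so the
// shared lister cache is never mutated.
func ClearInvalidFields(config *configv1.ClusterVersion, errs field.ErrorList) *configv1.ClusterVersion {
	out := config.DeepCopy()
	for _, err := range errs {
		switch err.Field {
		case "spec.desiredUpdate":
			out.Spec.DesiredUpdate = nil
		case "spec.upstream":
			out.Spec.Upstream = ""
		}
	}
	return out
}
```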

@abhinavdahiya
Contributor

This looks good. 👍

will /lgtm as soon as unit tests pass.

The ClusterOperator is cluster scoped so that users can run `oc get clusteroperators`
and so that its scope is unambiguous.

ClusterVersion/ClusterOperator moved to the openshift/api repo, with minor
changes that are vendored back here. This is the reaction PR.

The major differences in what went upstream:

* ClusterOperator is now cluster scoped
* ClusterVersion deserialization doesn't check UID or URL deserialization, which
  needs to be handled by the operator anyway (subsequent commit)
* ClusterVersion URL is not a pointer; the suggestion was to create a new field
  which controls update behavior, such as `updateMode: <None|Retrieve|Auto>`, where
  Retrieve might be the default.

The CVO needs to perform some minimum sanity checking on incoming input
and communicate to users when it is blocked. The previous mechanism of
rejecting the CV on deserialization is only partial validation and doesn't
cover all scenarios. In the future we may want to have the CVO register
as an admission webhook for its own resource.

Add validation immediately after the CVO loads the object from the cache,
verifying that the object that we see has no errors. If it does, write
an Invalid condition to the status and reset the Progressing condition,
then clear the invalid fields so that the sync loop doesn't act on them.

Simplify initial status setting by having the initial status check
normalize the object status for a set of conditions, including available
updates. This reduces the complexity of the CVO main loop slightly.
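
The normalization could be as simple as this sketch (the helper name is assumed; the real version covers more conditions, including available updates):

```go
import configv1 "github.com/openshift/api/config/v1"

// ensureDefaultConditions (sketch) backfills any missing conditions so
// every pass of the sync loop starts from a fully populated status:
// level driven rather than edge driven.
func ensureDefaultConditions(status *configv1.ClusterVersionStatus) {
	for _, t := range []configv1.ClusterStatusConditionType{
		configv1.OperatorAvailable,
		configv1.OperatorProgressing,
		configv1.ClusterStatusConditionType("Failing"),
	} {
		found := false
		for _, c := range status.Conditions {
			if c.Type == t {
				found = true
				break
			}
		}
		if !found {
			status.Conditions = append(status.Conditions, configv1.ClusterOperatorStatusCondition{
				Type:   t,
				Status: configv1.ConditionUnknown,
			})
		}
	}
}
```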
@smarterclayton
Contributor Author

I updated tests - also made the sync_test use a customizable backoff so we could have the test complete instantly (that's in the api: reaction commit because the package changed).

@abhinavdahiya
Contributor

I updated tests - also made the sync_test use a customizable backoff so we could have the test complete instantly (that's in the api: reaction commit because the package changed).

Thanks for that, I missed that in #57 😇

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Nov 21, 2018
@openshift-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: abhinavdahiya, smarterclayton

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [abhinavdahiya,smarterclayton]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@smarterclayton
Contributor Author

smarterclayton commented Nov 21, 2018 via email

@smarterclayton
Contributor Author

/retest

@openshift-bot
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-merge-robot openshift-merge-robot merged commit 4d8f469 into openshift:master Nov 21, 2018
wking added a commit to wking/cluster-version-operator that referenced this pull request Aug 16, 2019
This avoids a race between:

* The cluster-version operator pushing its internally-generated default and
* The cluster-bootstrap process pushing the installer-generated ClusterVersion into the cluster [1]

Because cluster-bootstrap does not override already-existing objects,
when the CVO's default won that race, the cluster would come up with a
bogus channel ('fast', instead of the installer's default 'stable-4.2'
etc.) and clusterID (causing the reported Telemetry to come in under a
different UUID than the one the installer provided in its
metadata.json output [2]).

By removing the default, the CVO will continue applying the current
version and reporting some metrics like
cluster_version{type="current"}.  But it will not report metrics that
depend on the state stored in the ClusterVersion object (past
versions, cluster_version{type="failure"}, etc.).  And even the
metrics it does provide may not make it up to the Telemetry API
because the monitoring operator will not be able to pass the Telemeter
container the expected clusterID [3,4].  I'm not clear on how the
Telemeter config is updated when the ClusterVersion's clusterID *does*
change [5], but the monitoring operator is willing to continue without
it [6].  The monitoring operator only updates the Telemeter Deployment
to *set* (or update) the cluster ID [7]; it never clears an existing
cluster ID.

While we won't get Telemetry out of the cluster until the
cluster-bootstrap process (or somebody else) pushes the ClusterVersion
in, we can still alert on it locally.  I haven't done that in this
commit, because [8] is in flight touching this space.

The defaulting logic dates back to the early CVOConfig work in
ea678b1 (*: read runtime parameters from CVOConfig, 2018-08-01,
ClusterVersion deletion without the default [9], but:

* Who deletes their ClusterVersion?

* Even if someone wanted to delete their ClusterVersion, we'll want to
  eventually make the CVO an admission gate for ClusterVersion
  changes, ab4d84a (cvo: Validate the CVO prior to acting on it,
  2018-11-18, openshift#55).  So (once we set up that admission gate), someone
  determined to remove ClusterVersion would have to disable that
  admission config before they could remove the ClusterVersion.  That
  seems like enough steps that you wouldn't blow ClusterVersion away
  by accident.

* If you do successfully remove your ClusterVersion, you're going to
  lose all the status and history it contained.  If you scale down
  your CVO and then remove your ClusterVersion, it's going to be hard
  to recover that lost information.  If you left the CVO running, we
  could presumably cache the whole thing in memory and push it back
  into the cluster.  That would at least avoid the "compiled-in CVO
  defaults differ from user's choices" issue.  But we'd still be
  fighting with the user over the presence of ClusterVersion, and I'd
  rather not.

* Telemetry will continue pushing reports with the last-seen
  clusterID, so unless you disabled Telemetry first, future
  "ClusterVersion is missing" alerts (not added in this commit, see
  earlier paragraph mentioning alerts) would make it out to let us
  know that something was being silly.

[1]: https://bugzilla.redhat.com/show_bug.cgi?id=1741786
[2]: https://github.com/openshift/installer/blob/4e204c5e509de1bd31113b0c0e73af1a35e52c0a/pkg/types/clustermetadata.go#L17-L18
[3]: https://github.com/openshift/cluster-monitoring-operator/blob/14b1093149217a6ab5e7603f19ff5449f1ec12fc/manifests/0000_50_cluster_monitoring_operator_04-deployment.yaml#L62
[4]: https://github.com/openshift/cluster-monitoring-operator/blob/14b1093149217a6ab5e7603f19ff5449f1ec12fc/pkg/manifests/config.go#L217-L229
[5]: https://github.com/openshift/cluster-monitoring-operator/blob/14b1093149217a6ab5e7603f19ff5449f1ec12fc/pkg/operator/operator.go#L369-L370
[6]: https://github.com/openshift/cluster-monitoring-operator/blob/14b1093149217a6ab5e7603f19ff5449f1ec12fc/pkg/operator/operator.go#L376
[7]: https://github.com/openshift/cluster-monitoring-operator/blob/14b1093149217a6ab5e7603f19ff5449f1ec12fc/pkg/manifests/manifests.go#L1762-L1763
[8]: openshift#232
[9]: https://bugzilla.redhat.com/show_bug.cgi?id=1708697#c12
wking added a commit to wking/cluster-version-operator that referenced this pull request Aug 21, 2019
This avoids a race between:

* The cluster-version operator pushing its internally-generated
  default and
* The cluster-bootstrap process pushing the installer-generated
  ClusterVersion into the cluster [1]

Because cluster-bootstrap does not override already-existing objects,
when the CVO's default won that race, the cluster would come up with a
bogus channel ('fast', instead of the installer's default 'stable-4.2'
etc.) and clusterID (causing the reported Telemetry to come in under a
different UUID than the one the installer provided in its
metadata.json output [2]).

By removing the default (during bootstrap), the CVO will continue
applying the current version and reporting some metrics like
cluster_version{type="current"}.  But it will not report metrics that
depend on the state stored in the ClusterVersion object (past
versions, cluster_version{type="failure"}, etc.).  And even the
metrics it does provide may not make it up to the Telemetry API
because the monitoring operator will not be able to pass the Telemeter
container the expected clusterID [3,4].  I'm not clear on how the
Telemeter config is updated when the ClusterVersion's clusterID *does*
change [5], but the monitoring operator is willing to continue without
it [6].  The monitoring operator only updates the Telemeter Deployment
to *set* (or update) the cluster ID [7]; it never clears an existing
cluster ID.

While we won't get Telemetry out of the cluster until the
cluster-bootstrap process (or somebody else) pushes the ClusterVersion
in, we can still alert on it locally.  I haven't done that in this
commit, because [8] is in flight touching this space.

The defaulting logic dates back to the early CVOConfig work in
ea678b1 (*: read runtime parameters from CVOConfig, 2018-08-01, openshift#2).
Clayton expressed concern about recovering from ClusterVersion
deletion without the default [9], but:

* Who deletes their ClusterVersion?

* Even if someone wanted to delete their ClusterVersion, we'll want to
  eventually make the CVO an admission gate for ClusterVersion
  changes, ab4d84a (cvo: Validate the CVO prior to acting on it,
  2018-11-18, openshift#55).  So (once we set up that admission gate), someone
  determined to remove ClusterVersion would have to disable that
  admission config before they could remove the ClusterVersion.  That
  seems like enough steps that you wouldn't blow ClusterVersion away
  by accident.

* If you do successfully remove your ClusterVersion, you're going to
  lose all the status and history it contained.  If you scale down
  your CVO and then remove your ClusterVersion, it's going to be hard
  to recover that lost information.  If you left the CVO running, we
  could presumably cache the whole thing in memory and push it back
  into the cluster.  That would at least avoid the "compiled-in CVO
  defaults differ from user's choices" issue.  But we'd still be
  fighting with the user over the presence of ClusterVersion, and I'd
  rather not.

* Telemetry will continue pushing reports with the last-seen
  clusterID, so unless you disabled Telemetry first, future
  "ClusterVersion is missing" alerts (not added in this commit, see
  earlier paragraph mentioning alerts) would make it out to let us
  know that something was being silly.

* We're only removing defaulting from the bootstrap CVO.  Once we get
  on to the production CVO, we'll still have our previous
  default-ClusterVersion-injection behavior.

An alternative approach we considered was passing a default cluster ID
in via a CVO command line option [10].  But Abhinav wants bootstrap
operators to be loading their config from on-disk manifests as much as
possible [11], and this commit effectively ensures we load
ClusterVersion from the installer-provided manifest during
bootstrapping.

[1]: https://bugzilla.redhat.com/show_bug.cgi?id=1741786
[2]: https://github.com/openshift/installer/blob/4e204c5e509de1bd31113b0c0e73af1a35e52c0a/pkg/types/clustermetadata.go#L17-L18
[3]: https://github.com/openshift/cluster-monitoring-operator/blob/14b1093149217a6ab5e7603f19ff5449f1ec12fc/manifests/0000_50_cluster_monitoring_operator_04-deployment.yaml#L62
[4]: https://github.com/openshift/cluster-monitoring-operator/blob/14b1093149217a6ab5e7603f19ff5449f1ec12fc/pkg/manifests/config.go#L217-L229
[5]: https://github.com/openshift/cluster-monitoring-operator/blob/14b1093149217a6ab5e7603f19ff5449f1ec12fc/pkg/operator/operator.go#L369-L370
[6]: https://github.com/openshift/cluster-monitoring-operator/blob/14b1093149217a6ab5e7603f19ff5449f1ec12fc/pkg/operator/operator.go#L376
[7]: https://github.com/openshift/cluster-monitoring-operator/blob/14b1093149217a6ab5e7603f19ff5449f1ec12fc/pkg/manifests/manifests.go#L1762-L1763
[8]: openshift#232
[9]: https://bugzilla.redhat.com/show_bug.cgi?id=1708697#c12
[10]: openshift#239
[11]: openshift#239 (comment)
wking added a commit to wking/cluster-version-operator that referenced this pull request Nov 9, 2019
We want to be able to distinguish these conditions, which can be due
to internal misconfiguration or external Cincinnati/network errors
[1].  The former can be fixed by cluster admins.  The latter could go
either way.

I dropped the len(upstream) guard from checkForUpdate because there's
already an earlier guard in syncAvailableUpdates.  The guard I'm
removing is from db150e6 (cvo: Perform status updates in a single
thread, 2018-11-03, openshift#45).  The covering guard is from the later
286641d (api: Update to objects from openshift/api, 2018-11-15, openshift#55).

[1]: https://bugzilla.redhat.com/show_bug.cgi?id=1685338
wking added a commit to wking/cluster-version-operator that referenced this pull request Nov 12, 2019
We want to be able to distinguish these conditions, which can be due
to internal misconfiguration or external Cincinnati/network errors
[1].  The former can be fixed by cluster admins.  The latter could go
either way.

I dropped the len(upstream) guard from checkForUpdate because there's
already an earlier guard in syncAvailableUpdates.  The guard I'm
removing is from db150e6 (cvo: Perform status updates in a single
thread, 2018-11-03, openshift#45).  The covering guard is from the later
286641d (api: Update to objects from openshift/api, 2018-11-15, openshift#55).

Personally, I'd rather have GetUpdates return an *Error, so we could
dispense with the cast and unused Unknown-reason fallback.  But
Abhinav wanted the explicit cast in return for a more familiar error
type [2].

[1]: https://bugzilla.redhat.com/show_bug.cgi?id=1685338
[2]: openshift#268 (comment)
