jottofar (Contributor) commented Feb 4, 2021

Change the nested message to a different one for cluster operators that are available and not degraded, but have not yet finished updating.

openshift-ci-robot added the "approved" label (indicates a PR has been approved by an approver from all required OWNERS files) Feb 4, 2021
jottofar (Contributor Author) commented Feb 4, 2021

/test unit

jottofar (Contributor Author) commented Feb 8, 2021

/retest

jottofar (Contributor Author) commented

/test e2e-agnostic

wking (Member) commented Feb 24, 2021

/lgtm

openshift-ci-robot added the "lgtm" label (indicates that a PR is ready to be merged) Feb 24, 2021
openshift-ci-robot (Contributor) commented

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jottofar, wking

openshift-bot (Contributor) commented

/retest

Please review the full test history for this PR and help us cut down flakes.

jottofar (Contributor Author) commented

/test e2e-agnostic-upgrade

wking (Member) commented Mar 17, 2021

Most recent update job failed:

  • "pods should never transition back to pending", which is not us (it's being worked in rhbz#1933760).

  • "cluster upgrade should be fast" failed with "Upgrade took too long: 125.8349849658", a check that comes from origin#25417 (finish splitting each part of the upgrade into distinct junit). But:

    $ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/pr-logs/pull/openshift_cluster-version-operator/514/pull-ci-openshift-cluster-version-operator-master-e2e-agnostic-upgrade/1371802866682957824/artifacts/e2e-agnostic-upgrade/gather-extra/artifacts/clusterversion.json | jq -r '.items[].status.history[] | .startedTime + " " + .completionTime + " " + .state + " " + .version'
    2021-03-16T14:40:36Z 2021-03-16T15:44:25Z Completed 4.8.0-0.ci.test-2021-03-16-123757-ci-op-y8j8drkp
    2021-03-16T13:38:31Z 2021-03-16T14:40:36Z Partial 4.8.0-0.ci.test-2021-03-16-124133-ci-op-y8j8drkp
    2021-03-16T13:05:46Z 2021-03-16T13:33:50Z Completed 4.8.0-0.ci.test-2021-03-16-123757-ci-op-y8j8drkp

So a ~33 minute hop and an ~1h2m hop. That's under both too-long caps, so must be a bug in their hop-detection logic. To keep the CVO moving while we work on fixing that test:

/override ci/prow/e2e-agnostic-upgrade
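
As a cross-check on the timestamps in the jq output above, here is a minimal Python sketch that computes the duration of each history entry. The helper names (`parse`, `entry_minutes`, `rollback_span_minutes`) are hypothetical, and treating the oldest row as the initial install rather than an update is an assumption. From the printed timestamps, the three entries last about 63.8, 62.1, and 28.1 minutes (newest first), and the span from the start of the Partial entry to the completion of the newest entry is about 125.9 minutes. That is close to the reported 125.83, which would be consistent with the test timing the whole A->B->A run as a single upgrade, though that is an inference from the numbers and not something the log states.

```python
from datetime import datetime, timezone

# Rows copied from the jq output above, newest first:
# (startedTime, completionTime, state)
HISTORY = [
    ("2021-03-16T14:40:36Z", "2021-03-16T15:44:25Z", "Completed"),
    ("2021-03-16T13:38:31Z", "2021-03-16T14:40:36Z", "Partial"),
    ("2021-03-16T13:05:46Z", "2021-03-16T13:33:50Z", "Completed"),
]


def parse(ts):
    """Parse a UTC timestamp in the format shown in the output above."""
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def entry_minutes(history):
    """Return (state, duration in minutes) for each history entry."""
    return [
        (state, (parse(done) - parse(started)).total_seconds() / 60)
        for started, done, state in history
    ]


def rollback_span_minutes(history):
    """Minutes from the start of the oldest update to the end of the newest.

    Assumption: the last (oldest) row is the initial install, so it is
    excluded from the span.
    """
    updates = history[:-1]
    started = parse(updates[-1][0])
    done = parse(updates[0][1])
    return (done - started).total_seconds() / 60


if __name__ == "__main__":
    for state, minutes in entry_minutes(HISTORY):
        print(f"{state}: {minutes:.1f} minutes")
    print(f"rollback span: {rollback_span_minutes(HISTORY):.1f} minutes")
```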

openshift-ci-robot (Contributor) commented

@wking: Overrode contexts on behalf of wking: ci/prow/e2e-agnostic-upgrade

wking added a commit to wking/origin that referenced this pull request Mar 17, 2021
durationToSoftFailure was added in 4447a19 (allow longer upgrade
times to run tests, but continue to fail at 75 minutes, 2020-08-12, openshift#25411),
but didn't get the 2x on rollbacks we've been adding to maximumDuration
since a53efd5 (Support --options on upgrade tests to abort in
progress, 2019-04-29, openshift#22726).  That's recently been causing the
cluster-version operator's A->B->A rollback CI jobs to time out [1].
This commit catches durationToSoftFailure up with the "2x on
rollbacks" approach, and also mentions "aborted" in messages for those
types of tests, to help remind folks what's going on.

An alternative approach would be to teach clusterUpgrade to treat
rollbacks as two separate hops (one for A->B, and another for B->A).
But that would be a more involved restructuring, and since we already
had the 2x maximumDuration precedent in place, I haven't gone in that
direction.

[1]: openshift/cluster-version-operator#514 (comment)
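
The "2x on rollbacks" idea in the commit message above can be sketched in Python as follows. This is an illustration only, not the origin test code: `upgrade_limits` and `classify` are hypothetical names, the 120-minute maximum is a placeholder, and using 75 minutes as the soft-failure base is an assumption taken from the subject line of 4447a19. The reading that exceeding the soft limit fails the check while the maximum is the point at which the run gives up is also an assumption.

```python
from datetime import timedelta


def upgrade_limits(base_maximum, base_soft_failure, rollback):
    """Return (maximum, soft_failure) limits, doubled for A->B->A rollbacks."""
    factor = 2 if rollback else 1
    return base_maximum * factor, base_soft_failure * factor


def classify(elapsed, maximum, soft_failure):
    """Classify an upgrade duration against the two limits."""
    if elapsed > maximum:
        return "timed out"
    if elapsed > soft_failure:
        return "took too long"
    return "ok"


if __name__ == "__main__":
    elapsed = timedelta(minutes=125.83)  # duration reported in the PR thread
    base_maximum = timedelta(minutes=120)  # placeholder, not from the source
    base_soft = timedelta(minutes=75)  # assumed from the 4447a19 subject line
    for rollback in (False, True):
        maximum, soft = upgrade_limits(base_maximum, base_soft, rollback)
        print(rollback, classify(elapsed, maximum, soft))
```

Under these placeholder values, the 125.83-minute run exceeds the undoubled limits but fits within the doubled ones.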
openshift-merge-robot merged commit b15b12e into openshift:master Mar 17, 2021
wking added a commit to wking/cluster-version-operator that referenced this pull request May 26, 2021
newClusterOperatorsNotAvailable is from c2ac20f (status: Report the
operators that have not yet deployed, 2019-04-09, openshift#158).  And the
not-available filtering is from bdd4545 (status: Hide generic
operator status in favor of more specific errors, 2019-05-19, openshift#192).
But in ce1eda1 (pkg/cvo/internal/operatorstatus: Change nested
message, 2021-02-04, openshift#514), we moved "waiting on status.versions" from
ClusterOperatorNotAvailable to ClusterOperatorUpdating.  And we want
to avoid uncollapsed errors like:

  Multiple errors are preventing progress:
  * Cluster operator machine-api is updating versions
  * Cluster operator openshift-apiserver is updating versions

where we are waiting on multiple ClusterOperators which are in similar
situations.  This commit drops the filtering, because cluster
operators are important.  It does sort those errors to the end of the
list though, so the first error is the non-ClusterOperator error.
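
The ordering and collapsing described in the commit message above can be sketched in Python. This is a hedged illustration, not the cluster-version operator's Go implementation: the `UpdateError` type, the `order_errors` and `summarize` helpers, the prefix test on the reason string, and the wording of the collapsed message are all assumptions. Only the reason name ClusterOperatorUpdating and the per-operator message text come from the commit message.

```python
from dataclasses import dataclass

UPDATING = "ClusterOperatorUpdating"


@dataclass(frozen=True)
class UpdateError:
    reason: str   # e.g. "ClusterOperatorUpdating"
    name: str     # resource name, e.g. "machine-api"
    message: str  # human-readable message


def is_cluster_operator(err):
    """Assumption: ClusterOperator errors are recognizable by reason prefix."""
    return err.reason.startswith("ClusterOperator")


def order_errors(errors):
    """Stable sort that moves ClusterOperator errors to the end of the list."""
    return sorted(errors, key=is_cluster_operator)


def summarize(errors):
    """Return message lines, collapsing multiple 'updating' operators into one."""
    ordered = order_errors(errors)
    updating = [e for e in ordered if e.reason == UPDATING]
    lines = [e.message for e in ordered if e.reason != UPDATING]
    if len(updating) == 1:
        lines.append(updating[0].message)
    elif updating:
        names = ", ".join(sorted(e.name for e in updating))
        # Hypothetical wording for the collapsed message.
        lines.append(f"Cluster operators {names} are updating versions")
    return lines


if __name__ == "__main__":
    errors = [
        UpdateError(UPDATING, "machine-api",
                    "Cluster operator machine-api is updating versions"),
        UpdateError("UpdatePayloadFailed", "example-deployment",
                    "Could not update example-deployment"),
        UpdateError(UPDATING, "openshift-apiserver",
                    "Cluster operator openshift-apiserver is updating versions"),
    ]
    for line in summarize(errors):
        print("*", line)
```

Because Python's `sorted` is stable and `False` sorts before `True`, non-ClusterOperator errors keep their relative order and land first.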
wking added a commit to wking/cluster-version-operator that referenced this pull request Aug 3, 2022
newClusterOperatorsNotAvailable is from c2ac20f (status: Report the
operators that have not yet deployed, 2019-04-09, openshift#158).  And the
not-available filtering is from bdd4545 (status: Hide generic
operator status in favor of more specific errors, 2019-05-19, openshift#192).
But in ce1eda1 (pkg/cvo/internal/operatorstatus: Change nested
message, 2021-02-04, openshift#514), we moved "waiting on status.versions" from
ClusterOperatorNotAvailable to ClusterOperatorUpdating.  And we want
to avoid uncollapsed errors like:

  Multiple errors are preventing progress:
  * Cluster operator machine-api is updating versions
  * Cluster operator openshift-apiserver is updating versions

where we are waiting on multiple ClusterOperators which are in similar
situations.  This commit drops the filtering, because cluster
operators are important.  It does sort those errors to the end of the
list though, so the first error is the non-ClusterOperator error.

TestCVO_ParallelError no longer tests the consolidated error message,
because the consolidation is now restricted to ClusterOperator
resources.  I tried moving the
pkg/cvo/testdata/paralleltest/release-manifests manifests to
ClusterOperator, but then the test struggled with:

  I0802 16:04:18.133935    2005 sync_worker.go:945] Unable to precreate resource clusteroperator

so now TestCVO_ParallelError is exercising the fact that
non-ClusterOperator failures are not aggregated.