pkg/cvo/internal/operatorstatus: Drop deprecated failing/progressing handling #527
Conversation
This looks okay to me and unit tests should catch any detail I missed. I know you need to change 2 test cases from using …
pkg/cvo/internal/operatorstatus: Drop deprecated failing/progressing handling

Also simplify the 'undone' type from a map to a slice of strings. The only consumers are length checks and nestedMessage formatting, so we don't need to preserve structured access to the current and expected values.

I've also dropped the old, early "Cluster operator %s is still updating" guard. If the ClusterOperator is degraded or unavailable, those are much more serious than being behind in version, so we want those guards coming first. The undone guard is now only the final fallback for available, non-degraded operators that aren't completely satisfactory (and I've pivoted to use ClusterOperatorUpdating for that reason).

I'm also now keeping any available condition data, and including that status, reason, and message in the nested message, because it seemed strange to do that for degraded but not for the more serious unavailable.

And finally, I've dropped the old mode switch, and instead distributed the mode check to each relevant guard. The "we're completely satisfied" case is now where you end up if none of the "we aren't happy" guards trip. This avoids the need for an "if we got this far, it must have been because we have undone versions" assumption for the ClusterOperatorUpdating block.
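Roughly, the reordered guards look something like the simplified Go sketch below. The helper names (checkOperatorDone, findCondition), the nested-message wording, and the omission of the per-guard mode checks are assumptions for illustration; only the payload.UpdateError fields and the ClusterOperatorUpdating reason/message are taken from the test output quoted later.

    package operatorstatus

    import (
    	"fmt"

    	configv1 "github.com/openshift/api/config/v1"

    	"github.com/openshift/cluster-version-operator/pkg/payload"
    )

    // findCondition is a hypothetical helper that returns the condition with
    // the given type, or nil if it is not set.
    func findCondition(conditions []configv1.ClusterOperatorStatusCondition, name configv1.ClusterStatusConditionType) *configv1.ClusterOperatorStatusCondition {
    	for i := range conditions {
    		if conditions[i].Type == name {
    			return &conditions[i]
    		}
    	}
    	return nil
    }

    // checkOperatorDone sketches the guard ordering: unavailable first,
    // degraded second, and the 'undone' versions guard only as the final
    // fallback before declaring success.
    func checkOperatorDone(co *configv1.ClusterOperator, undone []string) error {
    	available := findCondition(co.Status.Conditions, configv1.OperatorAvailable)
    	degraded := findCondition(co.Status.Conditions, configv1.OperatorDegraded)

    	// Unavailable is the most serious state, so it trips first.  The
    	// available condition's status, reason, and message feed the nested
    	// message, mirroring what was already done for degraded.
    	if available == nil || available.Status != configv1.ConditionTrue {
    		nested := fmt.Errorf("cluster operator %s has no Available condition", co.Name)
    		if available != nil {
    			nested = fmt.Errorf("cluster operator %s is available=%s: %s: %s", co.Name, available.Status, available.Reason, available.Message)
    		}
    		return &payload.UpdateError{
    			Nested:  nested,
    			Reason:  "ClusterOperatorNotAvailable",
    			Message: fmt.Sprintf("Cluster operator %s is not available", co.Name),
    			Name:    co.Name,
    		}
    	}

    	// Degraded comes next; any mode-specific tolerance lives in this guard
    	// instead of in an outer mode switch (elided here for brevity).
    	if degraded != nil && degraded.Status == configv1.ConditionTrue {
    		return &payload.UpdateError{
    			Nested:  fmt.Errorf("cluster operator %s is degraded: %s: %s", co.Name, degraded.Reason, degraded.Message),
    			Reason:  "ClusterOperatorDegraded",
    			Message: fmt.Sprintf("Cluster operator %s is degraded", co.Name),
    			Name:    co.Name,
    		}
    	}

    	// Final fallback: available and not degraded, but versions still behind.
    	if len(undone) > 0 {
    		return &payload.UpdateError{
    			Nested:  fmt.Errorf("cluster operator %s is updating versions: %v", co.Name, undone),
    			Reason:  "ClusterOperatorUpdating",
    			Message: fmt.Sprintf("Cluster operator %s is updating versions", co.Name),
    			Name:    co.Name,
    		}
    	}

    	// No guard tripped: we're completely satisfied.
    	return nil
    }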
Fixing a TODO from 02a389b (sync: Temporarily stop checking version in ClusterOperator.Status, 2019-01-23, openshift#98). The semantics of the 'operator' entry have been clear since at least openshift/api@40c55f8085 (config/v1/types_cluster_operator: ClusterOperatorStatus doc wordsmithing, 2019-10-28, openshift/api#501).
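For context, the version comparison that feeds the 'undone' slice can be sketched as below. The helper name (undoneVersions) and the message format are illustrative assumptions, not the exact CVO code; the "operator" entry in ClusterOperator.Status.Versions names the overall operator version, while other entries name operands.

    package operatorstatus

    import (
    	"fmt"

    	configv1 "github.com/openshift/api/config/v1"
    )

    // undoneVersions compares the versions the payload expects against what the
    // ClusterOperator reports and returns one string per mismatch.
    func undoneVersions(expected []configv1.OperandVersion, co *configv1.ClusterOperator) []string {
    	var undone []string
    	for _, want := range expected {
    		current := ""
    		for _, have := range co.Status.Versions {
    			if have.Name == want.Name {
    				current = have.Version
    				break
    			}
    		}
    		if current != want.Version {
    			undone = append(undone, fmt.Sprintf("%s (needs %s, has %q)", want.Name, want.Version, current))
    		}
    	}
    	return undone
    }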
To make it easier to debug when the difference is in a deeper pointer,
e.g. [1]:
--- FAIL: Test_waitForOperatorStatusToBeDone (0.02s)
--- FAIL: Test_waitForOperatorStatusToBeDone/cluster_operator_reporting_available=true_and_degraded=false,_but_no_versions (0.00s)
operatorstatus_test.go:533: Incorrect value returned -
expected: &payload.UpdateError{Nested:(*errors.errorString)(0xc00060ae00), UpdateEffect:"None", Reason:"ClusterOperatorUpdating", Message:"Cluster operator test-co is updating versions", Name:"test-co", Task:(*payload.Task)(nil)}
returned: &payload.UpdateError{Nested:(*errors.errorString)(0xc000238740), UpdateEffect:"None", Reason:"ClusterOperatorUpdating", Message:"Cluster operator test-co is updating versions", Name:"test-co", Task:(*payload.Task)(nil)}
[1]: https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_cluster-version-operator/527/pull-ci-openshift-cluster-version-operator-master-unit/1374504389896572928#1:build-log.txt%3A41
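One way to surface such differences is sketched below: compare the rendered error text before the raw structs, so a mismatch in the nested message shows up directly instead of as two different pointer addresses. The helper name assertSameError is hypothetical and this is not necessarily the assertion the commit adds.

    package operatorstatus

    import (
    	"reflect"
    	"testing"
    )

    // assertSameError reports differences between two errors in a way that is
    // readable when the only visible difference in %#v output is a pointer
    // address behind a field such as Nested.
    func assertSameError(t *testing.T, expected, returned error) {
    	t.Helper()
    	if (expected == nil) != (returned == nil) {
    		t.Fatalf("expected %v, returned %v", expected, returned)
    	}
    	if expected == nil {
    		return
    	}
    	// Compare the user-visible rendering first; pointer fields will differ
    	// by address even when the messages they wrap are identical.
    	if expected.Error() != returned.Error() {
    		t.Fatalf("error text differs:\nexpected: %s\nreturned: %s", expected.Error(), returned.Error())
    	}
    	// Then compare structurally; reflect.DeepEqual follows pointers, so two
    	// distinct nested errors with the same message still compare equal.
    	if !reflect.DeepEqual(expected, returned) {
    		t.Errorf("errors differ structurally:\nexpected: %#v\nreturned: %#v", expected, returned)
    	}
    }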
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: jottofar, wking
The full list of commands accepted by this bot can be found here. The pull request process is described here.
Approvers can indicate their approval by writing /approve in a comment.
/retest
I am not comfortable with parts of this change - specifically allowing degraded during initialization and allowing us to exit install in an "ok" state with operators degraded (worse, we report available, then go failed after, when we should, on install, immediately report "Failed" and "Available" at the same time, and then cause the installer to exit as "install failed").

I see no justification in the comment for such a change (the commit message is about code changes, not about the intent change). If you can link to such a discussion I'd appreciate it, but this doesn't pass my smell test, which is that degraded during install may be suppressible, but reporting available (for automation) and then immediately going degraded punts problems downstream that should be caught right up front. An install completes if it is degraded, but it should not pretend that it is ok.

EDIT: discussion evolved in Slack after this comment; the bug https://bugzilla.redhat.com/show_bug.cgi?id=1951835 captures that evolution re: impact of inertia, significantly delayed Failing condition post install, and how the initializing and reconciling mode transition is important to capture correctly.