-
Notifications
You must be signed in to change notification settings - Fork 213
Bug 2071998: pkg/cvo/updatepayload: Event when forcing through a sig-verification failure #763
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bug 2071998: pkg/cvo/updatepayload: Event when forcing through a sig-verification failure #763
Conversation
|
@wking: This pull request references Bugzilla bug 2071998, which is valid. The bug has been moved to the POST state. The bug has been updated to refer to the pull request using the external bug tracker. 3 validation(s) were run on this bug
Requesting review from QA contact: DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
0963953 to
4f880e5
Compare
LalatenduMohanty
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
|
/retest |
4f880e5 to
ace2339
Compare
ace2339 to
cd35f32
Compare
|
Cluster-bot We'll need a more-specific error out of the verification library, if we're trying to debug slow signature searche. /hold |
| } | ||
| if info.VerificationError != nil { | ||
| w.eventRecorder.Eventf(cvoObjectRef, corev1.EventTypeWarning, "RetrievePayload", info.VerificationError.Error()) | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@wking Any reason you are not calling reporter.ReportPayload after the w.eventRecorder.Eventf
reporter.ReportPayload(LoadPayloadStatus{
Step: "RetrievePayload",
Message: msg,
Release: desired,
LastTransitionTime: time.Now(),
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we just want this event for the narrow "update was forced, release was not verified" case. I didn't think it was worth tying into ClusterVersion reporting (we will report there when we accept the update, which because of force, we will soon after this event)
cd35f32 to
f05c290
Compare
|
Rebased to pick up openshift/library-go#1358 with cd35f32 -> f05c290. |
$ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/release-openshift-origin-installer-launch-gcp-modern/1517255913856045056/artifacts/launch/gather-extra/artifacts/events.json | jq -r '[.items[] | select(.metadata.namespace == "openshift-cluster-version")] | sort_by(.metadata.creationTimestamp)[] | .metadata.creationTimestamp + " " + .reason + ": " + .message' | grep -1 keyring
2022-04-21T22:15:03Z Completed: Job completed
2022-04-21T22:15:04Z RetrievePayload: Target release version="" image="registry.build01.ci.openshift.org/ci-ln-nt5fbdk/release@sha256:88707bcc9a53ea4eb963b98aee830c1887d5019a664e879b599e030a3bdb142b" cannot be verified, but continuing anyway because the update was forced: unable to verify sha256:88707bcc9a53ea4eb963b98aee830c1887d5019a664e879b599e030a3bdb142b against keyrings: verifier-public-key-redhat
2022-04-21T22:15:04Z LoadPayload: Loading payload version="" image="registry.build01.ci.openshift.org/ci-ln-nt5fbdk/release@sha256:88707bcc9a53ea4eb963b98aee830c1887d5019a664e879b599e030a3bdb142b"Hrm. So because the event is up in |
Picking up openshift/library-go@1b9753d298 (Bug 2071998: pkg/verify: Expose underlying signature errors, 2022-04-21, openshift/library-go#1358) and openshift/library-go#1371. Generated with: $ go get -u github.com/openshift/library-go $ go mod tidy $ go mod vendor $ git add -A go.* vendor using: $ go version go version go1.17.3 linux/amd64
f05c290 to
194e690
Compare
$ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/release-openshift-origin-installer-launch-gcp-modern/1528827908339011584/artifacts/launch/gather-extra/artifacts/events.json | jq -r '[.items[] | select(.metadata.namespace == "openshift-cluster-version")] | sort_by(.metadata.creationTimestamp)[] | .metadata.creationTimestamp + " " + .reason + ": " + .message' | grep -1 keyring
2022-05-23T20:41:39Z Completed: Job completed
2022-05-23T20:41:41Z RetrievePayload: Target release version="" image="registry.build01.ci.openshift.org/ci-ln-d3qhbgb/release@sha256:83715487f0af0b6c4efa3e2ab1c506d43a21dd4241108c07fbced05e8b8cfa76" cannot be verified, but continuing anyway because the update was forced: unable to verify sha256:83715487f0af0b6c4efa3e2ab1c506d43a21dd4241108c07fbced05e8b8cfa76 against keyrings: verifier-public-key-redhat
2022-05-23T20:41:41Z LoadPayload: Loading payload version="" image="registry.build01.ci.openshift.org/ci-ln-d3qhbgb/release@sha256:83715487f0af0b6c4efa3e2ab1c506d43a21dd4241108c07fbced05e8b8cfa76"Grr. Maybe my |
194e690 to
71762af
Compare
|
unit: feels unrelated, but see how reproducible it is: /test unit |
…failure For unforced updates, when signature, etc. verification fails, RetrievePayload returns an error, and we have emitted events for RetrievePayload errors since 475e71f (emit events for each new payload, 2020-07-21, openshift#411). However, when the user forces the update, we log but do not error out on verification failures. With this commit, we will also emit a warning event with an error message, which will make it easier to understand how signature verification failed, even when we don't have access to the logs of the outgoing cluster-version operator [1]. We should transition away from the maintenance-mode github.com/pkg/errors [2], but I'm deferring that to follow-up work. [1]: https://bugzilla.redhat.com/show_bug.cgi?id=2071998#c2 [2]: https://pkg.go.dev/github.com/pkg/errors#readme-roadmap
71762af to
5ad3c14
Compare
$ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/release-openshift-origin-installer-launch-gcp-modern/1528972481522569216/artifacts/launch/gather-extra/artifacts/events.json | jq -r '[.items[] | select(.metadata.namespace == "openshift-cluster-version")] | sort_by(.metadata.creationTimestamp)[] | .metadata.creationTimestamp + " " + .reason + ": " + .message' | grep -A2 keyring
Target release version="" image="registry.build01.ci.openshift.org/ci-ln-b7pvh2t/release@sha256:83715487f0af0b6c4efa3e2ab1c506d43a21dd4241108c07fbced05e8b8cfa76" cannot be verified, but continuing anyway because the update was forced: unable to verify sha256:83715487f0af0b6c4efa3e2ab1c506d43a21dd4241108c07fbced05e8b8cfa76 against keyrings: verifier-public-key-redhat
[2022-05-24T06:24:57Z: prefix sha256-83715487f0af0b6c4efa3e2ab1c506d43a21dd4241108c07fbced05e8b8cfa76 in config map signatures-managed: no more signatures to check, 2022-05-24T06:24:57Z: unable to retrieve signature from https://storage.googleapis.com/openshift-release/official/signatures/openshift/release/sha256=83715487f0af0b6c4efa3e2ab1c506d43a21dd4241108c07fbced05e8b8cfa76/signature-1: no more signatures to check, 2022-05-24T06:24:57Z: unable to retrieve signature from https://mirror.openshift.com/pub/openshift-v4/signatures/openshift/release/sha256=83715487f0af0b6c4efa3e2ab1c506d43a21dd4241108c07fbced05e8b8cfa76/signature-1: no more signatures to check, 2022-05-24T06:24:57Z: parallel signature store wrapping containers/image signature store under https://storage.googleapis.com/openshift-release/official/signatures/openshift/release, containers/image signature store under https://mirror.openshift.com/pub/openshift-v4/signatures/openshift/release: no more signatures to check, 2022-05-24T06:24:57Z: serial signature store wrapping config maps in openshift-config-managed with label "release.openshift.io/verification-signatures", parallel signature store wrapping containers/image signature store under https://storage.googleapis.com/openshift-release/official/signatures/openshift/release, containers/image signature store under https://mirror.openshift.com/pub/openshift-v4/signatures/openshift/release: no more signatures to check]
2022-05-24T06:25:12Z LoadPayload: Loading payload version="" image="registry.build01.ci.openshift.org/ci-ln-b7pvh2t/release@sha256:83715487f0af0b6c4efa3e2ab1c506d43a21dd4241108c07fbced05e8b8cfa76"So 5ad3c14's
Looks good to me. |
|
/lgtm |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: jottofar, wking The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
/hold cancel Error message is more specific now. |
|
/override ci/prow/e2e-agnostic-upgrade |
|
@wking: Overrode contexts on behalf of wking: ci/prow/e2e-agnostic-upgrade DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
|
@wking: All pull requests linked via external trackers have merged: Bugzilla bug 2071998 has been moved to the MODIFIED state. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
|
@wking: all tests passed! Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
…s too We've including verification-failure details for forced updates since 5ad3c14 (pkg/cvo/updatepayload: Event when forcing through a sig-verification failure, 2022-04-07, 2022, openshift#763), but had not been including them in logs or other output in the "we aren't forcing, so this blocks the update's acceptance" case. This commit adds the detail to the Event, so it's available, but keeps only the high-level message in the RetrievePayload status output (which feeds the ReleaseAccepted condition in ClusterVersion), because while the low-level are useful for debugging, they're pretty chatty for condition consumers that are more interested in just knowing basically why the update request isn't being accepted.
…s too We've including verification-failure details for forced updates since 5ad3c14 (pkg/cvo/updatepayload: Event when forcing through a sig-verification failure, 2022-04-07, 2022, openshift#763), but had not been including them in logs or other output in the "we aren't forcing, so this blocks the update's acceptance" case. This commit adds the detail to the Event, so it's available, but keeps only the high-level message in the RetrievePayload status output (which feeds the ReleaseAccepted condition in ClusterVersion), because while the low-level are useful for debugging, they're pretty chatty for condition consumers that are more interested in just knowing basically why the update request isn't being accepted.
…s too We've including verification-failure details for forced updates since 5ad3c14 (pkg/cvo/updatepayload: Event when forcing through a sig-verification failure, 2022-04-07, 2022, openshift#763), but had not been including them in logs or other output in the "we aren't forcing, so this blocks the update's acceptance" case. This commit adds the detail to the Event, so it's available, but keeps only the high-level message in the RetrievePayload status output (which feeds the ReleaseAccepted condition in ClusterVersion), because while the low-level are useful for debugging, they're pretty chatty for condition consumers that are more interested in just knowing basically why the update request isn't being accepted. The newline-to-// replacement is because apparently Event messages truncate at the first newline. I have not tracked down docs or source to back that up, but confirmed it in pre-merge testing [1]. [1]: openshift#1003 (comment)
…s too We've including verification-failure details for forced updates since 5ad3c14 (pkg/cvo/updatepayload: Event when forcing through a sig-verification failure, 2022-04-07, 2022, openshift#763), but had not been including them in logs or other output in the "we aren't forcing, so this blocks the update's acceptance" case. This commit adds the detail to the Event, so it's available, but keeps only the high-level message in the RetrievePayload status output (which feeds the ReleaseAccepted condition in ClusterVersion), because while the low-level are useful for debugging, they're pretty chatty for condition consumers that are more interested in just knowing basically why the update request isn't being accepted. The newline-to-// replacement is because apparently Event messages truncate at the first newline. I have not tracked down docs or source to back that up, but confirmed it in pre-merge testing [1]. [1]: openshift#1003 (comment)
For unforced updates, when signature, etc. verification fails,
RetrievePayloadreturns an error, and we have emitted events forRetrievePayloaderrors since 475e71f (#411). However, when the user forces the update, we log but do not error out on verification failures. With this commit, we will also emit a warning event with an error message, which will make it easier to understand how signature verification failed, even when we don't have access to the logs of the outgoing cluster-version operator.