Skip to content

Conversation

@wking
Copy link
Member

@wking wking commented Apr 7, 2022

For unforced updates, when signature, etc. verification fails, RetrievePayload returns an error, and we have emitted events for RetrievePayload errors since 475e71f (#411). However, when the user forces the update, we log but do not error out on verification failures. With this commit, we will also emit a warning event with an error message, which will make it easier to understand how signature verification failed, even when we don't have access to the logs of the outgoing cluster-version operator.

@openshift-ci openshift-ci bot added bugzilla/severity-high Referenced Bugzilla bug's severity is high for the branch this PR is targeting. bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. labels Apr 7, 2022
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Apr 7, 2022

@wking: This pull request references Bugzilla bug 2071998, which is valid. The bug has been moved to the POST state. The bug has been updated to refer to the pull request using the external bug tracker.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target release (4.11.0) matches configured target release for branch (4.11.0)
  • bug is in the state ASSIGNED, which is one of the valid states (NEW, ASSIGNED, ON_DEV, POST, POST)

Requesting review from QA contact:
/cc @jiajliu

Details

In response to this:

Bug 2071998: pkg/cvo/updatepayload: Event when forcing through a sig-verification failure

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci openshift-ci bot requested review from jiajliu, jottofar and vrutkovs April 7, 2022 16:39
@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 7, 2022
@wking wking force-pushed the event-on-forced-signature-failures branch from 0963953 to 4f880e5 Compare April 7, 2022 16:55
Copy link
Member

@LalatenduMohanty LalatenduMohanty left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Apr 7, 2022
@LalatenduMohanty
Copy link
Member

/retest

@wking wking force-pushed the event-on-forced-signature-failures branch from 4f880e5 to ace2339 Compare April 7, 2022 17:27
@openshift-ci openshift-ci bot removed the lgtm Indicates that a PR is ready to be merged. label Apr 7, 2022
@wking wking force-pushed the event-on-forced-signature-failures branch from ace2339 to cd35f32 Compare April 7, 2022 21:48
@wking
Copy link
Member Author

wking commented Apr 8, 2022

Cluster-bot test upgrade 4.11.0-0.ci-2022-04-08-015619,openshift/cluster-version-operator#763 4.11.0-0.ci-2022-04-08-015619:

$ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/release-openshift-origin-installer-launch-gcp-modern/1512273235499028480/artifacts/launch/gather-extra/artifacts/events.json | jq -r '.items[] | select(.reason == "RetrievePayload") | .metadata.creationTimestamp + " " + .reason + ": " + .message'
...
2022-04-08T04:15:11Z RetrievePayload: Target release version="" image="registry.build01.ci.openshift.org/ci-ln-j2gy3v2/release@sha256:41b7d77d042ad3c4e62d372ea70f6a2b9809e561538fa54b553b93ac5c0c3308" cannot be verified, but continuing anyway because the update was forced: unable to locate a valid signature for one or more sources

We'll need a more-specific error out of the verification library, if we're trying to debug slow signature searche.

/hold

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Apr 8, 2022
}
if info.VerificationError != nil {
w.eventRecorder.Eventf(cvoObjectRef, corev1.EventTypeWarning, "RetrievePayload", info.VerificationError.Error())
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wking Any reason you are not calling reporter.ReportPayload after the w.eventRecorder.Eventf

reporter.ReportPayload(LoadPayloadStatus{
			Step:               "RetrievePayload",
			Message:            msg,
			Release:            desired,
			LastTransitionTime: time.Now(),

Copy link
Member Author

@wking wking Apr 21, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we just want this event for the narrow "update was forced, release was not verified" case. I didn't think it was worth tying into ClusterVersion reporting (we will report there when we accept the update, which because of force, we will soon after this event)

@wking
Copy link
Member Author

wking commented Apr 21, 2022

Rebased to pick up openshift/library-go#1358 with cd35f32 -> f05c290.

@wking
Copy link
Member Author

wking commented Apr 21, 2022

test upgrade 4.11.0-0.ci-2022-04-21-074759,openshift/cluster-version-operator#763 4.11.0-0.ci-2022-04-21-074759:

$ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/release-openshift-origin-installer-launch-gcp-modern/1517255913856045056/artifacts/launch/gather-extra/artifacts/events.json | jq -r '[.items[] | select(.metadata.namespace == "openshift-cluster-version")] | sort_by(.metadata.creationTimestamp)[] | .metadata.creationTimestamp + " " + .reason + ": " + .message' | grep -1 keyring
2022-04-21T22:15:03Z Completed: Job completed
2022-04-21T22:15:04Z RetrievePayload: Target release version="" image="registry.build01.ci.openshift.org/ci-ln-nt5fbdk/release@sha256:88707bcc9a53ea4eb963b98aee830c1887d5019a664e879b599e030a3bdb142b" cannot be verified, but continuing anyway because the update was forced: unable to verify sha256:88707bcc9a53ea4eb963b98aee830c1887d5019a664e879b599e030a3bdb142b against keyrings: verifier-public-key-redhat
2022-04-21T22:15:04Z LoadPayload: Loading payload version="" image="registry.build01.ci.openshift.org/ci-ln-nt5fbdk/release@sha256:88707bcc9a53ea4eb963b98aee830c1887d5019a664e879b599e030a3bdb142b"

Hrm. So because the event is up in SyncWorker.syncPayload, while both verification and the signature check are down in payloadRetriever.RetrievePayload, we aren't getting all that much detail on either the timing or the failure mode of the signature check...

Picking up openshift/library-go@1b9753d298 (Bug 2071998: pkg/verify:
Expose underlying signature errors, 2022-04-21,
openshift/library-go#1358) and openshift/library-go#1371.  Generated
with:

  $ go get -u github.com/openshift/library-go
  $ go mod tidy
  $ go mod vendor
  $ git add -A go.* vendor

using:

  $ go version
  go version go1.17.3 linux/amd64
@wking wking force-pushed the event-on-forced-signature-failures branch from f05c290 to 194e690 Compare May 23, 2022 19:56
@wking
Copy link
Member Author

wking commented May 24, 2022

test upgrade 4.11.0-0.ci-2022-05-20-165259,openshift/cluster-version-operator#763 4.11.0-0.ci-2022-05-20-165259:

$ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/release-openshift-origin-installer-launch-gcp-modern/1528827908339011584/artifacts/launch/gather-extra/artifacts/events.json | jq -r '[.items[] | select(.metadata.namespace == "openshift-cluster-version")] | sort_by(.metadata.creationTimestamp)[] | .metadata.creationTimestamp + " " + .reason + ": " + .message' | grep -1 keyring
2022-05-23T20:41:39Z Completed: Job completed
2022-05-23T20:41:41Z RetrievePayload: Target release version="" image="registry.build01.ci.openshift.org/ci-ln-d3qhbgb/release@sha256:83715487f0af0b6c4efa3e2ab1c506d43a21dd4241108c07fbced05e8b8cfa76" cannot be verified, but continuing anyway because the update was forced: unable to verify sha256:83715487f0af0b6c4efa3e2ab1c506d43a21dd4241108c07fbced05e8b8cfa76 against keyrings: verifier-public-key-redhat
2022-05-23T20:41:41Z LoadPayload: Loading payload version="" image="registry.build01.ci.openshift.org/ci-ln-d3qhbgb/release@sha256:83715487f0af0b6c4efa3e2ab1c506d43a21dd4241108c07fbced05e8b8cfa76"

Grr. Maybe my Unwrap dance is broken?

@wking wking force-pushed the event-on-forced-signature-failures branch from 194e690 to 71762af Compare May 24, 2022 03:13
@wking
Copy link
Member Author

wking commented May 24, 2022

unit:

FAIL: TestCVO_UpgradeFailedPayloadLoadWithCapsChanges
...
    cvo_scenarios_test.go:1763: unexpected object:   &v1.ClusterVersion{
          	TypeMeta:   {},
          	ObjectMeta: {Name: "version", ResourceVersion: "3", Generation: 1},
          	Spec:       {ClusterID: "ae71b16e-8325-471a-aa2d-08fdfc5c9925", DesiredUpdate: &{Version: "1.0.0-abc", Image: "image/image:0"}, Channel: "fast", Capabilities: &{BaselineCapabilitySet: "None", AdditionalEnabledCapabilities: {"baremetal", "marketplace"}}, ...},
          	Status: v1.ClusterVersionStatus{
          		... // 3 identical fields
          		VersionHash:  "DL-FFQ2Uem8=",
          		Capabilities: {EnabledCapabilities: {"baremetal", "marketplace"}, KnownCapabilities: {"baremetal", "marketplace", "openshift-samples"}},
          		Conditions: []v1.ClusterOperatorStatusCondition{
          			... // 2 identical elements
          			{Type: "Available", Status: "True", Message: "Done applying 1.0.1-abc"},
          			{Type: "Failing", Status: "False"},
          			{
          				Type:               "Progressing",
        - 				Status:             "True",
        + 				Status:             "False",
          				LastTransitionTime: {},
          				Reason:             "",
        - 				Message:            "Working towards 1.0.1-abc: 3 of 3 done (100% complete)",
        + 				Message:            "Cluster version is 1.0.1-abc",
          			},
          			{Type: "RetrievedUpdates", Status: "False"},
          		},
          		AvailableUpdates:   nil,
          		ConditionalUpdates: nil,
          	},
          }

feels unrelated, but see how reproducible it is:

/test unit

…failure

For unforced updates, when signature, etc. verification fails,
RetrievePayload returns an error, and we have emitted events for
RetrievePayload errors since 475e71f (emit events for each new
payload, 2020-07-21, openshift#411).  However, when the user forces the update,
we log but do not error out on verification failures.  With this
commit, we will also emit a warning event with an error message, which
will make it easier to understand how signature verification failed,
even when we don't have access to the logs of the outgoing
cluster-version operator [1].

We should transition away from the maintenance-mode
github.com/pkg/errors [2], but I'm deferring that to follow-up work.

[1]: https://bugzilla.redhat.com/show_bug.cgi?id=2071998#c2
[2]: https://pkg.go.dev/github.com/pkg/errors#readme-roadmap
@wking wking force-pushed the event-on-forced-signature-failures branch from 71762af to 5ad3c14 Compare May 24, 2022 05:32
@wking
Copy link
Member Author

wking commented May 24, 2022

test upgrade 4.11.0-0.ci-2022-05-20-165259,openshift/cluster-version-operator#763 4.11.0-0.ci-2022-05-20-165259:

$ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/release-openshift-origin-installer-launch-gcp-modern/1528972481522569216/artifacts/launch/gather-extra/artifacts/events.json | jq -r '[.items[] | select(.metadata.namespace == "openshift-cluster-version")] | sort_by(.metadata.creationTimestamp)[] | .metadata.creationTimestamp + " " + .reason + ": " + .message' | grep -A2 keyring
Target release version="" image="registry.build01.ci.openshift.org/ci-ln-b7pvh2t/release@sha256:83715487f0af0b6c4efa3e2ab1c506d43a21dd4241108c07fbced05e8b8cfa76" cannot be verified, but continuing anyway because the update was forced: unable to verify sha256:83715487f0af0b6c4efa3e2ab1c506d43a21dd4241108c07fbced05e8b8cfa76 against keyrings: verifier-public-key-redhat
[2022-05-24T06:24:57Z: prefix sha256-83715487f0af0b6c4efa3e2ab1c506d43a21dd4241108c07fbced05e8b8cfa76 in config map signatures-managed: no more signatures to check, 2022-05-24T06:24:57Z: unable to retrieve signature from https://storage.googleapis.com/openshift-release/official/signatures/openshift/release/sha256=83715487f0af0b6c4efa3e2ab1c506d43a21dd4241108c07fbced05e8b8cfa76/signature-1: no more signatures to check, 2022-05-24T06:24:57Z: unable to retrieve signature from https://mirror.openshift.com/pub/openshift-v4/signatures/openshift/release/sha256=83715487f0af0b6c4efa3e2ab1c506d43a21dd4241108c07fbced05e8b8cfa76/signature-1: no more signatures to check, 2022-05-24T06:24:57Z: parallel signature store wrapping containers/image signature store under https://storage.googleapis.com/openshift-release/official/signatures/openshift/release, containers/image signature store under https://mirror.openshift.com/pub/openshift-v4/signatures/openshift/release: no more signatures to check, 2022-05-24T06:24:57Z: serial signature store wrapping config maps in openshift-config-managed with label "release.openshift.io/verification-signatures", parallel signature store wrapping containers/image signature store under https://storage.googleapis.com/openshift-release/official/signatures/openshift/release, containers/image signature store under https://mirror.openshift.com/pub/openshift-v4/signatures/openshift/release: no more signatures to check]
2022-05-24T06:25:12Z LoadPayload: Loading payload version="" image="registry.build01.ci.openshift.org/ci-ln-b7pvh2t/release@sha256:83715487f0af0b6c4efa3e2ab1c506d43a21dd4241108c07fbced05e8b8cfa76"

So 5ad3c14's UpdateError.Unwrap addition was one of the missing pieces, and we're good now. Reformatting that error message with more newlines:

  • 2022-05-24T06:24:57Z: prefix sha256-83715487f0af0b6c4efa3e2ab1c506d43a21dd4241108c07fbced05e8b8cfa76 in config map signatures-managed: no more signatures to check is "no local ConfigMap signatures" (e.g. if the admin had mirrored a signature in).
  • Getting into the parallel store wrapping our two sigstore (HTTP) stores:
    • 2022-05-24T06:24:57Z: unable to retrieve signature from https://storage.googleapis.com/openshift-release/official/signatures/openshift/release/sha256=83715487f0af0b6c4efa3e2ab1c506d43a21dd4241108c07fbced05e8b8cfa76/signature-1: no more signatures to check, this is a 404 from Google.
    • 2022-05-24T06:24:57Z: unable to retrieve signature from https://mirror.openshift.com/pub/openshift-v4/signatures/openshift/release/sha256=83715487f0af0b6c4efa3e2ab1c506d43a21dd4241108c07fbced05e8b8cfa76/signature-1: no more signatures to check and a 404 from our mirror.
    • 2022-05-24T06:24:57Z: parallel signature store wrapping containers/image signature store under https://storage.googleapis.com/openshift-release/official/signatures/openshift/release, containers/image signature store under https://mirror.openshift.com/pub/openshift-v4/signatures/openshift/release: no more signatures to check and the parallel store that wraps the two sigstore (HTTP) stores runs out of options.
  • 2022-05-24T06:24:57Z: serial signature store wrapping config maps in openshift-config-managed with label "release.openshift.io/verification-signatures", parallel signature store wrapping containers/image signature store under https://storage.googleapis.com/openshift-release/official/signatures/openshift/release, containers/image signature store under https://mirror.openshift.com/pub/openshift-v4/signatures/openshift/release: no more signatures to check and the serial store that wraps the ConfigMap and parallel stores runs out of options.

Looks good to me.

@jottofar
Copy link
Contributor

/lgtm
/retest

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label May 24, 2022
@openshift-ci
Copy link
Contributor

openshift-ci bot commented May 24, 2022

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jottofar, wking

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@wking
Copy link
Member Author

wking commented May 24, 2022

/hold cancel

Error message is more specific now.

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 24, 2022
@wking
Copy link
Member Author

wking commented May 24, 2022

Watchdog gaps are unrelated.

/override ci/prow/e2e-agnostic-upgrade

@openshift-ci
Copy link
Contributor

openshift-ci bot commented May 24, 2022

@wking: Overrode contexts on behalf of wking: ci/prow/e2e-agnostic-upgrade

Details

In response to this:

Watchdog gaps are unrelated.

/override ci/prow/e2e-agnostic-upgrade

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-merge-robot openshift-merge-robot merged commit eea6a7c into openshift:master May 24, 2022
@openshift-ci
Copy link
Contributor

openshift-ci bot commented May 24, 2022

@wking: All pull requests linked via external trackers have merged:

Bugzilla bug 2071998 has been moved to the MODIFIED state.

Details

In response to this:

Bug 2071998: pkg/cvo/updatepayload: Event when forcing through a sig-verification failure

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci
Copy link
Contributor

openshift-ci bot commented May 24, 2022

@wking: all tests passed!

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@wking wking deleted the event-on-forced-signature-failures branch May 26, 2022 00:26
wking added a commit to wking/cluster-version-operator that referenced this pull request Dec 7, 2023
…s too

We've including verification-failure details for forced updates since
5ad3c14 (pkg/cvo/updatepayload: Event when forcing through a
sig-verification failure, 2022-04-07, 2022, openshift#763), but had not been
including them in logs or other output in the "we aren't forcing, so
this blocks the update's acceptance" case.  This commit adds the
detail to the Event, so it's available, but keeps only the high-level
message in the RetrievePayload status output (which feeds the
ReleaseAccepted condition in ClusterVersion), because while the
low-level are useful for debugging, they're pretty chatty for
condition consumers that are more interested in just knowing basically
why the update request isn't being accepted.
wking added a commit to wking/cluster-version-operator that referenced this pull request Dec 8, 2023
…s too

We've including verification-failure details for forced updates since
5ad3c14 (pkg/cvo/updatepayload: Event when forcing through a
sig-verification failure, 2022-04-07, 2022, openshift#763), but had not been
including them in logs or other output in the "we aren't forcing, so
this blocks the update's acceptance" case.  This commit adds the
detail to the Event, so it's available, but keeps only the high-level
message in the RetrievePayload status output (which feeds the
ReleaseAccepted condition in ClusterVersion), because while the
low-level are useful for debugging, they're pretty chatty for
condition consumers that are more interested in just knowing basically
why the update request isn't being accepted.
wking added a commit to wking/cluster-version-operator that referenced this pull request Dec 9, 2023
…s too

We've including verification-failure details for forced updates since
5ad3c14 (pkg/cvo/updatepayload: Event when forcing through a
sig-verification failure, 2022-04-07, 2022, openshift#763), but had not been
including them in logs or other output in the "we aren't forcing, so
this blocks the update's acceptance" case.  This commit adds the
detail to the Event, so it's available, but keeps only the high-level
message in the RetrievePayload status output (which feeds the
ReleaseAccepted condition in ClusterVersion), because while the
low-level are useful for debugging, they're pretty chatty for
condition consumers that are more interested in just knowing basically
why the update request isn't being accepted.

The newline-to-// replacement is because apparently Event messages
truncate at the first newline.  I have not tracked down docs or source
to back that up, but confirmed it in pre-merge testing [1].

[1]: openshift#1003 (comment)
openshift-cherrypick-robot pushed a commit to openshift-cherrypick-robot/cluster-version-operator that referenced this pull request Dec 19, 2023
…s too

We've including verification-failure details for forced updates since
5ad3c14 (pkg/cvo/updatepayload: Event when forcing through a
sig-verification failure, 2022-04-07, 2022, openshift#763), but had not been
including them in logs or other output in the "we aren't forcing, so
this blocks the update's acceptance" case.  This commit adds the
detail to the Event, so it's available, but keeps only the high-level
message in the RetrievePayload status output (which feeds the
ReleaseAccepted condition in ClusterVersion), because while the
low-level are useful for debugging, they're pretty chatty for
condition consumers that are more interested in just knowing basically
why the update request isn't being accepted.

The newline-to-// replacement is because apparently Event messages
truncate at the first newline.  I have not tracked down docs or source
to back that up, but confirmed it in pre-merge testing [1].

[1]: openshift#1003 (comment)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. bugzilla/severity-high Referenced Bugzilla bug's severity is high for the branch this PR is targeting. bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. lgtm Indicates that a PR is ready to be merged.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants