Conversation

@wking
Member

@wking wking commented Dec 7, 2023

We've been including verification-failure details for forced updates since 5ad3c14 (#763), but had not been including them in logs or other output in the "we aren't forcing, so this blocks the update's acceptance" case. This commit adds the detail to the Event, so it's available, but keeps only the high-level message in the RetrievePayload status output (which feeds the ReleaseAccepted condition in ClusterVersion), because while the low-level details are useful for debugging, they're pretty chatty for condition consumers who mostly just want to know why the update request isn't being accepted.

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Dec 7, 2023
@wking wking force-pushed the understand-signature-errors branch from 8196661 to 9c2e6ca Compare December 8, 2023 00:03
@wking
Member Author

wking commented Dec 8, 2023

Testing 9c2e6ca with launch 4.15,openshift/cluster-version-operator#1003 aws Cluster Bot cluster (logs):

$ oc adm upgrade --allow-explicit-upgrade --to-image quay.io/openshift-release-dev/ocp-release@sha256:0000000000000000000000000000000000000000000000000000000000000000
warning: The requested upgrade image is not one of the available updates. You have used --allow-explicit-upgrade for the update to proceed anyway
Requested update to release image quay.io/openshift-release-dev/ocp-release@sha256:0000000000000000000000000000000000000000000000000000000000000000
$ oc -n openshift-cluster-version get -o json events | jq -r '.items[] | select(.reason | contains("Payload")) | .reason + ": " + .message' | grep verified | sort | uniq -c
      5 RetrievePayloadFailed: Retrieving payload failed version="" image="quay.io/openshift-release-dev/ocp-release@sha256:0000000000000000000000000000000000000000000000000000000000000000" failure=The update cannot be verified: unable to verify sha256:0000000000000000000000000000000000000000000000000000000000000000 against keyrings: verifier-public-key-redhat
$ oc adm upgrade
Cluster version is 4.15.0-0.test-2023-12-08-022856-ci-ln-8rvgdc2-latest

ReleaseAccepted=False

  Reason: RetrievePayload
  Message: Retrieving payload failed version="" image="quay.io/openshift-release-dev/ocp-release@sha256:0000000000000000000000000000000000000000000000000000000000000000" failure=The update cannot be verified: unable to verify sha256:0000000000000000000000000000000000000000000000000000000000000000 against keyrings: verifier-public-key-redhat

warning: Cannot display available updates:
  Reason: NoChannel
  Message: The update channel has not been configured.

Hrm, that doesn't seem to be working: the Event message still carries only the high-level failure, without the low-level detail.

@wking wking force-pushed the understand-signature-errors branch 2 times, most recently from 28212c2 to 588b59c Compare December 8, 2023 06:04
@wking
Member Author

wking commented Dec 8, 2023

Testing 588b59c with launch 4.15,#1003 aws Cluster Bot cluster (logs):

$ oc adm upgrade --allow-explicit-upgrade --to-image quay.io/openshift-release-dev/ocp-release@sha256:0000000000000000000000000000000000000000000000000000000000000000
warning: The requested upgrade image is not one of the available updates. You have used --allow-explicit-upgrade for the update to proceed anyway
Requested update to release image quay.io/openshift-release-dev/ocp-release@sha256:0000000000000000000000000000000000000000000000000000000000000000
$ oc -n openshift-cluster-version get -o json events | jq -r '.items[] | select(.reason | contains("Payload")) | .reason + ": " + .message' | grep verified | sort | uniq -c
      4 BRetrievePayloadFailed: A Retrieving payload failed version="" image="quay.io/openshift-release-dev/ocp-release@sha256:0000000000000000000000000000000000000000000000000000000000000000" failure=The update cannot be verified: unable to verify sha256:0000000000000000000000000000000000000000000000000000000000000000 against keyrings: verifier-public-key-redhat
$ oc adm upgrade
Cluster version is 4.15.0-0.test-2023-12-08-165243-ci-ln-qbk5hm2-latest

ReleaseAccepted=False

  Reason: RetrievePayload
  Message: C Retrieving payload failed version="" image="quay.io/openshift-release-dev/ocp-release@sha256:0000000000000000000000000000000000000000000000000000000000000000" failure=The update cannot be verified: unable to verify sha256:0000000000000000000000000000000000000000000000000000000000000000 against keyrings: verifier-public-key-redhat

warning: Cannot display available updates:
  Reason: NoChannel
  Message: The update channel has not been configured.

$ oc -n openshift-cluster-version logs -l k8s-app=cluster-version-operator --tail -1 | grep 'debug unwrapping' | tail -3
I1208 18:53:08.723595       1 sync_worker.go:1301] debug unwrapping 0 The update cannot be verified: unable to verify sha256:0000000000000000000000000000000000000000000000000000000000000000 against keyrings: verifier-public-key-redhat
I1208 18:53:08.723599       1 sync_worker.go:1301] debug unwrapping 1 unable to verify sha256:0000000000000000000000000000000000000000000000000000000000000000 against keyrings: verifier-public-key-redhat
I1208 18:53:08.723602       1 sync_worker.go:1301] debug unwrapping 2 [2023-12-08T18:53:08Z: prefix sha256-0000000000000000000000000000000000000000000000000000000000000000 in config map signatures-managed: no more signatures to check, 2023-12-08T18:53:08Z: unable to retrieve signature from https://storage.googleapis.com/openshift-release/official/signatures/openshift/release/sha256=0000000000000000000000000000000000000000000000000000000000000000/signature-1: no more signatures to check, 2023-12-08T18:53:08Z: unable to retrieve signature from https://mirror.openshift.com/pub/openshift-v4/signatures/openshift/release/sha256=0000000000000000000000000000000000000000000000000000000000000000/signature-1: no more signatures to check, 2023-12-08T18:53:08Z: parallel signature store wrapping containers/image signature store under https://storage.googleapis.com/openshift-release/official/signatures/openshift/release, containers/image signature store under https://mirror.openshift.com/pub/openshift-v4/signatures/openshift/release: no more signatures to check, 2023-12-08T18:53:08Z: serial signature store wrapping ClusterVersion signatureStores unset, falling back to default stores, parallel signature store wrapping containers/image signature store under https://storage.googleapis.com/openshift-release/official/signatures/openshift/release, containers/image signature store under https://mirror.openshift.com/pub/openshift-v4/signatures/openshift/release: no more signatures to check, 2023-12-08T18:53:08Z: serial signature store wrapping config maps in openshift-config-managed with label "release.openshift.io/verification-signatures", serial signature store wrapping ClusterVersion signatureStores unset, falling back to default stores, parallel signature store wrapping containers/image signature store under https://storage.googleapis.com/openshift-release/official/signatures/openshift/release, containers/image signature store under https://mirror.openshift.com/pub/openshift-v4/signatures/openshift/release: no more signatures to check]

So still not working, but at least we have the detailed error message there in the logs; I just need to figure out why I'm failing to get it up into the Event.

@wking wking force-pushed the understand-signature-errors branch from 588b59c to a032989 Compare December 8, 2023 22:51
…s too

We've been including verification-failure details for forced updates
since 5ad3c14 (pkg/cvo/updatepayload: Event when forcing through a
sig-verification failure, 2022-04-07, 2022, openshift#763), but had not been
including them in logs or other output in the "we aren't forcing, so
this blocks the update's acceptance" case.  This commit adds the
detail to the Event, so it's available, but keeps only the high-level
message in the RetrievePayload status output (which feeds the
ReleaseAccepted condition in ClusterVersion), because while the
low-level details are useful for debugging, they're pretty chatty for
condition consumers who mostly just want to know why the update
request isn't being accepted.

The newline-to-// replacement is because apparently Event messages
truncate at the first newline.  I have not tracked down docs or source
to back that up, but confirmed it in pre-merge testing [1].

[1]: openshift#1003 (comment)
@wking wking force-pushed the understand-signature-errors branch from c70e2ec to c75d15b Compare December 9, 2023 00:53
@wking
Member Author

wking commented Dec 9, 2023

The issue seems to have been Event messages truncating at the first newline. Should be fixed with c75d15b.
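
A minimal Go sketch of the newline-to-// replacement described in the commit message, assuming a hypothetical helper named flattenForEvent (the name and the shortened sample message are illustrative, not taken from the diff):

```go
package main

import (
	"fmt"
	"strings"
)

// flattenForEvent is a hypothetical helper that collapses a multi-line
// message onto a single line, so that content after the first newline is
// not lost when the message is recorded in an Event.
func flattenForEvent(message string) string {
	return strings.ReplaceAll(message, "\n", " // ")
}

func main() {
	// Shortened stand-in for the high-level message plus low-level detail.
	msg := "The update cannot be verified: unable to verify sha256:0000\n[detail one, detail two]"
	fmt.Println(flattenForEvent(msg))
}
```

Using a visible separator such as " // " rather than a plain space keeps the boundary between the summary and the detail recoverable when reading the Event later.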

@wking
Member Author

wking commented Dec 9, 2023

After removing the debugging content, testing c75d15b with launch 4.15,openshift/cluster-version-operator#1003 aws (logs):

$ oc -n openshift-cluster-version get -o json events | jq -r '.items[] | select(.reason | contains("Payload")) | .reason + ": " + .message' | grep verified | sort | uniq -c
      1 RetrievePayloadFailed: Retrieving payload failed version="" image="quay.io/openshift-release-dev/ocp-release@sha256:0000000000000000000000000000000000000000000000000000000000000000" failure=The update cannot be verified: unable to verify sha256:0000000000000000000000000000000000000000000000000000000000000000 against keyrings: verifier-public-key-redhat // [2023-12-09T01:33:34Z: prefix sha256-0000000000000000000000000000000000000000000000000000000000000000 in config map signatures-managed: no more signatures to check, 2023-12-09T01:33:34Z: unable to retrieve signature from https://storage.googleapis.com/openshift-release/official/signatures/openshift/release/sha256=0000000000000000000000000000000000000000000000000000000000000000/signature-1: no more signatures to check, 2023-12-09T01:33:34Z: unable to retrieve signature from https://mirror.openshift.com/pub/openshift-v4/signatures/openshift/release/sha256=0000000000000000000000000000000000000000000000000000000000000000/signature-1: no more signatures to check, 2023-12-09T01:33:34Z: parallel signature store wrapping containers/image signature store under https://storage.googleapis.com/openshift-release/official/signatures/openshift/release, containers/image signature store under https://mirror.openshift.com/pub/openshift-v4/signatures/openshift/release: no more signatures to check, 2023-12-09T01:33:34Z: serial signature store wrapping ClusterVersion signatureStores unset, falling back to default stores, parallel signature store wrapping containers/image signature store under https://storage.googleapis.com/openshift-release/official/signatures/openshift/release, containers/image signature store under https://mirror.openshift.com/pub/openshift-v4/signatures/openshift/release: no more signatures to check, 2023-12-09T01:33:34Z: serial signature store wrapping config maps in openshift-config-managed with label "release.openshift.io/verification-signatures", serial signature store wrapping ClusterVersion signatureStores unset, falling back to default stores, parallel signature store wrapping containers/image signature store under https://storage.googleapis.com/openshift-release/official/signatures/openshift/release, containers/image signature store under https://mirror.openshift.com/pub/openshift-v4/signatures/openshift/release: no more signatures to check]
$ oc adm upgrade
Cluster version is 4.15.0-0.test-2023-12-09-005930-ci-ln-0ryhgq2-latest

ReleaseAccepted=False

  Reason: RetrievePayload
  Message: Retrieving payload failed version="" image="quay.io/openshift-release-dev/ocp-release@sha256:0000000000000000000000000000000000000000000000000000000000000000" failure=The update cannot be verified: unable to verify sha256:0000000000000000000000000000000000000000000000000000000000000000 against keyrings: verifier-public-key-redhat

warning: Cannot display available updates:
  Reason: NoChannel
  Message: The update channel has not been configured.

That detailed verification-failure message is still fairly intimidating, but it carries a lot of information. Formatting it for readability:

$ oc -n openshift-cluster-version get -o json events | jq -r '.items[] | select(.reason | contains("Payload")) | .metadata.creationTimestamp + " " + .reason + ": " + .message' | grep verified | tail -n1 | sed 's| // |\n|g;s/, 2023/,\n2023/g'
2023-12-09T01:35:29Z RetrievePayloadFailed: Retrieving payload failed version="" image="quay.io/openshift-release-dev/ocp-release@sha256:0000000000000000000000000000000000000000000000000000000000000000" failure=The update cannot be verified: unable to verify sha256:0000000000000000000000000000000000000000000000000000000000000000 against keyrings: verifier-public-key-redhat
[2023-12-09T01:35:29Z: prefix sha256-0000000000000000000000000000000000000000000000000000000000000000 in config map signatures-managed: no more signatures to check,
2023-12-09T01:35:29Z: unable to retrieve signature from https://storage.googleapis.com/openshift-release/official/signatures/openshift/release/sha256=0000000000000000000000000000000000000000000000000000000000000000/signature-1: no more signatures to check,
2023-12-09T01:35:29Z: unable to retrieve signature from https://mirror.openshift.com/pub/openshift-v4/signatures/openshift/release/sha256=0000000000000000000000000000000000000000000000000000000000000000/signature-1: no more signatures to check,
2023-12-09T01:35:29Z: parallel signature store wrapping containers/image signature store under https://storage.googleapis.com/openshift-release/official/signatures/openshift/release, containers/image signature store under https://mirror.openshift.com/pub/openshift-v4/signatures/openshift/release: no more signatures to check,
2023-12-09T01:35:29Z: serial signature store wrapping ClusterVersion signatureStores unset, falling back to default stores, parallel signature store wrapping containers/image signature store under https://storage.googleapis.com/openshift-release/official/signatures/openshift/release, containers/image signature store under https://mirror.openshift.com/pub/openshift-v4/signatures/openshift/release: no more signatures to check,
2023-12-09T01:35:29Z: serial signature store wrapping config maps in openshift-config-managed with label "release.openshift.io/verification-signatures", serial signature store wrapping ClusterVersion signatureStores unset, falling back to default stores, parallel signature store wrapping containers/image signature store under https://storage.googleapis.com/openshift-release/official/signatures/openshift/release, containers/image signature store under https://mirror.openshift.com/pub/openshift-v4/signatures/openshift/release: no more signatures to check]

This shows us walking the ConfigMap store and failing to find any signatures, then walking both default signature stores and failing to find any signatures there either, and finally failing out of the wrapping stores.
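
The same reformatting can be sketched in Go for tooling that consumes these Events. This is an illustrative equivalent of the sed expression above, with one assumption: it matches any RFC 3339-style date prefix instead of the literal "2023". The expandEventMessage name is hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// entryBoundary matches the ", " that precedes a timestamped entry such as
// "2023-12-09T01:35:29Z: ...", capturing the start of the timestamp.
var entryBoundary = regexp.MustCompile(`, (\d{4}-\d{2}-\d{2}T)`)

// expandEventMessage is a hypothetical helper that undoes the " // "
// flattening and puts each timestamped detail entry on its own line.
func expandEventMessage(message string) string {
	message = strings.ReplaceAll(message, " // ", "\n")
	return entryBoundary.ReplaceAllString(message, ",\n${1}")
}

func main() {
	// Shortened stand-in for a real Event message.
	msg := "The update cannot be verified // [2023-12-09T01:35:29Z: store A: no more signatures to check, 2023-12-09T01:35:29Z: store B: no more signatures to check]"
	fmt.Println(expandEventMessage(msg))
}
```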

@wking wking changed the title pkg/cvo/sync_worker: Verification-failure details for unforced updates too OCPBUGS-25055: pkg/cvo/sync_worker: Verification-failure details for unforced updates too Dec 12, 2023
@openshift-ci-robot openshift-ci-robot added jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. labels Dec 12, 2023
@openshift-ci-robot
Contributor

@wking: This pull request references Jira Issue OCPBUGS-25055, which is invalid:

  • expected the bug to target either version "4.16." or "openshift-4.16.", but it targets "4.15.0" instead

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@wking
Member Author

wking commented Dec 12, 2023

/jira refresh

@openshift-ci-robot openshift-ci-robot added the jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. label Dec 12, 2023
@openshift-ci-robot
Contributor

@wking: This pull request references Jira Issue OCPBUGS-25055, which is valid. The bug has been moved to the POST state.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.16.0) matches configured target version for branch (4.16.0)
  • bug is in the state New, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact:
/cc @jiajliu


@openshift-ci-robot openshift-ci-robot removed the jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. label Dec 12, 2023
@openshift-ci openshift-ci bot requested a review from jiajliu December 12, 2023 01:30
@jiajliu
Contributor

jiajliu commented Dec 14, 2023

/label qe-approved

@openshift-ci openshift-ci bot added the qe-approved Signifies that QE has signed off on this PR label Dec 14, 2023
@openshift-ci-robot
Contributor

@wking: This pull request references Jira Issue OCPBUGS-25055, which is valid.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.16.0) matches configured target version for branch (4.16.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact:
/cc @jiajliu


Member

@petr-muller petr-muller left a comment


LGTM

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Dec 14, 2023
@openshift-ci
Contributor

openshift-ci bot commented Dec 14, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: petr-muller, wking

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@petr-muller
Member

/retest

@openshift-ci
Contributor

openshift-ci bot commented Dec 14, 2023

@wking: all tests passed!

Full PR test history. Your PR dashboard.


@openshift-merge-bot openshift-merge-bot bot merged commit f10051b into openshift:master Dec 14, 2023
@openshift-ci-robot
Contributor

@wking: Jira Issue OCPBUGS-25055: All pull requests linked via external trackers have merged:

Jira Issue OCPBUGS-25055 has been moved to the MODIFIED state.


@openshift-bot
Contributor

[ART PR BUILD NOTIFIER]

This PR has been included in build cluster-version-operator-container-v4.16.0-202312142332.p0.gf10051b.assembly.stream for distgit cluster-version-operator.
All builds following this will include this PR.

@wking wking deleted the understand-signature-errors branch December 19, 2023 02:15
@wking
Member Author

wking commented Dec 19, 2023

/cherrypick release-4.15

@openshift-cherrypick-robot

@wking: new pull request created: #1007


openshift-cherrypick-robot pushed a commit to openshift-cherrypick-robot/cluster-version-operator that referenced this pull request Dec 19, 2023

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lgtm Indicates that a PR is ready to be merged. qe-approved Signifies that QE has signed off on this PR


6 participants