
Conversation

@wking
Member

@wking wking commented Jan 20, 2025

Catching up with openshift/cluster-version-operator@9be6175c5f (openshift/cluster-version-operator#431), which uses the version property as a sanity check for "is this pullspec the version I'm expecting?". This protects users from compromised or man-in-the-middled upstream update services who attempt downgrade and similar attacks by misrepresenting a recommended update.

The text I'm adjusting landed in 354e2fb (#1339), but version-ignoring was never implemented, so nobody can be relying on that nominal behavior. And as the man-in-the-middle use case demonstrates, version-ignoring would be less safe than the version-match-enforcing behavior that the cluster-version operator has used since 2020.

@openshift-ci
Contributor

openshift-ci bot commented Jan 20, 2025

Hello @wking! Some important instructions when contributing to openshift/api:
API design plays an important part in the user experience of OpenShift and as such API PRs are subject to a high level of scrutiny to ensure they follow our best practices. If you haven't already done so, please review the OpenShift API Conventions and ensure that your proposed changes are compliant. Following these conventions will help expedite the api review process for your PR.

@openshift-ci openshift-ci bot added the size/XS label (Denotes a PR that changes 0-9 lines, ignoring generated files.) Jan 20, 2025
@openshift-ci openshift-ci bot requested review from JoelSpeed and deads2k January 20, 2025 21:12
@wking wking changed the title from "config/v1/types_cluster_version: Explain image and version both set" to "OCPBUGS-48641: config/v1/types_cluster_version: Explain image and version both set" Jan 20, 2025
@openshift-ci-robot openshift-ci-robot added the jira/valid-reference (Indicates that this PR references a valid Jira ticket of any type.) and jira/valid-bug (Indicates that a referenced Jira bug is valid for the branch this PR is targeting.) labels Jan 20, 2025
@openshift-ci-robot

@wking: This pull request references Jira Issue OCPBUGS-48641, which is valid. The bug has been moved to the POST state.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.19.0) matches configured target version for branch (4.19.0)
  • bug is in the state New, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact:
/cc @shellyyang1989

The bug has been updated to refer to the pull request using the external bug tracker.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci openshift-ci bot requested a review from shellyyang1989 January 20, 2025 21:42
  // Some of the fields are inter-related with restrictions and meanings described here.
  // 1. image is specified, version is specified, architecture is specified. API validation error.
- // 2. image is specified, version is specified, architecture is not specified. You should not do this. version is silently ignored and image is used.
+ // 2. image is specified, version is specified, architecture is not specified. The version metadata in the referenced image must match the specified version.
Member

From the changes below about Version, is it more complete this way?

Suggested change
- // 2. image is specified, version is specified, architecture is not specified. The version metadata in the referenced image must match the specified version.
+ // 2. image is specified, version is specified, architecture is not specified. image is used if the version metadata in the referenced image matches the specified version. API validation error otherwise.

Member

@hongkailiu hongkailiu Jan 21, 2025

Lifted from Slack:

"""
I think "API validation error" in the Godocs is trying to describe the CEL kubebuilder:validation:XValidation:rule here, because those are enforced by the Kube API server's validation, and clients cannot push invalid combinations. For version, there's no CEL enforcement, it's just up to the CVO to decide how to handle the "image and version both set" situation, and report its thoughts in status.conditions
"""

:TIL

Contributor

What validations are made? Is it possible that they could be moved to CEL? Or does it need to introspect the image to validate?

Member Author

It needs to introspect the image, checking the version in the release image's release-metadata file against the spec version string that the cluster admin said they expected. See OCPBUGS-48641's:

Verifying payload failed version="4.17.99" image="quay.io/openshift-release-dev/ocp-release@sha256:82aa2a914d4cd964deda28b99049abbd1415f96c0929667b0499dd968864a8dd" failure=release image version 4.17.13 does not match the expected upstream version 4.17.99

error message for an example, where the CVO looks inside the sha256:82aa2a9... release image, sees that the release-metadata file claims that release image is 4.17.13, and then complains that the release image's 4.17.13 diverges from the spec version's 4.17.99 expectation.
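
A minimal Go sketch of that comparison, for illustration only. The function name checkPayloadVersion and its signature are assumptions and are not taken from the cluster-version operator's source; the error wording is modeled on the log line quoted above.

```go
package main

import "fmt"

// checkPayloadVersion is a hypothetical helper, not the CVO's actual code.
// expected is the version the cluster admin set in spec.desiredUpdate.version;
// actual is the version read from the release image's release-metadata file.
func checkPayloadVersion(expected, actual string) error {
	if expected != actual {
		// Wording modeled on the failure message quoted in OCPBUGS-48641.
		return fmt.Errorf("release image version %s does not match the expected upstream version %s", actual, expected)
	}
	return nil
}

func main() {
	// The mismatch from the bug report: the image claims 4.17.13,
	// while the admin asked for 4.17.99.
	fmt.Println(checkPayloadVersion("4.17.99", "4.17.13"))
	// A matching pair passes the check.
	fmt.Println(checkPayloadVersion("4.17.13", "4.17.13"))
}
```

Because the check needs the contents of the release image, it cannot be expressed as a CEL rule on the ClusterVersion object alone.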

Contributor

Ack, and is this validation via a webhook or part of the controller? The "API validation error otherwise" message seems odd here, especially if this is a controller-based validation.

Member Author

It's via the controller (CVO). And yes, that's why I would rather use my current text, and not the "API validation error" text.

Contributor

I would certainly drop the "API validation error" text.

Architecture ClusterVersionArchitecture `json:"architecture"`

// version is a semantic version identifying the update version.
// version is ignored if image is specified and required if
Member Author

Poking a bit more deeply into where the nominal "ignored" came from, on 2022-11-08 I claimed it was ignored. I'm not sure what 2022-me was thinking there; possibly I was just focused on how the CVO looks up which image to use (and that logic doesn't run when image is explicitly set in spec), and I overlooked the sync-worker validation as it judges the requested desiredUpdate for ReleaseAccepted?

@hongkailiu
Member

/lgtm

@openshift-ci openshift-ci bot added the lgtm label (Indicates that a PR is ready to be merged.) Jan 21, 2025
@wking wking force-pushed the godocs-for-ClusterVersion-image-with-version branch from 965895d to 387fac3 January 21, 2025 01:48
@openshift-ci openshift-ci bot added the size/M label (Denotes a PR that changes 30-99 lines, ignoring generated files.) and removed the lgtm and size/XS labels Jan 21, 2025
@wking wking force-pushed the godocs-for-ClusterVersion-image-with-version branch from 387fac3 to 435a43a January 21, 2025 05:31
@petr-muller
Member

/cc

@openshift-ci openshift-ci bot requested a review from petr-muller January 21, 2025 13:05
  // version is a semantic version identifying the update version.
- // version is ignored if image is specified and required if
- // architecture is specified.
+ // version is required if architecture is specified.
Contributor

You could add a CEL rule to validate this. We can test that it ratchets so that existing broken resources do not suddenly become broken

Member Author

"version is ... required if architecture is specified" dates back to #1339, and seems orthogonal to the change I'm suggesting here. And actually, oc adm upgrade --to-multi-arch is setting both architecture and version, and I don't see a reason to block that; it's the same sanity-check of "yes, the image the cluster retrieved seems like the release the cluster admin was expecting" for folks where version numbers are more recognizable than image digests (everybody? Definitely me, anyway). Should I drop that unnecessary constraint from the docs in this pull request, or can I file a follow-up pull request dropping that constraint once this one merges? Or...?

Contributor

I agree it's orthogonal, but it's good to make compatible incremental change as we are touching areas of APIs.

What happens if version is missing when architecture is specified today? Ignoring CLI tooling that would set it, since folks can manipulate these resources themselves, would it cause CVO to return errors when it processes the object?

If so, adding a CEL rule as below would give more immediate feedback to a user, and is relatively free to us to implement. As of 4.18 this should ratchet itself, but we would need to test it.

// +kubebuilder:validation:XValidation:rule="!has(self.architecture) || has(self.version)",message="version is required when architecture is set"

A self-ratcheting version:

// +kubebuilder:validation:XValidation:rule="!has(self.architecture) || has(self.version) || has(oldSelf.architecture)",message="version is required when architecture is set"
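
To make the boolean logic of the two rules concrete, here is a small Go sketch. It is only a model: the desiredUpdate struct and both function names are hypothetical, an empty string stands in for a field where CEL's has() would return false, and creation (where there is no oldSelf) is not modeled.

```go
package main

import "fmt"

// desiredUpdate is a hypothetical, simplified stand-in for the API type.
// An empty string models a field for which CEL's has() would be false.
type desiredUpdate struct {
	Architecture string
	Version      string
}

// basicRule mirrors: !has(self.architecture) || has(self.version)
func basicRule(self desiredUpdate) bool {
	return self.Architecture == "" || self.Version != ""
}

// ratchetingRule mirrors the second rule, which additionally passes when the
// stored object (oldSelf) already had architecture set.
func ratchetingRule(self, oldSelf desiredUpdate) bool {
	return basicRule(self) || oldSelf.Architecture != ""
}

func main() {
	broken := desiredUpdate{Architecture: "Multi"} // architecture without version

	fmt.Println(basicRule(broken))                       // the basic rule rejects it
	fmt.Println(ratchetingRule(broken, broken))          // an already-stored broken object stays writable
	fmt.Println(ratchetingRule(broken, desiredUpdate{})) // a newly introduced violation is rejected
}
```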

Member Author

> What happens if version is missing when architecture is specified today?

Looks like that's already guarded here. With a "launch 4.17.12 aws" Cluster Bot cluster:

$ oc patch clusterversion version --type json -p '[{"op": "add", "path": "/spec/desiredUpdate", "value": {"architecture": "Multi"}}]'
The ClusterVersion "version" is invalid: spec.desiredUpdate: Invalid value: "object": no such key: version evaluating rule: Version must be set if Architecture is set

So I can leave the "version is required if architecture is specified" docs in place here, and don't need to add additional CEL.

  // image should be used when the desired version does not exist in availableUpdates or history.
- // When image is set, version is ignored. When image is set, version should be empty.
  // When image is set, architecture cannot be specified.
+ // If both version and image are set, the version metadata in the referenced image must match the specified version.
Contributor

What does version metadata actually mean?

Member Author

$ oc adm release info -o json quay.io/openshift-release-dev/ocp-release:4.17.12-x86_64 | jq -r .metadata.version
4.17.12

Docs discussing the release-metadata file that holds that as part of the release image.
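
For anyone who would rather do the same lookup programmatically, here is a hedged Go sketch of what the jq filter above does. The only JSON shape assumed is the metadata.version path that the jq expression uses; the releaseInfo type, the releaseVersion function, and the sample document are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// releaseInfo models only the part of the JSON that the jq filter
// .metadata.version reads; all other fields are ignored.
type releaseInfo struct {
	Metadata struct {
		Version string `json:"version"`
	} `json:"metadata"`
}

// releaseVersion is a hypothetical helper that extracts metadata.version
// from JSON shaped like the output of "oc adm release info -o json".
func releaseVersion(raw []byte) (string, error) {
	var info releaseInfo
	if err := json.Unmarshal(raw, &info); err != nil {
		return "", fmt.Errorf("parsing release info: %w", err)
	}
	if info.Metadata.Version == "" {
		return "", fmt.Errorf("release info has no metadata.version")
	}
	return info.Metadata.Version, nil
}

func main() {
	// Hand-written sample, not real command output.
	sample := []byte(`{"metadata": {"version": "4.17.12"}}`)
	version, err := releaseVersion(sample)
	fmt.Println(version, err)
}
```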

Contributor

If an admin were unsure, would they be able to check this themselves from the docs on the API? Perhaps linking out to this doc would be a useful help for users of this API?

Member Author

@wking wking Jan 22, 2025

Not much precedent to linking out to the enhancements repo for more details:

api$ git --no-pager grep github.com/openshift/enhancements/blob/
README.md:conventions](https://github.com/openshift/enhancements/blob/master/CONVENTIONS.md#api),
machine/v1beta1/types_machine.go:       // https://github.com/openshift/enhancements/blob/master/enhancements/machine-instance-lifecycle.md

How about inlining something more here to make it clear that it's metadata extracted from the release image? Maybe "the version metadata extracted from the referenced image" would be sufficient? Or "the version extracted from the referenced image"? Or "the version string..."? Or...?

Contributor

> the version extracted from the referenced image

Let's get it updated to this.

Do we assume then that the admin knows the semver version of the image just based on how they request an update?

Member Author

> the version extracted from the referenced image
>
> Let's get it updated to this.

Done, and also rebased onto the current tip, with 435a43a -> f2e7034.

> Do we assume then that the admin knows the semver version of the image just based on how they request an update?

If they do, I'd expect them to set version to pull in the "is the incoming release the SemVer version you expected?" guard. For example, oc adm upgrade --to "${VERSION}" is setting both version and image here and in other, similar locations, in case the OpenShift Update Service used to populate ClusterVersion's status.availableUpdates or status.conditionalUpdates is falsely claiming the wrong version for a particular image pullspec.

The image property expects a by-digest pullspec (e.g. see openshift/oc#390's client-side tag-pullspec guard), so it's less clear in the "image set, version not set" case if the admin knows the SemVer release they're trying to move towards. But even if they do, unless they tell us via version, all we have to go on is their image, so there's no way I can see to enforce a "is the incoming release the SemVer version you expected?" guard, because we don't have the expected-version version property to check against.
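
The field combinations discussed in this thread can be summarized with a rough Go sketch. It is an illustration under stated assumptions, not the API server's or the CVO's code: the desiredUpdate struct, the classify function, and the outcome strings are all hypothetical, and empty strings model unset fields.

```go
package main

import "fmt"

// desiredUpdate is a hypothetical, simplified stand-in for the API type;
// empty strings model unset fields.
type desiredUpdate struct {
	Image        string
	Version      string
	Architecture string
}

// Outcome labels invented for this sketch.
const (
	rejectedByAPI  = "rejected by API validation"
	versionGuard   = "controller compares the image's version with spec version"
	noVersionGuard = "image used, no expected version to compare against"
	notCovered     = "not covered by this sketch"
)

// classify maps a field combination to the handling described in the thread.
func classify(u desiredUpdate) string {
	switch {
	case u.Image != "" && u.Architecture != "":
		// The Godocs say architecture cannot be specified when image is set.
		return rejectedByAPI
	case u.Architecture != "" && u.Version == "":
		// Matches the "Version must be set if Architecture is set" rule.
		return rejectedByAPI
	case u.Image != "" && u.Version != "":
		return versionGuard
	case u.Image != "":
		return noVersionGuard
	default:
		return notCovered
	}
}

func main() {
	// The pullspec below is a placeholder, not a real release image.
	image := "registry.example.com/release@sha256:placeholder"
	fmt.Println(classify(desiredUpdate{Image: image, Version: "4.17.13"}))
	fmt.Println(classify(desiredUpdate{Image: image}))
	fmt.Println(classify(desiredUpdate{Architecture: "Multi"}))
}
```

Only the second outcome gives the "is the incoming release the SemVer version you expected?" guard, which is why setting version alongside image is worthwhile.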

Catching up with openshift/cluster-version-operator@9be6175c5f
(pkg/cvo/sync_worker: Make expected/actual version mismatch fatal,
2020-08-09, openshift/cluster-version-operator#431), which uses the
'version' property as a sanity check for "is this pullspec the version
I'm expecting?".  This protects users from compromised or
man-in-the-middled upstream update services who attempt downgrade and
similar attacks by misrepresenting a recommended update.

The text I'm adjusting landed in 354e2fb
(config/v1/types_cluster_version: Add Architecture to DesiredUpdate,
2022-12-07, openshift#1339), but version-ignoring was never implemented, so
nobody can be relying on that nominal behavior.  And as the
man-in-the-middle use case demonstrates, version-ignoring would be
less safe than the version-match-enforcing behavior that the
cluster-version operator has used since 2020.

I edited types_cluster_version.go by hand, and then updated the other
files with:

  $ hack/update-codegen-crds.sh
  $ hack/update-openapi.sh
  $ hack/update-swagger-docs.sh
@wking wking force-pushed the godocs-for-ClusterVersion-image-with-version branch from 435a43a to f2e7034 February 17, 2025 07:49
@JoelSpeed
Contributor

/lgtm

@openshift-ci openshift-ci bot added the lgtm label (Indicates that a PR is ready to be merged.) Mar 5, 2025
@openshift-ci
Contributor

openshift-ci bot commented Mar 5, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: hongkailiu, JoelSpeed, wking

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved label (Indicates a PR has been approved by an approver from all required OWNERS files.) Mar 5, 2025
@openshift-ci
Contributor

openshift-ci bot commented Mar 5, 2025

@wking: all tests passed!

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@openshift-merge-bot openshift-merge-bot bot merged commit 68f2fd6 into openshift:master Mar 5, 2025
12 checks passed
@openshift-ci-robot

@wking: Jira Issue OCPBUGS-48641: All pull requests linked via external trackers have merged:

Jira Issue OCPBUGS-48641 has been moved to the MODIFIED state.

@wking wking deleted the godocs-for-ClusterVersion-image-with-version branch March 8, 2025 07:16