(psa) restrict olm namespace + remove labels from openshift-operators ns #367
    labels:
      pod-security.kubernetes.io/enforce: restricted
      pod-security.kubernetes.io/enforce-version: "v1.24"
      openshift.io/scc: "anyuid"
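For readers less familiar with Pod Security Admission (PSA): the `pod-security.kubernetes.io/enforce` label selects the enforced level (`privileged`, `baseline`, or `restricted`), and `pod-security.kubernetes.io/enforce-version` pins the policy version (`latest` or a minor version such as `v1.24`). The following is a minimal Go sketch, not code from OLM or the PSA plugin, that reads and validates those two labels. The `enforcePolicy` helper and `psaPolicy` type are hypothetical, and when no enforce label is present the sketch only reports that the cluster default applies rather than guessing what that default is.

```go
package main

import (
	"fmt"
	"regexp"
)

// Label keys that Pod Security Admission reads for its "enforce" mode.
const (
	enforceLabel        = "pod-security.kubernetes.io/enforce"
	enforceVersionLabel = "pod-security.kubernetes.io/enforce-version"
)

// versionPattern accepts "latest" or a Kubernetes minor version such as "v1.24".
var versionPattern = regexp.MustCompile(`^(latest|v1\.[0-9]+)$`)

// psaPolicy holds the enforce-mode settings found on a namespace (hypothetical type).
type psaPolicy struct {
	Level   string // "privileged", "baseline", or "restricted"
	Version string // "latest", "v1.N", or "" if the version label is absent
}

// enforcePolicy (hypothetical helper) returns found=false when the namespace
// has no enforce label, meaning the cluster-wide default applies instead.
func enforcePolicy(labels map[string]string) (psaPolicy, bool, error) {
	level, found := labels[enforceLabel]
	if !found {
		return psaPolicy{}, false, nil
	}
	switch level {
	case "privileged", "baseline", "restricted":
		// valid level
	default:
		return psaPolicy{}, true, fmt.Errorf("unknown PSA level %q", level)
	}
	version, hasVersion := labels[enforceVersionLabel]
	if hasVersion && !versionPattern.MatchString(version) {
		return psaPolicy{}, true, fmt.Errorf("invalid PSA version %q", version)
	}
	return psaPolicy{Level: level, Version: version}, true, nil
}

func main() {
	// Labels on the openshift-operator-lifecycle-manager namespace above.
	olmNS := map[string]string{
		enforceLabel:        "restricted",
		enforceVersionLabel: "v1.24",
		"openshift.io/scc":  "anyuid",
	}
	p, found, err := enforcePolicy(olmNS)
	fmt.Println(p.Level, p.Version, found, err) // restricted v1.24 true <nil>
}
```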
/hold
We only labeled it to enforce baseline because the pod requires more permissions than restricted allows (restricted is the default enforcement in OCP 4.12). Can we ensure that all pods can run as restricted now? If so, why do we need to enforce restricted explicitly? Is it to test and ensure that we will not break anything?
All pods in the openshift-operator-lifecycle-manager namespace are running under the restricted-v2 SCC, so PSA enforcement=restricted should be fine:
$ oc project
Using project "openshift-operator-lifecycle-manager" on server "https://api.bparees.devcluster.openshift.com:6443".
$ oc get pods -o yaml | grep scc
openshift.io/scc: restricted-v2
openshift.io/scc: restricted-v2
openshift.io/scc: restricted-v2
openshift.io/scc: restricted-v2
openshift.io/scc: restricted-v2
openshift.io/scc: restricted-v2
openshift.io/scc: restricted-v2
openshift.io/scc: restricted-v2
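The SCC annotations above record which SCC admitted each pod, but PSA evaluates the pod spec on its own. One of the `restricted` checks requires an explicit seccomp profile: the pod-level or container-level `securityContext.seccompProfile.type` must be `RuntimeDefault` or `Localhost`, and a container may leave it unset only when the pod level sets an allowed value. This is the check named in the package-server-manager failure quoted in the commit message that references this PR. Below is a minimal Go sketch of just that one check, using simplified stand-in types instead of `k8s.io/api/core/v1`; the type and function names are hypothetical, and the other restricted checks (capabilities, runAsNonRoot, and so on) are omitted.

```go
package main

import "fmt"

// Simplified stand-ins for the core/v1 types; only the fields this check reads are modelled.
type SeccompProfile struct{ Type string }

// SecurityContext stands in for both core/v1 PodSecurityContext (pod level)
// and SecurityContext (container level), which both carry SeccompProfile.
type SecurityContext struct{ SeccompProfile *SeccompProfile }

type Container struct {
	Name            string
	SecurityContext *SecurityContext
}

type PodSpec struct {
	SecurityContext *SecurityContext
	Containers      []Container // real admission also checks init and ephemeral containers
}

func allowedSeccomp(t string) bool { return t == "RuntimeDefault" || t == "Localhost" }

// seccompViolations returns "pod" and/or the names of containers that fail the
// restricted seccomp rule. A container passes if it sets an allowed type, or
// leaves the field unset while the pod level sets an allowed type; any
// explicit disallowed type (such as "Unconfined") fails.
func seccompViolations(spec PodSpec) []string {
	var bad []string
	podSet := false
	if sc := spec.SecurityContext; sc != nil && sc.SeccompProfile != nil {
		if allowedSeccomp(sc.SeccompProfile.Type) {
			podSet = true
		} else {
			bad = append(bad, "pod")
		}
	}
	for _, c := range spec.Containers {
		sc := c.SecurityContext
		switch {
		case sc != nil && sc.SeccompProfile != nil:
			if !allowedSeccomp(sc.SeccompProfile.Type) {
				bad = append(bad, c.Name)
			}
		case !podSet:
			bad = append(bad, c.Name) // unset here and not inherited from the pod level
		}
	}
	return bad
}

func main() {
	// No seccompProfile at either level: rejected by "restricted".
	bare := PodSpec{Containers: []Container{{Name: "package-server-manager"}}}
	fmt.Println(seccompViolations(bare)) // [package-server-manager]

	// Setting RuntimeDefault at the pod level is enough for every container.
	fixed := bare
	fixed.SecurityContext = &SecurityContext{SeccompProfile: &SeccompProfile{Type: "RuntimeDefault"}}
	fmt.Println(seccompViolations(fixed)) // []
}
```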
Cool, so we can merge the changes 🚀
/hold cancel
    name: openshift-operators
    labels:
      pod-security.kubernetes.io/enforce: baseline
      pod-security.kubernetes.io/enforce-version: "v1.24"
Until we have the OLM logic in place to label "openshift-* namespaces that have operators installed" for label syncing, we should probably leave this explicit setting in place (or even set it to privileged), so that operators installed in this namespace don't get rejected by PSA.
cc @perdasilva
I see two options:
a) Label it as privileged (we do not know what ANY operator installed in this namespace will require).
b) Or only add the label-sync label security.openshift.io/scc.podSecurityLabelSync=true (which is what we will add once we have the controller).
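Option (a) is the safe choice because the Pod Security Standards are cumulative: `baseline` forbids a subset of what `restricted` forbids, and `privileged` forbids nothing, so a pod that passes a stricter level also passes every looser one. The Go sketch below illustrates that ordering under a deliberately simplified model that summarizes each workload by the strictest level its pods satisfy; real admission evaluates the individual checks instead. The `Level` type and `admits` function are hypothetical.

```go
package main

import "fmt"

// Level orders the Pod Security Standards from least to most restrictive.
type Level int

const (
	Privileged Level = iota // no restrictions
	Baseline                // prevents known privilege escalations
	Restricted              // baseline plus hardening, e.g. a required seccomp profile
)

// levelByName maps the namespace label values to Levels.
var levelByName = map[string]Level{
	"privileged": Privileged,
	"baseline":   Baseline,
	"restricted": Restricted,
}

// admits reports whether a namespace enforcing nsLevel accepts a pod whose
// strictest satisfied level is podLevel. Because the levels are cumulative,
// a pod meeting "restricted" also meets "baseline" and "privileged".
func admits(nsLevel, podLevel Level) bool {
	return podLevel >= nsLevel
}

func main() {
	// A hypothetical operator whose pods satisfy baseline but not restricted.
	pod := Baseline
	for _, name := range []string{"privileged", "baseline", "restricted"} {
		fmt.Printf("%-10s namespace admits it: %v\n", name, admits(levelByName[name], pod))
	}
}
```

Under this model, labeling openshift-operators as privileged admits any operator, at the cost of enforcing nothing until the label-sync controller takes over.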
That is a good point. I've modified the PR to go with option (a) and explicitly label the namespace as privileged, so that when https://issues.redhat.com/browse/OLM-2695 is ready, we will have to make a conscious effort to remove any hard-coded labels in this namespace.
/retest
This PR:
1. Adds the enforce: restricted Pod Security Admission (PSA) labels to the openshift-operator-lifecycle-manager namespace.
2. Adds the enforce: privileged PSA labels to the openshift-operators namespace. These will be removed in a future commit, once another component is in place to set the namespace's security level according to the workloads running in it.
Force-pushed from 7481b0a to be84d80.
@anik120: all tests passed!
camilamacedo86 left a comment:
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: anik120, camilamacedo86.
…Context

This recently started breaking 4.10-to-4.11-to-4.12 updates [1] with the first failing run [2] like:

    $ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.12-upgrade-from-stable-4.11-from-stable-4.10-e2e-aws-sdn-upgrade/1562409104817786880/artifacts/e2e-aws-sdn-upgrade/gather-extra/artifacts/clusterversion.json | jq -r '.items[].status.conditions[] | .lastTransitionTime + " " + .type + "=" + .status + " " + .reason + ": " + .message'
    2022-08-24T12:12:13Z RetrievedUpdates=False NoChannel: The update channel has not been configured.
    2022-08-24T12:12:13Z ReleaseAccepted=True PayloadLoaded: Payload loaded version="4.12.0-0.ci-2022-08-24-061508" image="registry.build03.ci.openshift.org/ci-op-2v4w1wbk/release@sha256:37a2aa2a46ed42eedf73cf0ee1ec0f9a37b822823518172c01ef6ad2ba3ffa12" architecture="amd64"
    2022-08-24T12:31:04Z Available=True : Done applying 4.11.1
    2022-08-24T14:07:56Z Failing=True WorkloadNotProgressing: deployment openshift-operator-lifecycle-manager/package-server-manager has a replica failure FailedCreate: pods "package-server-manager-fbb7cb8b9-5zmwg" is forbidden: violates PodSecurity "restricted:v1.24": seccompProfile (pod or container "package-server-manager" must set securityContext.seccompProfile.type to "RuntimeDefault" or "Localhost")
    2022-08-24T13:40:48Z Progressing=True WorkloadNotProgressing: Unable to apply 4.12.0-0.ci-2022-08-24-061508: the workload openshift-operator-lifecycle-manager/package-server-manager cannot roll out
    2022-08-24T12:36:55Z ImplicitlyEnabledCapabilities=False AsExpected: Capabilities match configured spec

while the previous run [3] had:

    $ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.12-upgrade-from-stable-4.11-from-stable-4.10-e2e-aws-sdn-upgrade/1561684150702837760/artifacts/e2e-aws-sdn-upgrade/gather-extra/artifacts/clusterversion.json | jq -r '.items[].status.conditions[] | .lastTransitionTime + " " + .type + "=" + .status + " " + .reason + ": " + .message'
    2022-08-22T12:12:30Z RetrievedUpdates=False NoChannel: The update channel has not been configured.
    2022-08-22T12:12:30Z ReleaseAccepted=True PayloadLoaded: Payload loaded version="4.12.0-0.ci-2022-08-22-055706" image="registry.build03.ci.openshift.org/ci-op-5wrvyzw8/release@sha256:42cb26b6290927a1857dfc246336739cebd297c51d9af43f586ceaff9073e825" architecture="amd64"
    2022-08-22T12:36:17Z Available=True : Done applying 4.12.0-0.ci-2022-08-22-055706
    2022-08-22T14:09:42Z Failing=False :
    2022-08-22T14:44:58Z Progressing=False : Cluster version is 4.12.0-0.ci-2022-08-22-055706
    2022-08-22T12:40:12Z ImplicitlyEnabledCapabilities=False AsExpected: Capabilities match configured spec

Jian Zhang also noticed the breakage in QE [4].

The breaking change was an OLM namespace touch:

    $ REF_A=https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.12-upgrade-from-stable-4.11-from-stable-4.10-e2e-aws-sdn-upgrade/1561684150702837760/artifacts/release/artifacts/release-images-latest
    $ REF_B=https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.12-upgrade-from-stable-4.11-from-stable-4.10-e2e-aws-sdn-upgrade/1562409104817786880/artifacts/release/artifacts/release-images-latest
    $ JQ='[.spec.tags[] | .name + " " + .annotations["io.openshift.build.source-location"] + "/commit/" + .annotations["io.openshift.build.commit.id"]] | sort[]'
    $ diff -U0 <(curl -s "${REF_A}" | jq -r "${JQ}") <(curl -s "${REF_B}" | jq -r "${JQ}") | grep olm
    -operator-lifecycle-manager openshift/operator-framework-olm@fd42910
    -operator-registry openshift/operator-framework-olm@fd42910
    +operator-lifecycle-manager openshift/operator-framework-olm@f8c466a
    +operator-registry openshift/operator-framework-olm@f8c466a
    $ git clone https://github.com/openshift/operator-framework-olm
    $ cd operator-framework-olm
    $ git --no-pager log --oneline fd429109b22e8..f8c466aeea67 | grep namespace
    be84d80d (psa) restrict olm namespace + remove labels from openshift-operators ns
    $ git --no-pager show --oneline be84d80d -- manifests
    be84d80d (psa) restrict olm namespace + remove labels from openshift-operators ns
    diff --git a/manifests/0000_50_olm_00-namespace.yaml b/manifests/0000_50_olm_00-namespace.yaml
    index 8fffa527..168e8867 100644
    --- a/manifests/0000_50_olm_00-namespace.yaml
    +++ b/manifests/0000_50_olm_00-namespace.yaml
    @@ -3,6 +3,8 @@ kind: Namespace
     metadata:
       name: openshift-operator-lifecycle-manager
       labels:
    +    pod-security.kubernetes.io/enforce: restricted
    +    pod-security.kubernetes.io/enforce-version: "v1.24"
         openshift.io/scc: "anyuid"
         openshift.io/cluster-monitoring: "true"
       annotations:
    @@ -16,7 +18,7 @@ kind: Namespace
     metadata:
       name: openshift-operators
       labels:
    -    pod-security.kubernetes.io/enforce: baseline
    +    pod-security.kubernetes.io/enforce: privileged
         pod-security.kubernetes.io/enforce-version: "v1.24"
         openshift.io/scc: "anyuid"
       annotations:

which went out with [5], and will not be backported to 4.11.
The relevant seccompProfile change landed in 4.11 (bot not in 4.10), which is why born-in-4.11 clusters are not impacted by the change: $ git --no-pager grep seccompProfile: origin/release-4.11 -- manifests origin/release-4.11:manifests/0000_50_olm_00-clusterserviceversions.crd.yaml: seccompProfile: origin/release-4.11:manifests/0000_50_olm_00-clusterserviceversions.crd.yaml: seccompProfile: origin/release-4.11:manifests/0000_50_olm_00-clusterserviceversions.crd.yaml: seccompProfile: origin/release-4.11:manifests/0000_50_olm_00-clusterserviceversions.crd.yaml: seccompProfile: origin/release-4.11:manifests/0000_50_olm_06-psm-operator.deployment.ibm-cloud-managed.yaml: seccompProfile: origin/release-4.11:manifests/0000_50_olm_06-psm-operator.deployment.yaml: seccompProfile: origin/release-4.11:manifests/0000_50_olm_07-collect-profiles.cronjob.yaml: seccompProfile: origin/release-4.11:manifests/0000_50_olm_07-olm-operator.deployment.ibm-cloud-managed.yaml: seccompProfile: origin/release-4.11:manifests/0000_50_olm_07-olm-operator.deployment.yaml: seccompProfile: origin/release-4.11:manifests/0000_50_olm_08-catalog-operator.deployment.ibm-cloud-managed.yaml: seccompProfile: origin/release-4.11:manifests/0000_50_olm_08-catalog-operator.deployment.yaml: seccompProfile: wking@penguin /tmp/operator-framework-olm $ git --no-pager grep seccompProfile: origin/release-4.10 -- manifests origin/release-4.10:manifests/0000_50_olm_00-clusterserviceversions.crd.yaml: seccompProfile: origin/release-4.10:manifests/0000_50_olm_00-clusterserviceversions.crd.yaml: seccompProfile: origin/release-4.10:manifests/0000_50_olm_00-clusterserviceversions.crd.yaml: seccompProfile: origin/release-4.10:manifests/0000_50_olm_00-clusterserviceversions.crd.yaml: seccompProfile: The property itself is from Kubernetes 1.19: $ git clone https://github.com/kubernetes/api $ cd api $ git blame origin/release-1.19 -- core/v1/types.go | grep '^type .* struct\|SeccompProfile.*seccompProfile' | grep -B1 
SeccompProfile
  20b270679a (Paulo Gomes 2020-06-24 21:37:49 +0100 3306) SeccompProfile *SeccompProfile `json:"seccompProfile,omitempty" protobuf:"bytes,10,opt,name=seccompProfile"`
  20b270679a (Paulo Gomes 2020-06-24 21:37:49 +0100 5979) SeccompProfile *SeccompProfile `json:"seccompProfile,omitempty" protobuf:"bytes,11,opt,name=seccompProfile"`
  $ git blame origin/release-1.18 -- core/v1/types.go | grep 'SeccompProfile.*seccompProfile'
  ...no hits...

So the fact that the CVO was not reconciling it would technically be a bug since OpenShift 4.6, which shipped Kubernetes 1.19. But auditing the most recent named release in each minor branch:

  $ oc adm release extract --to 4.6 quay.io/openshift-release-dev/ocp-release:4.6.60-x86_64
  $ oc adm release extract --to 4.7 quay.io/openshift-release-dev/ocp-release:4.7.56-x86_64
  $ oc adm release extract --to 4.8 quay.io/openshift-release-dev/ocp-release:4.8.48-x86_64
  $ oc adm release extract --to 4.9 quay.io/openshift-release-dev/ocp-release:4.9.47-x86_64
  $ oc adm release extract --to 4.10 quay.io/openshift-release-dev/ocp-release:4.10.30-x86_64
  $ grep -lr seccompProfile: 4.*
  4.10/0000_50_cluster-monitoring-operator_00_0alertmanager-custom-resource-definition.yaml
  4.10/0000_50_cluster-monitoring-operator_00_0prometheus-custom-resource-definition.yaml
  4.10/0000_50_cluster-monitoring-operator_00_0thanosruler-custom-resource-definition.yaml
  4.10/0000_50_olm_00-clusterserviceversions.crd.yaml
  4.7/0000_50_olm_00-clusterserviceversions.crd.yaml
  4.8/0000_50_olm_00-clusterserviceversions.crd.yaml
  4.9/0000_50_olm_00-clusterserviceversions.crd.yaml

And we have sound CRD reconciliation, so those CRD hits are not a concern. So at the moment, the only manifests asking for seccompProfile where we'd be worried about a lack of CVO reconciliation are 4.11 and later.

In this commit, I'm adding seccompProfile reconciliation. There are quite a few other properties within (Pod)SecurityContext that the CVO is still ignoring, and I'm unclear on why we have been using per-property merging instead of DeepEqual. But I'm going to punt on a broader move to DeepEqual for now, to make the narrow fix that should unstick 4.10-to-4.11-to-4.12 updates.

[1]: https://testgrid.k8s.io/redhat-openshift-ocp-release-4.12-informing#periodic-ci-openshift-release-master-ci-4.12-upgrade-from-stable-4.11-from-stable-4.10-e2e-aws-sdn-upgrade
[2]: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.12-upgrade-from-stable-4.11-from-stable-4.10-e2e-aws-sdn-upgrade/1562409104817786880
[3]: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.12-upgrade-from-stable-4.11-from-stable-4.10-e2e-aws-sdn-upgrade/1561684150702837760
[4]: https://issues.redhat.com/browse/OCPBUGS-575
[5]: openshift/operator-framework-olm#367
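The rejection quoted in this commit message comes from the `restricted` profile's seccomp check in Pod Security admission. Every container needs an effective `seccompProfile.type` of `RuntimeDefault` or `Localhost`, set either on the container or inherited from the pod. An explicit `Unconfined` is rejected. The Go sketch below re-implements that rule in simplified form. It shows why a Deployment template that still carries its pre-4.11 `securityContext` fails once the namespace enforces `restricted`.

It is not the upstream admission code. The structs are hypothetical stand-ins for the `k8s.io/api/core/v1` types, and one `SecurityContext` type is reused for both pod and container. Only regular containers are checked, whereas upstream also visits init and ephemeral containers. Only the implicit-violation message is modeled on the one in the log.

```go
package main

import "fmt"

// Hypothetical, simplified stand-ins for the corev1 types; the real API uses
// PodSecurityContext at pod level and SecurityContext per container.
type SeccompProfile struct {
	Type string // "RuntimeDefault", "Localhost", or "Unconfined"
}

type SecurityContext struct {
	SeccompProfile *SeccompProfile
}

type Container struct {
	Name            string
	SecurityContext *SecurityContext
}

type PodSpec struct {
	SecurityContext *SecurityContext
	Containers      []Container
}

// allowedSeccomp reports whether a profile type satisfies the restricted rule.
func allowedSeccomp(t string) bool {
	return t == "RuntimeDefault" || t == "Localhost"
}

// restrictedSeccompViolations returns one message per violation found.
func restrictedSeccompViolations(spec PodSpec) []string {
	var violations []string
	podAllowed := false // true when the pod level sets an allowed profile
	if sc := spec.SecurityContext; sc != nil && sc.SeccompProfile != nil {
		if allowedSeccomp(sc.SeccompProfile.Type) {
			podAllowed = true
		} else {
			violations = append(violations, fmt.Sprintf("pod must not set securityContext.seccompProfile.type to %q", sc.SeccompProfile.Type))
		}
	}
	for _, c := range spec.Containers {
		if sc := c.SecurityContext; sc != nil && sc.SeccompProfile != nil {
			// An explicit container-level value must itself be allowed.
			if !allowedSeccomp(sc.SeccompProfile.Type) {
				violations = append(violations, fmt.Sprintf("container %q must not set securityContext.seccompProfile.type to %q", c.Name, sc.SeccompProfile.Type))
			}
		} else if !podAllowed {
			// Nothing on the container and nothing valid to inherit from the pod.
			violations = append(violations, fmt.Sprintf("pod or container %q must set securityContext.seccompProfile.type to %q or %q", c.Name, "RuntimeDefault", "Localhost"))
		}
	}
	return violations
}

func main() {
	// Hypothetical template with no seccompProfile anywhere, as a Deployment
	// created from a 4.10 manifest keeps if the field is never reconciled.
	unset := PodSpec{Containers: []Container{{Name: "package-server-manager"}}}
	// Hypothetical template with a pod-level RuntimeDefault profile.
	podLevel := PodSpec{
		SecurityContext: &SecurityContext{SeccompProfile: &SeccompProfile{Type: "RuntimeDefault"}},
		Containers:      []Container{{Name: "package-server-manager"}},
	}
	fmt.Println(restrictedSeccompViolations(unset))
	fmt.Println(len(restrictedSeccompViolations(podLevel)))
}
```

Running it prints the single violation for the template without a profile, then `0` for the template whose container inherits a pod-level `RuntimeDefault`.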
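The narrow fix described in this commit message teaches the CVO's per-property merge about `seccompProfile`. The following is a minimal Go sketch of what such a merge step can look like. It rests on a few assumptions:

* The structs are simplified stand-ins for the `corev1` types.
* `ensurePodSecurityContext`, `ensureSeccompProfilePtr`, and `setBoolPtr` are hypothetical names written for this example, not the CVO's actual code.
* A field that the manifest leaves unset is not enforced.

Without the `ensureSeccompProfilePtr` call, this merge would report an object created from a 4.10 manifest as up to date, even though the newer manifest asks for a profile.

```go
package main

import (
	"fmt"
	"reflect"
)

// Hypothetical, simplified stand-ins for corev1.SeccompProfile and corev1.PodSecurityContext.
type SeccompProfile struct {
	Type             string
	LocalhostProfile *string
}

type PodSecurityContext struct {
	RunAsNonRoot   *bool
	SeccompProfile *SeccompProfile
}

// copySeccompProfile deep-copies so existing and required never share pointers.
func copySeccompProfile(in *SeccompProfile) *SeccompProfile {
	out := &SeccompProfile{Type: in.Type}
	if in.LocalhostProfile != nil {
		p := *in.LocalhostProfile
		out.LocalhostProfile = &p
	}
	return out
}

// setBoolPtr enforces required only when the manifest sets it.
func setBoolPtr(modified *bool, existing **bool, required *bool) {
	if required == nil {
		return
	}
	if *existing == nil || **existing != *required {
		*modified = true
		v := *required
		*existing = &v
	}
}

// ensureSeccompProfilePtr is the new piece: a missing or different profile
// on the in-cluster object now counts as drift.
func ensureSeccompProfilePtr(modified *bool, existing **SeccompProfile, required *SeccompProfile) {
	if required == nil {
		return
	}
	if *existing == nil || !reflect.DeepEqual(**existing, *required) {
		*modified = true
		*existing = copySeccompProfile(required)
	}
}

// ensurePodSecurityContext merges the manifest's fields into the in-cluster object.
func ensurePodSecurityContext(modified *bool, existing *PodSecurityContext, required PodSecurityContext) {
	setBoolPtr(modified, &existing.RunAsNonRoot, required.RunAsNonRoot)
	ensureSeccompProfilePtr(modified, &existing.SeccompProfile, required.SeccompProfile)
}

func main() {
	yes := true
	// In-cluster object created from an older manifest: no seccompProfile.
	existing := PodSecurityContext{RunAsNonRoot: &yes}
	// Newer manifest asking for RuntimeDefault.
	required := PodSecurityContext{RunAsNonRoot: &yes, SeccompProfile: &SeccompProfile{Type: "RuntimeDefault"}}

	modified := false
	ensurePodSecurityContext(&modified, &existing, required)
	fmt.Println(modified, existing.SeccompProfile.Type) // true RuntimeDefault

	modified = false
	ensurePodSecurityContext(&modified, &existing, required)
	fmt.Println(modified) // false: a second pass is a no-op
}
```

The second pass reports `false`, which keeps this style of reconciler from rewriting the object on every sync.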
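This commit message leaves open whether to use per-property merging or `DeepEqual`. The trade-off fits in a few lines. In this Go sketch, `perPropertyDiffers` stands in for a merge that only knows about `RunAsNonRoot`, which is effectively what a merge without `seccompProfile` support did. `deepEqualDiffers` compares whole structs with `reflect.DeepEqual`. Both function names and the trimmed `PodSecurityContext` are hypothetical.

The second case is an illustrative assumption, not something the commit message states. Suppose the live object carries a field that the manifest never set, for example one filled in by defaulting or by another actor. A whole-object `DeepEqual` then reports drift on every comparison, while the per-property approach ignores that field.

```go
package main

import (
	"fmt"
	"reflect"
)

// Hypothetical, trimmed stand-ins for the corev1 types.
type SeccompProfile struct {
	Type string
}

type PodSecurityContext struct {
	RunAsNonRoot   *bool
	RunAsUser      *int64
	SeccompProfile *SeccompProfile
}

// perPropertyDiffers mimics a merge that only checks RunAsNonRoot,
// and only when the manifest sets it.
func perPropertyDiffers(existing, required PodSecurityContext) bool {
	if required.RunAsNonRoot == nil {
		return false
	}
	return existing.RunAsNonRoot == nil || *existing.RunAsNonRoot != *required.RunAsNonRoot
}

// deepEqualDiffers compares every field, whether the manifest set it or not.
func deepEqualDiffers(existing, required PodSecurityContext) bool {
	return !reflect.DeepEqual(existing, required)
}

func main() {
	yes := true
	uid := int64(1000)
	required := PodSecurityContext{RunAsNonRoot: &yes, SeccompProfile: &SeccompProfile{Type: "RuntimeDefault"}}

	// Live object missing the profile: only DeepEqual notices.
	missing := PodSecurityContext{RunAsNonRoot: &yes}
	fmt.Println(perPropertyDiffers(missing, required), deepEqualDiffers(missing, required)) // false true

	// Live object with the profile plus a field the manifest never set:
	// DeepEqual still reports drift, per-property does not.
	extra := PodSecurityContext{RunAsNonRoot: &yes, RunAsUser: &uid, SeccompProfile: &SeccompProfile{Type: "RuntimeDefault"}}
	fmt.Println(perPropertyDiffers(extra, required), deepEqualDiffers(extra, required)) // false true
}
```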
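The `grep -lr seccompProfile: 4.*` audit can also be scripted, for example to rerun it against newer releases. Below is a minimal Go sketch using `io/fs`, assuming the releases have already been extracted with `oc adm release extract`.

* `filesContaining` and `demoRelease` are hypothetical names.
* The in-memory `fstest.MapFS` only stands in for real extracted manifests. Against a real extraction you would pass `os.DirFS(".")`.
* Like the `grep`, it matches the literal string, so CRD schemas that merely describe the property are reported too.

```go
package main

import (
	"fmt"
	"io/fs"
	"sort"
	"strings"
	"testing/fstest"
)

// filesContaining walks each root directory in fsys and returns the sorted
// paths of files whose contents include needle, like `grep -lr needle roots...`.
func filesContaining(fsys fs.FS, needle string, roots ...string) ([]string, error) {
	var hits []string
	for _, root := range roots {
		err := fs.WalkDir(fsys, root, func(path string, d fs.DirEntry, err error) error {
			if err != nil {
				return err // e.g. a root that does not exist
			}
			if d.IsDir() {
				return nil
			}
			data, err := fs.ReadFile(fsys, path)
			if err != nil {
				return err
			}
			if strings.Contains(string(data), needle) {
				hits = append(hits, path)
			}
			return nil
		})
		if err != nil {
			return nil, err
		}
	}
	sort.Strings(hits)
	return hits, nil
}

// demoRelease is a tiny hypothetical stand-in for extracted release manifests.
func demoRelease() fstest.MapFS {
	return fstest.MapFS{
		"4.9/0000_50_olm_00-clusterserviceversions.crd.yaml":  {Data: []byte("seccompProfile:\n")},
		"4.9/0000_50_olm_07-olm-operator.deployment.yaml":     {Data: []byte("kind: Deployment\n")},
		"4.10/0000_50_olm_00-clusterserviceversions.crd.yaml": {Data: []byte("  seccompProfile:\n")},
	}
}

func main() {
	hits, err := filesContaining(demoRelease(), "seccompProfile:", "4.10", "4.9")
	if err != nil {
		panic(err)
	}
	for _, h := range hits {
		fmt.Println(h)
	}
}
```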
…Context

This recently started breaking 4.10-to-4.11-to-4.12 updates [1] with the first failing run [2] like:

  $ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.12-upgrade-from-stable-4.11-from-stable-4.10-e2e-aws-sdn-upgrade/1562409104817786880/artifacts/e2e-aws-sdn-upgrade/gather-extra/artifacts/clusterversion.json | jq -r '.items[].status.conditions[] | .lastTransitionTime + " " + .type + "=" + .status + " " + .reason + ": " + .message'
  2022-08-24T12:12:13Z RetrievedUpdates=False NoChannel: The update channel has not been configured.
  2022-08-24T12:12:13Z ReleaseAccepted=True PayloadLoaded: Payload loaded version="4.12.0-0.ci-2022-08-24-061508" image="registry.build03.ci.openshift.org/ci-op-2v4w1wbk/release@sha256:37a2aa2a46ed42eedf73cf0ee1ec0f9a37b822823518172c01ef6ad2ba3ffa12" architecture="amd64"
  2022-08-24T12:31:04Z Available=True : Done applying 4.11.1
  2022-08-24T14:07:56Z Failing=True WorkloadNotProgressing: deployment openshift-operator-lifecycle-manager/package-server-manager has a replica failure FailedCreate: pods "package-server-manager-fbb7cb8b9-5zmwg" is forbidden: violates PodSecurity "restricted:v1.24": seccompProfile (pod or container "package-server-manager" must set securityContext.seccompProfile.type to "RuntimeDefault" or "Localhost")
  2022-08-24T13:40:48Z Progressing=True WorkloadNotProgressing: Unable to apply 4.12.0-0.ci-2022-08-24-061508: the workload openshift-operator-lifecycle-manager/package-server-manager cannot roll out
  2022-08-24T12:36:55Z ImplicitlyEnabledCapabilities=False AsExpected: Capabilities match configured spec

while the previous run [3] had:

  $ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.12-upgrade-from-stable-4.11-from-stable-4.10-e2e-aws-sdn-upgrade/1561684150702837760/artifacts/e2e-aws-sdn-upgrade/gather-extra/artifacts/clusterversion.json | jq -r '.items[].status.conditions[] | .lastTransitionTime + " " + .type + "=" + .status + " " + .reason + ": " + .message'
  2022-08-22T12:12:30Z RetrievedUpdates=False NoChannel: The update channel has not been configured.
  2022-08-22T12:12:30Z ReleaseAccepted=True PayloadLoaded: Payload loaded version="4.12.0-0.ci-2022-08-22-055706" image="registry.build03.ci.openshift.org/ci-op-5wrvyzw8/release@sha256:42cb26b6290927a1857dfc246336739cebd297c51d9af43f586ceaff9073e825" architecture="amd64"
  2022-08-22T12:36:17Z Available=True : Done applying 4.12.0-0.ci-2022-08-22-055706
  2022-08-22T14:09:42Z Failing=False :
  2022-08-22T14:44:58Z Progressing=False : Cluster version is 4.12.0-0.ci-2022-08-22-055706
  2022-08-22T12:40:12Z ImplicitlyEnabledCapabilities=False AsExpected: Capabilities match configured spec

Jian Zhang also noticed the breakage in QE [4].

The breaking change was an OLM namespace touch:

  $ REF_A=https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.12-upgrade-from-stable-4.11-from-stable-4.10-e2e-aws-sdn-upgrade/1561684150702837760/artifacts/release/artifacts/release-images-latest
  $ REF_B=https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.12-upgrade-from-stable-4.11-from-stable-4.10-e2e-aws-sdn-upgrade/1562409104817786880/artifacts/release/artifacts/release-images-latest
  $ JQ='[.spec.tags[] | .name + " " + .annotations["io.openshift.build.source-location"] + "/commit/" + .annotations["io.openshift.build.commit.id"]] | sort[]'
  $ diff -U0 <(curl -s "${REF_A}" | jq -r "${JQ}") <(curl -s "${REF_B}" | jq -r "${JQ}") | grep olm
  -operator-lifecycle-manager openshift/operator-framework-olm@fd42910
  -operator-registry openshift/operator-framework-olm@fd42910
  +operator-lifecycle-manager openshift/operator-framework-olm@f8c466a
  +operator-registry openshift/operator-framework-olm@f8c466a
  $ git clone https://github.com/openshift/operator-framework-olm
  $ cd operator-framework-olm
  $ git --no-pager log --oneline fd429109b22e8..f8c466aeea67 | grep namespace
  be84d80d (psa) restrict olm namespace + remove labels from openshift-operators ns
  $ git --no-pager show --oneline be84d80d -- manifests
  be84d80d (psa) restrict olm namespace + remove labels from openshift-operators ns
  diff --git a/manifests/0000_50_olm_00-namespace.yaml b/manifests/0000_50_olm_00-namespace.yaml
  index 8fffa527..168e8867 100644
  --- a/manifests/0000_50_olm_00-namespace.yaml
  +++ b/manifests/0000_50_olm_00-namespace.yaml
  @@ -3,6 +3,8 @@ kind: Namespace
   metadata:
     name: openshift-operator-lifecycle-manager
     labels:
  +    pod-security.kubernetes.io/enforce: restricted
  +    pod-security.kubernetes.io/enforce-version: "v1.24"
       openshift.io/scc: "anyuid"
       openshift.io/cluster-monitoring: "true"
     annotations:
  @@ -16,7 +18,7 @@ kind: Namespace
   metadata:
     name: openshift-operators
     labels:
  -    pod-security.kubernetes.io/enforce: baseline
  +    pod-security.kubernetes.io/enforce: privileged
       pod-security.kubernetes.io/enforce-version: "v1.24"
       openshift.io/scc: "anyuid"
     annotations:

which went out with [5], and will not be backported to 4.11.
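The Failing condition above names the exact rule that the newly restricted namespace enforces: the pod or container must set `securityContext.seccompProfile.type` to `RuntimeDefault` or `Localhost`. The Go sketch below illustrates that single check and nothing more. The structs are hypothetical, trimmed stand-ins for the `k8s.io/api/core/v1` types. A single `SecurityContext` type plays both the pod-level and the container-level role, and ephemeral containers and `enforce-version` handling are ignored. The function is not the Pod Security admission implementation. It shows why a package-server-manager pod template that never picked up the 4.11 `seccompProfile` is refused, and why a pod-level `RuntimeDefault` covers every container that does not override it.

```go
package main

// SeccompProfileType mirrors the string enum of the same name in k8s.io/api/core/v1.
type SeccompProfileType string

const (
	SeccompProfileTypeUnconfined     SeccompProfileType = "Unconfined"
	SeccompProfileTypeRuntimeDefault SeccompProfileType = "RuntimeDefault"
	SeccompProfileTypeLocalhost      SeccompProfileType = "Localhost"
)

// Hypothetical, trimmed stand-ins for the core/v1 types: only the fields this
// check reads are kept.
type SeccompProfile struct {
	Type             SeccompProfileType
	LocalhostProfile *string // only meaningful when Type is Localhost
}

type SecurityContext struct {
	SeccompProfile *SeccompProfile
}

type Container struct {
	Name            string
	SecurityContext *SecurityContext
}

type PodSpec struct {
	SecurityContext *SecurityContext // plays the role of PodSecurityContext here
	InitContainers  []Container
	Containers      []Container
}

// allowedSeccomp reports whether a profile satisfies the restricted level.
func allowedSeccomp(p *SeccompProfile) bool {
	return p != nil &&
		(p.Type == SeccompProfileTypeRuntimeDefault || p.Type == SeccompProfileTypeLocalhost)
}

// restrictedSeccompViolations lists "pod" and/or the container names that
// would fail the restricted seccomp rule.
func restrictedSeccompViolations(spec PodSpec) []string {
	var violations []string

	podAllowed := false
	if spec.SecurityContext != nil && spec.SecurityContext.SeccompProfile != nil {
		if allowedSeccomp(spec.SecurityContext.SeccompProfile) {
			podAllowed = true
		} else {
			violations = append(violations, "pod") // e.g. pod-level Unconfined
		}
	}

	containers := append(append([]Container{}, spec.InitContainers...), spec.Containers...)
	for _, c := range containers {
		if c.SecurityContext != nil && c.SecurityContext.SeccompProfile != nil {
			// An explicit container-level profile overrides the pod-level one,
			// so it must be allowed on its own.
			if !allowedSeccomp(c.SecurityContext.SeccompProfile) {
				violations = append(violations, c.Name)
			}
		} else if !podAllowed {
			// Nothing set on the container and no allowed pod-level fallback.
			violations = append(violations, c.Name)
		}
	}
	return violations
}
```

In this sketch, an explicit container-level value takes precedence over the pod-level one. An explicitly disallowed value such as `Unconfined` at either level is reported as a violation.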
The relevant seccompProfile change landed in 4.11 (but not in 4.10), which is why born-in-4.11 clusters are not impacted by the change:

  $ git --no-pager grep seccompProfile: origin/release-4.11 -- manifests
  origin/release-4.11:manifests/0000_50_olm_00-clusterserviceversions.crd.yaml: seccompProfile:
  origin/release-4.11:manifests/0000_50_olm_00-clusterserviceversions.crd.yaml: seccompProfile:
  origin/release-4.11:manifests/0000_50_olm_00-clusterserviceversions.crd.yaml: seccompProfile:
  origin/release-4.11:manifests/0000_50_olm_00-clusterserviceversions.crd.yaml: seccompProfile:
  origin/release-4.11:manifests/0000_50_olm_06-psm-operator.deployment.ibm-cloud-managed.yaml: seccompProfile:
  origin/release-4.11:manifests/0000_50_olm_06-psm-operator.deployment.yaml: seccompProfile:
  origin/release-4.11:manifests/0000_50_olm_07-collect-profiles.cronjob.yaml: seccompProfile:
  origin/release-4.11:manifests/0000_50_olm_07-olm-operator.deployment.ibm-cloud-managed.yaml: seccompProfile:
  origin/release-4.11:manifests/0000_50_olm_07-olm-operator.deployment.yaml: seccompProfile:
  origin/release-4.11:manifests/0000_50_olm_08-catalog-operator.deployment.ibm-cloud-managed.yaml: seccompProfile:
  origin/release-4.11:manifests/0000_50_olm_08-catalog-operator.deployment.yaml: seccompProfile:
  $ git --no-pager grep seccompProfile: origin/release-4.10 -- manifests
  origin/release-4.10:manifests/0000_50_olm_00-clusterserviceversions.crd.yaml: seccompProfile:
  origin/release-4.10:manifests/0000_50_olm_00-clusterserviceversions.crd.yaml: seccompProfile:
  origin/release-4.10:manifests/0000_50_olm_00-clusterserviceversions.crd.yaml: seccompProfile:
  origin/release-4.10:manifests/0000_50_olm_00-clusterserviceversions.crd.yaml: seccompProfile:

The property itself is from Kubernetes 1.19:

  $ git clone https://github.com/kubernetes/api
  $ cd api
  $ git blame origin/release-1.19 -- core/v1/types.go | grep '^type .* struct\|SeccompProfile.*seccompProfile' | grep -B1 SeccompProfile
  20b270679a (Paulo Gomes 2020-06-24 21:37:49 +0100 3306) SeccompProfile *SeccompProfile `json:"seccompProfile,omitempty" protobuf:"bytes,10,opt,name=seccompProfile"`
  20b270679a (Paulo Gomes 2020-06-24 21:37:49 +0100 5979) SeccompProfile *SeccompProfile `json:"seccompProfile,omitempty" protobuf:"bytes,11,opt,name=seccompProfile"`
  $ git blame origin/release-1.18 -- core/v1/types.go | grep 'SeccompProfile.*seccompProfile'
  ...no hits...

So the fact that the CVO was not reconciling it would technically be a bug since OpenShift 4.6, which shipped Kubernetes 1.19. But auditing the most recent named release in each minor branch:

  $ oc adm release extract --to 4.6 quay.io/openshift-release-dev/ocp-release:4.6.60-x86_64
  $ oc adm release extract --to 4.7 quay.io/openshift-release-dev/ocp-release:4.7.56-x86_64
  $ oc adm release extract --to 4.8 quay.io/openshift-release-dev/ocp-release:4.8.48-x86_64
  $ oc adm release extract --to 4.9 quay.io/openshift-release-dev/ocp-release:4.9.47-x86_64
  $ oc adm release extract --to 4.10 quay.io/openshift-release-dev/ocp-release:4.10.30-x86_64
  $ grep -lr seccompProfile: 4.*
  4.10/0000_50_cluster-monitoring-operator_00_0alertmanager-custom-resource-definition.yaml
  4.10/0000_50_cluster-monitoring-operator_00_0prometheus-custom-resource-definition.yaml
  4.10/0000_50_cluster-monitoring-operator_00_0thanosruler-custom-resource-definition.yaml
  4.10/0000_50_olm_00-clusterserviceversions.crd.yaml
  4.7/0000_50_olm_00-clusterserviceversions.crd.yaml
  4.8/0000_50_olm_00-clusterserviceversions.crd.yaml
  4.9/0000_50_olm_00-clusterserviceversions.crd.yaml

All of those hits are CRDs, and we have sound CRD reconciliation. So at the moment, the only manifests asking for seccompProfile where we'd be worried about a lack of CVO reconciliation are 4.11 and later.

In this commit, I'm adding seccompProfile reconciliation. There are quite a few other properties within (Pod)SecurityContext that the CVO is still ignoring, and I'm unclear on why we have been using per-property merging instead of DeepEqual.
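Per-property merging means the in-cluster object is compared with the manifest one known field at a time. Any field the merge code does not know about is silently skipped. The Go sketch below illustrates the difference. It uses hypothetical stand-in types and function names, and it is not the CVO's actual merge code. A merge that skips `seccompProfile` leaves a 4.10-born pod template untouched, while a narrow `seccompProfile` step (here comparing the nested struct with `reflect.DeepEqual`) fixes it. A whole-struct `reflect.DeepEqual` check is shown for comparison. The sketch assumes the merge leaves a field alone when the manifest does not set it.

```go
package main

import "reflect"

// Hypothetical, trimmed stand-ins for the core/v1 types.
type SeccompProfile struct {
	Type             string // "RuntimeDefault", "Localhost", or "Unconfined"
	LocalhostProfile *string
}

type SecurityContext struct {
	RunAsNonRoot   *bool
	SeccompProfile *SeccompProfile
}

// ensureBoolPtr is the per-property pattern: do nothing when the manifest is
// silent, otherwise overwrite the in-cluster value and flag the change.
func ensureBoolPtr(modified *bool, existing **bool, required *bool) {
	if required == nil {
		return
	}
	if *existing == nil || **existing != *required {
		v := *required
		*existing = &v
		*modified = true
	}
}

// ensureSeccompProfile is the narrow addition: the same pattern applied to
// seccompProfile, comparing the nested struct with reflect.DeepEqual.
func ensureSeccompProfile(modified *bool, existing **SeccompProfile, required *SeccompProfile) {
	if required == nil {
		return
	}
	if !reflect.DeepEqual(*existing, required) {
		copied := SeccompProfile{Type: required.Type}
		if required.LocalhostProfile != nil {
			p := *required.LocalhostProfile
			copied.LocalhostProfile = &p // deep copy so later edits to required do not leak
		}
		*existing = &copied
		*modified = true
	}
}

// mergeSecurityContext walks the known properties one by one. reconcileSeccomp
// toggles the new step so the before/after behavior can be compared.
func mergeSecurityContext(modified *bool, existing *SecurityContext, required SecurityContext, reconcileSeccomp bool) {
	ensureBoolPtr(modified, &existing.RunAsNonRoot, required.RunAsNonRoot)
	if reconcileSeccomp {
		ensureSeccompProfile(modified, &existing.SeccompProfile, required.SeccompProfile)
	}
}

// needsUpdateDeepEqual is the whole-struct alternative: any difference at all,
// including fields the manifest leaves unset, counts as drift.
func needsUpdateDeepEqual(existing, required SecurityContext) bool {
	return !reflect.DeepEqual(existing, required)
}
```

One plausible reason to keep per-property merging is an assumption, since the commit itself says the reasoning is unclear. `needsUpdateDeepEqual` also reports drift whenever the in-cluster object holds values that the API server or another controller filled in but the manifest omits. That could trigger an update on every sync.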
But I'm going to punt on a broader move to DeepEqual for now, to make the narrow fix that should unstick 4.10-to-4.11-to-4.12 updates.

[1]: https://testgrid.k8s.io/redhat-openshift-ocp-release-4.12-informing#periodic-ci-openshift-release-master-ci-4.12-upgrade-from-stable-4.11-from-stable-4.10-e2e-aws-sdn-upgrade
[2]: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.12-upgrade-from-stable-4.11-from-stable-4.10-e2e-aws-sdn-upgrade/1562409104817786880
[3]: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.12-upgrade-from-stable-4.11-from-stable-4.10-e2e-aws-sdn-upgrade/1561684150702837760
[4]: https://issues.redhat.com/browse/OCPBUGS-575
[5]: openshift/operator-framework-olm#367
21:37:49 +0100 3306) SeccompProfile *SeccompProfile `json:"seccompProfile,omitempty" protobuf:"bytes,10,opt,name=seccompProfile"` 20b270679a (Paulo Gomes 2020-06-24 21:37:49 +0100 5979) SeccompProfile *SeccompProfile `json:"seccompProfile,omitempty" protobuf:"bytes,11,opt,name=seccompProfile"` $ git blame origin/release-1.18 -- core/v1/types.go | grep 'SeccompProfile.*seccompProfile' ...no hits... So the fact that the CVO was not reconciling it would technically be a bug since OpenShift 4.6, which shipped Kubernetes 1.19. But auditing the most recent named release in each minor branch: $ oc adm release extract --to 4.6 quay.io/openshift-release-dev/ocp-release:4.6.60-x86_64 $ oc adm release extract --to 4.7 quay.io/openshift-release-dev/ocp-release:4.7.56-x86_64 $ oc adm release extract --to 4.8 quay.io/openshift-release-dev/ocp-release:4.8.48-x86_64 $ oc adm release extract --to 4.9 quay.io/openshift-release-dev/ocp-release:4.9.47-x86_64 $ oc adm release extract --to 4.10 quay.io/openshift-release-dev/ocp-release:4.10.30-x86_64 $ grep -lr seccompProfile: 4.* 4.10/0000_50_cluster-monitoring-operator_00_0alertmanager-custom-resource-definition.yaml 4.10/0000_50_cluster-monitoring-operator_00_0prometheus-custom-resource-definition.yaml 4.10/0000_50_cluster-monitoring-operator_00_0thanosruler-custom-resource-definition.yaml 4.10/0000_50_olm_00-clusterserviceversions.crd.yaml 4.7/0000_50_olm_00-clusterserviceversions.crd.yaml 4.8/0000_50_olm_00-clusterserviceversions.crd.yaml 4.9/0000_50_olm_00-clusterserviceversions.crd.yaml And we have sound CRD reconciliation. So at the moment, the only manifests asking for seccompProfile where we'd be worried about a lack of CVO reconciliation are 4.11 and later. In this commit, I'm adding seccompProfile reconciliation. There are quite a few other properties within (Pod)SecurityContext that the CVO is still ignoring, and I'm unclear on why we have been using per-property merging instead of DeepEqual. 
But I'm going to punt on a broader move to DeepEqual for now, to make the narrow fix that should unstick 4.10-to-4.11-to-4.12 updates. [1]: https://testgrid.k8s.io/redhat-openshift-ocp-release-4.12-informing#periodic-ci-openshift-release-master-ci-4.12-upgrade-from-stable-4.11-from-stable-4.10-e2e-aws-sdn-upgrade [2]: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.12-upgrade-from-stable-4.11-from-stable-4.10-e2e-aws-sdn-upgrade/1562409104817786880 [3]: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.12-upgrade-from-stable-4.11-from-stable-4.10-e2e-aws-sdn-upgrade/1561684150702837760 [4]: https://issues.redhat.com/browse/OCPBUGS-575 [5]: openshift/operator-framework-olm#367
…Context

This recently started breaking 4.10-to-4.11-to-4.12 updates [1] with the first failing run [2] like:

  $ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.12-upgrade-from-stable-4.11-from-stable-4.10-e2e-aws-sdn-upgrade/1562409104817786880/artifacts/e2e-aws-sdn-upgrade/gather-extra/artifacts/clusterversion.json | jq -r '.items[].status.conditions[] | .lastTransitionTime + " " + .type + "=" + .status + " " + .reason + ": " + .message'
  2022-08-24T12:12:13Z RetrievedUpdates=False NoChannel: The update channel has not been configured.
  2022-08-24T12:12:13Z ReleaseAccepted=True PayloadLoaded: Payload loaded version="4.12.0-0.ci-2022-08-24-061508" image="registry.build03.ci.openshift.org/ci-op-2v4w1wbk/release@sha256:37a2aa2a46ed42eedf73cf0ee1ec0f9a37b822823518172c01ef6ad2ba3ffa12" architecture="amd64"
  2022-08-24T12:31:04Z Available=True : Done applying 4.11.1
  2022-08-24T14:07:56Z Failing=True WorkloadNotProgressing: deployment openshift-operator-lifecycle-manager/package-server-manager has a replica failure FailedCreate: pods "package-server-manager-fbb7cb8b9-5zmwg" is forbidden: violates PodSecurity "restricted:v1.24": seccompProfile (pod or container "package-server-manager" must set securityContext.seccompProfile.type to "RuntimeDefault" or "Localhost")
  2022-08-24T13:40:48Z Progressing=True WorkloadNotProgressing: Unable to apply 4.12.0-0.ci-2022-08-24-061508: the workload openshift-operator-lifecycle-manager/package-server-manager cannot roll out
  2022-08-24T12:36:55Z ImplicitlyEnabledCapabilities=False AsExpected: Capabilities match configured spec

while the previous run [3] had:

  $ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.12-upgrade-from-stable-4.11-from-stable-4.10-e2e-aws-sdn-upgrade/1561684150702837760/artifacts/e2e-aws-sdn-upgrade/gather-extra/artifacts/clusterversion.json | jq -r '.items[].status.conditions[] | .lastTransitionTime + " " + .type + "=" + .status + " " + .reason + ": " + .message'
  2022-08-22T12:12:30Z RetrievedUpdates=False NoChannel: The update channel has not been configured.
  2022-08-22T12:12:30Z ReleaseAccepted=True PayloadLoaded: Payload loaded version="4.12.0-0.ci-2022-08-22-055706" image="registry.build03.ci.openshift.org/ci-op-5wrvyzw8/release@sha256:42cb26b6290927a1857dfc246336739cebd297c51d9af43f586ceaff9073e825" architecture="amd64"
  2022-08-22T12:36:17Z Available=True : Done applying 4.12.0-0.ci-2022-08-22-055706
  2022-08-22T14:09:42Z Failing=False :
  2022-08-22T14:44:58Z Progressing=False : Cluster version is 4.12.0-0.ci-2022-08-22-055706
  2022-08-22T12:40:12Z ImplicitlyEnabledCapabilities=False AsExpected: Capabilities match configured spec

Jian Zhang also noticed the breakage in QE [4].

The breaking change was an OLM namespace touch:

  $ REF_A=https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.12-upgrade-from-stable-4.11-from-stable-4.10-e2e-aws-sdn-upgrade/1561684150702837760/artifacts/release/artifacts/release-images-latest
  $ REF_B=https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.12-upgrade-from-stable-4.11-from-stable-4.10-e2e-aws-sdn-upgrade/1562409104817786880/artifacts/release/artifacts/release-images-latest
  $ JQ='[.spec.tags[] | .name + " " + .annotations["io.openshift.build.source-location"] + "/commit/" + .annotations["io.openshift.build.commit.id"]] | sort[]'
  $ diff -U0 <(curl -s "${REF_A}" | jq -r "${JQ}") <(curl -s "${REF_B}" | jq -r "${JQ}") | grep olm
  -operator-lifecycle-manager openshift/operator-framework-olm@fd42910
  -operator-registry openshift/operator-framework-olm@fd42910
  +operator-lifecycle-manager openshift/operator-framework-olm@f8c466a
  +operator-registry openshift/operator-framework-olm@f8c466a
  $ git clone https://github.com/openshift/operator-framework-olm
  $ cd operator-framework-olm
  $ git --no-pager log --oneline fd429109b22e8..f8c466aeea67 | grep namespace
  be84d80d (psa) restrict olm namespace + remove labels from openshift-operators ns
  $ git --no-pager show --oneline be84d80d -- manifests
  be84d80d (psa) restrict olm namespace + remove labels from openshift-operators ns
  diff --git a/manifests/0000_50_olm_00-namespace.yaml b/manifests/0000_50_olm_00-namespace.yaml
  index 8fffa527..168e8867 100644
  --- a/manifests/0000_50_olm_00-namespace.yaml
  +++ b/manifests/0000_50_olm_00-namespace.yaml
  @@ -3,6 +3,8 @@ kind: Namespace
   metadata:
     name: openshift-operator-lifecycle-manager
     labels:
  +    pod-security.kubernetes.io/enforce: restricted
  +    pod-security.kubernetes.io/enforce-version: "v1.24"
       openshift.io/scc: "anyuid"
       openshift.io/cluster-monitoring: "true"
     annotations:
  @@ -16,7 +18,7 @@ kind: Namespace
   metadata:
     name: openshift-operators
     labels:
  -    pod-security.kubernetes.io/enforce: baseline
  +    pod-security.kubernetes.io/enforce: privileged
       pod-security.kubernetes.io/enforce-version: "v1.24"
       openshift.io/scc: "anyuid"
     annotations:

which went out with [5], and will not be backported to 4.11.
The relevant seccompProfile change landed in 4.11 (but not in 4.10), which is why born-in-4.11 clusters are not impacted by the change: they create the OLM deployments with seccompProfile already set, while clusters born in 4.10 keep their existing deployments, into which the CVO does not reconcile seccompProfile during the 4.11 update:

  $ git --no-pager grep seccompProfile: origin/release-4.11 -- manifests
  origin/release-4.11:manifests/0000_50_olm_00-clusterserviceversions.crd.yaml: seccompProfile:
  origin/release-4.11:manifests/0000_50_olm_00-clusterserviceversions.crd.yaml: seccompProfile:
  origin/release-4.11:manifests/0000_50_olm_00-clusterserviceversions.crd.yaml: seccompProfile:
  origin/release-4.11:manifests/0000_50_olm_00-clusterserviceversions.crd.yaml: seccompProfile:
  origin/release-4.11:manifests/0000_50_olm_06-psm-operator.deployment.ibm-cloud-managed.yaml: seccompProfile:
  origin/release-4.11:manifests/0000_50_olm_06-psm-operator.deployment.yaml: seccompProfile:
  origin/release-4.11:manifests/0000_50_olm_07-collect-profiles.cronjob.yaml: seccompProfile:
  origin/release-4.11:manifests/0000_50_olm_07-olm-operator.deployment.ibm-cloud-managed.yaml: seccompProfile:
  origin/release-4.11:manifests/0000_50_olm_07-olm-operator.deployment.yaml: seccompProfile:
  origin/release-4.11:manifests/0000_50_olm_08-catalog-operator.deployment.ibm-cloud-managed.yaml: seccompProfile:
  origin/release-4.11:manifests/0000_50_olm_08-catalog-operator.deployment.yaml: seccompProfile:
  $ git --no-pager grep seccompProfile: origin/release-4.10 -- manifests
  origin/release-4.10:manifests/0000_50_olm_00-clusterserviceversions.crd.yaml: seccompProfile:
  origin/release-4.10:manifests/0000_50_olm_00-clusterserviceversions.crd.yaml: seccompProfile:
  origin/release-4.10:manifests/0000_50_olm_00-clusterserviceversions.crd.yaml: seccompProfile:
  origin/release-4.10:manifests/0000_50_olm_00-clusterserviceversions.crd.yaml: seccompProfile:

The property itself is from Kubernetes 1.19:

  $ git clone https://github.com/kubernetes/api
  $ cd api
  $ git blame origin/release-1.19 -- core/v1/types.go | grep '^type .* struct\|SeccompProfile.*seccompProfile' | grep -B1 SeccompProfile
  20b270679a (Paulo Gomes 2020-06-24 21:37:49 +0100 3306) SeccompProfile *SeccompProfile `json:"seccompProfile,omitempty" protobuf:"bytes,10,opt,name=seccompProfile"`
  20b270679a (Paulo Gomes 2020-06-24 21:37:49 +0100 5979) SeccompProfile *SeccompProfile `json:"seccompProfile,omitempty" protobuf:"bytes,11,opt,name=seccompProfile"`
  $ git blame origin/release-1.18 -- core/v1/types.go | grep 'SeccompProfile.*seccompProfile'
  ...no hits...

So the fact that the CVO was not reconciling it would technically be a bug since OpenShift 4.6, which shipped Kubernetes 1.19. But auditing the most recent named release in each minor branch:

  $ oc adm release extract --to 4.6 quay.io/openshift-release-dev/ocp-release:4.6.60-x86_64
  $ oc adm release extract --to 4.7 quay.io/openshift-release-dev/ocp-release:4.7.56-x86_64
  $ oc adm release extract --to 4.8 quay.io/openshift-release-dev/ocp-release:4.8.48-x86_64
  $ oc adm release extract --to 4.9 quay.io/openshift-release-dev/ocp-release:4.9.47-x86_64
  $ oc adm release extract --to 4.10 quay.io/openshift-release-dev/ocp-release:4.10.30-x86_64
  $ grep -lr seccompProfile: 4.*
  4.10/0000_50_cluster-monitoring-operator_00_0alertmanager-custom-resource-definition.yaml
  4.10/0000_50_cluster-monitoring-operator_00_0prometheus-custom-resource-definition.yaml
  4.10/0000_50_cluster-monitoring-operator_00_0thanosruler-custom-resource-definition.yaml
  4.10/0000_50_olm_00-clusterserviceversions.crd.yaml
  4.7/0000_50_olm_00-clusterserviceversions.crd.yaml
  4.8/0000_50_olm_00-clusterserviceversions.crd.yaml
  4.9/0000_50_olm_00-clusterserviceversions.crd.yaml

All of those hits are CustomResourceDefinitions, and we have sound CRD reconciliation. So at the moment, the only manifests asking for seccompProfile where we'd be worried about a lack of CVO reconciliation are in 4.11 and later.

In this commit, I'm adding seccompProfile reconciliation. There are quite a few other properties within (Pod)SecurityContext that the CVO still ignores, and I'm unclear on why we have been using per-property merging instead of DeepEqual.
But I'm going to punt on a broader move to DeepEqual for now, and make the narrow fix that should unstick 4.10-to-4.11-to-4.12 updates.

[1]: https://testgrid.k8s.io/redhat-openshift-ocp-release-4.12-informing#periodic-ci-openshift-release-master-ci-4.12-upgrade-from-stable-4.11-from-stable-4.10-e2e-aws-sdn-upgrade
[2]: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.12-upgrade-from-stable-4.11-from-stable-4.10-e2e-aws-sdn-upgrade/1562409104817786880
[3]: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.12-upgrade-from-stable-4.11-from-stable-4.10-e2e-aws-sdn-upgrade/1561684150702837760
[4]: https://issues.redhat.com/browse/OCPBUGS-575
[5]: openshift/operator-framework-olm#367
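To illustrate what per-property merging means and where a seccompProfile step fits, here is a minimal Go sketch. It is not the CVO's resourcemerge code: SecurityContext and SeccompProfile are simplified local stand-ins for the k8s.io/api/core/v1 types, and the helpers ensureBoolPtr, ensureSeccompProfilePtr, and ensureSecurityContext are hypothetical names.

```go
package main

import (
	"fmt"
	"reflect"
)

// Simplified, hypothetical stand-ins for corev1.SeccompProfile and
// corev1.SecurityContext; only a few properties are modeled.
type SeccompProfile struct {
	Type             string
	LocalhostProfile *string
}

type SecurityContext struct {
	RunAsNonRoot             *bool
	AllowPrivilegeEscalation *bool
	SeccompProfile           *SeccompProfile
}

// ensureBoolPtr copies a required *bool onto existing when they differ.
// A nil required value means the manifest does not care, so existing is kept.
func ensureBoolPtr(modified *bool, existing **bool, required *bool) {
	if required == nil {
		return
	}
	if *existing == nil || **existing != *required {
		v := *required
		*existing = &v
		*modified = true
	}
}

// ensureSeccompProfilePtr does the same for the seccompProfile property,
// deep-copying required so existing never aliases the manifest's data.
func ensureSeccompProfilePtr(modified *bool, existing **SeccompProfile, required *SeccompProfile) {
	if required == nil {
		return
	}
	if *existing == nil || !reflect.DeepEqual(**existing, *required) {
		p := *required
		if required.LocalhostProfile != nil {
			lp := *required.LocalhostProfile
			p.LocalhostProfile = &lp
		}
		*existing = &p
		*modified = true
	}
}

// ensureSecurityContext merges required into existing one property at a time.
// A property without an ensure step here is never reconciled, which is how an
// existing Deployment could keep running without a seccompProfile.
func ensureSecurityContext(modified *bool, existing *SecurityContext, required SecurityContext) {
	ensureBoolPtr(modified, &existing.RunAsNonRoot, required.RunAsNonRoot)
	ensureBoolPtr(modified, &existing.AllowPrivilegeEscalation, required.AllowPrivilegeEscalation)
	ensureSeccompProfilePtr(modified, &existing.SeccompProfile, required.SeccompProfile)
}

func main() {
	no := false
	// In-cluster object created by an older release: no seccompProfile.
	existing := SecurityContext{AllowPrivilegeEscalation: &no}
	// Manifest from a newer release that asks for RuntimeDefault.
	required := SecurityContext{
		AllowPrivilegeEscalation: &no,
		SeccompProfile:           &SeccompProfile{Type: "RuntimeDefault"},
	}
	modified := false
	ensureSecurityContext(&modified, &existing, required)
	fmt.Println(modified, existing.SeccompProfile.Type) // true RuntimeDefault
}
```

In this style only properties with an explicit ensure step are reconciled, so any property left off the list keeps whatever value the existing object already has. Comparing the whole struct with reflect.DeepEqual would close that gap, which is the broader move the commit message defers.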
This PR:

- Adds the pod-security.kubernetes.io/enforce: restricted Pod Security Admission (PSA) labels to the openshift-operator-lifecycle-manager namespace.
- Adds the pod-security.kubernetes.io/enforce: privileged PSA labels to the openshift-operators namespace. These labels will be removed in a future commit, once another component is in place to set the namespace's security level according to the workloads running in it.