-
Notifications
You must be signed in to change notification settings - Fork 212
OCPBUGS-575: lib/resourcemerge/core: Reconcile seccompProfile in ensurePodSecurityContext #830
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
OCPBUGS-575: lib/resourcemerge/core: Reconcile seccompProfile in ensurePodSecurityContext #830
Conversation
|
@wking: This pull request references Jira Issue OCPBUGS-575, which is valid. The bug has been moved to the POST state. The bug has been updated to refer to the pull request using the external bug tracker. 3 validation(s) were run on this bug
Requesting review from QA contact: DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
…Context This recently started breaking 4.10-to-4.11-to-4.12 updates [1] with the first failing run [2] like: $ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.12-upgrade-from-stable-4.11-from-stable-4.10-e2e-aws-sdn-upgrade/1562409104817786880/artifacts/e2e-aws-sdn-upgrade/gather-extra/artifacts/clusterversion.json | jq -r '.items[].status.conditions[] | .lastTransitionTime + " " + .type + "=" + .status + " " + .reason + ": " + .message' 2022-08-24T12:12:13Z RetrievedUpdates=False NoChannel: The update channel has not been configured. 2022-08-24T12:12:13Z ReleaseAccepted=True PayloadLoaded: Payload loaded version="4.12.0-0.ci-2022-08-24-061508" image="registry.build03.ci.openshift.org/ci-op-2v4w1wbk/release@sha256:37a2aa2a46ed42eedf73cf0ee1ec0f9a37b822823518172c01ef6ad2ba3ffa12" architecture="amd64" 2022-08-24T12:31:04Z Available=True : Done applying 4.11.1 2022-08-24T14:07:56Z Failing=True WorkloadNotProgressing: deployment openshift-operator-lifecycle-manager/package-server-manager has a replica failure FailedCreate: pods "package-server-manager-fbb7cb8b9-5zmwg" is forbidden: violates PodSecurity "restricted:v1.24": seccompProfile (pod or container "package-server-manager" must set securityContext.seccompProfile.type to "RuntimeDefault" or "Localhost") 2022-08-24T13:40:48Z Progressing=True WorkloadNotProgressing: Unable to apply 4.12.0-0.ci-2022-08-24-061508: the workload openshift-operator-lifecycle-manager/package-server-manager cannot roll out 2022-08-24T12:36:55Z ImplicitlyEnabledCapabilities=False AsExpected: Capabilities match configured spec while the previous run [3] had: $ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.12-upgrade-from-stable-4.11-from-stable-4.10-e2e-aws-sdn-upgrade/1561684150702837760/artifacts/e2e-aws-sdn-upgrade/gather-extra/artifacts/clusterversion.json | jq -r '.items[].status.conditions[] | .lastTransitionTime + " " + .type + "=" + .status + " " + .reason + ": " + .message' 2022-08-22T12:12:30Z RetrievedUpdates=False NoChannel: The update channel has not been configured. 2022-08-22T12:12:30Z ReleaseAccepted=True PayloadLoaded: Payload loaded version="4.12.0-0.ci-2022-08-22-055706" image="registry.build03.ci.openshift.org/ci-op-5wrvyzw8/release@sha256:42cb26b6290927a1857dfc246336739cebd297c51d9af43f586ceaff9073e825" architecture="amd64" 2022-08-22T12:36:17Z Available=True : Done applying 4.12.0-0.ci-2022-08-22-055706 2022-08-22T14:09:42Z Failing=False : 2022-08-22T14:44:58Z Progressing=False : Cluster version is 4.12.0-0.ci-2022-08-22-055706 2022-08-22T12:40:12Z ImplicitlyEnabledCapabilities=False AsExpected: Capabilities match configured spec Jian Zhang also noticed the breakage in QE [4]. The breaking change was an OLM namespace touch: $ REF_A=https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.12-upgrade-from-stable-4.11-from-stable-4.10-e2e-aws-sdn-upgrade/1561684150702837760/artifacts/release/artifacts/release-images-latest $ REF_B=https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.12-upgrade-from-stable-4.11-from-stable-4.10-e2e-aws-sdn-upgrade/1562409104817786880/artifacts/release/artifacts/release-images-latest $ JQ='[.spec.tags[] | .name + " " + .annotations["io.openshift.build.source-location"] + "/commit/" + .annotations["io.openshift.build.commit.id"]] | sort[]' $ diff -U0 <(curl -s "${REF_A}" | jq -r "${JQ}") <(curl -s "${REF_B}" | jq -r "${JQ}") | grep olm -operator-lifecycle-manager openshift/operator-framework-olm@fd42910 -operator-registry openshift/operator-framework-olm@fd42910 +operator-lifecycle-manager openshift/operator-framework-olm@f8c466a +operator-registry openshift/operator-framework-olm@f8c466a $ git clone https://github.com/openshift/operator-framework-olm $ cd operator-framework-olm $ git --no-pager log --oneline fd429109b22e8..f8c466aeea67 | grep namespace be84d80d (psa) restrict olm namespace + remove labels from openshift-operators ns $ git --no-pager show --oneline be84d80d -- manifests be84d80d (psa) restrict olm namespace + remove labels from openshift-operators ns diff --git a/manifests/0000_50_olm_00-namespace.yaml b/manifests/0000_50_olm_00-namespace.yaml index 8fffa527..168e8867 100644 --- a/manifests/0000_50_olm_00-namespace.yaml +++ b/manifests/0000_50_olm_00-namespace.yaml @@ -3,6 +3,8 @@ kind: Namespace metadata: name: openshift-operator-lifecycle-manager labels: + pod-security.kubernetes.io/enforce: restricted + pod-security.kubernetes.io/enforce-version: "v1.24" openshift.io/scc: "anyuid" openshift.io/cluster-monitoring: "true" annotations: @@ -16,7 +18,7 @@ kind: Namespace metadata: name: openshift-operators labels: - pod-security.kubernetes.io/enforce: baseline + pod-security.kubernetes.io/enforce: privileged pod-security.kubernetes.io/enforce-version: "v1.24" openshift.io/scc: "anyuid" annotations: which went out with [5], and will not be backported to 4.11. The relevant seccompProfile change landed in 4.11 (bot not in 4.10), which is why born-in-4.11 clusters are not impacted by the change: $ git --no-pager grep seccompProfile: origin/release-4.11 -- manifests origin/release-4.11:manifests/0000_50_olm_00-clusterserviceversions.crd.yaml: seccompProfile: origin/release-4.11:manifests/0000_50_olm_00-clusterserviceversions.crd.yaml: seccompProfile: origin/release-4.11:manifests/0000_50_olm_00-clusterserviceversions.crd.yaml: seccompProfile: origin/release-4.11:manifests/0000_50_olm_00-clusterserviceversions.crd.yaml: seccompProfile: origin/release-4.11:manifests/0000_50_olm_06-psm-operator.deployment.ibm-cloud-managed.yaml: seccompProfile: origin/release-4.11:manifests/0000_50_olm_06-psm-operator.deployment.yaml: seccompProfile: origin/release-4.11:manifests/0000_50_olm_07-collect-profiles.cronjob.yaml: seccompProfile: origin/release-4.11:manifests/0000_50_olm_07-olm-operator.deployment.ibm-cloud-managed.yaml: seccompProfile: origin/release-4.11:manifests/0000_50_olm_07-olm-operator.deployment.yaml: seccompProfile: origin/release-4.11:manifests/0000_50_olm_08-catalog-operator.deployment.ibm-cloud-managed.yaml: seccompProfile: origin/release-4.11:manifests/0000_50_olm_08-catalog-operator.deployment.yaml: seccompProfile: $ git --no-pager grep seccompProfile: origin/release-4.10 -- manifests origin/release-4.10:manifests/0000_50_olm_00-clusterserviceversions.crd.yaml: seccompProfile: origin/release-4.10:manifests/0000_50_olm_00-clusterserviceversions.crd.yaml: seccompProfile: origin/release-4.10:manifests/0000_50_olm_00-clusterserviceversions.crd.yaml: seccompProfile: origin/release-4.10:manifests/0000_50_olm_00-clusterserviceversions.crd.yaml: seccompProfile: The property itself is from Kubernetes 1.19: $ git clone https://github.com/kubernetes/api $ cd api $ git blame origin/release-1.19 -- core/v1/types.go | grep '^type .* struct\|SeccompProfile.*seccompProfile' | grep -B1 SeccompProfile 20b270679a (Paulo Gomes 2020-06-24 21:37:49 +0100 3306) SeccompProfile *SeccompProfile `json:"seccompProfile,omitempty" protobuf:"bytes,10,opt,name=seccompProfile"` 20b270679a (Paulo Gomes 2020-06-24 21:37:49 +0100 5979) SeccompProfile *SeccompProfile `json:"seccompProfile,omitempty" protobuf:"bytes,11,opt,name=seccompProfile"` $ git blame origin/release-1.18 -- core/v1/types.go | grep 'SeccompProfile.*seccompProfile' ...no hits... So the fact that the CVO was not reconciling it would technically be a bug since OpenShift 4.6, which shipped Kubernetes 1.19. But auditing the most recent named release in each minor branch: $ oc adm release extract --to 4.6 quay.io/openshift-release-dev/ocp-release:4.6.60-x86_64 $ oc adm release extract --to 4.7 quay.io/openshift-release-dev/ocp-release:4.7.56-x86_64 $ oc adm release extract --to 4.8 quay.io/openshift-release-dev/ocp-release:4.8.48-x86_64 $ oc adm release extract --to 4.9 quay.io/openshift-release-dev/ocp-release:4.9.47-x86_64 $ oc adm release extract --to 4.10 quay.io/openshift-release-dev/ocp-release:4.10.30-x86_64 $ grep -lr seccompProfile: 4.* 4.10/0000_50_cluster-monitoring-operator_00_0alertmanager-custom-resource-definition.yaml 4.10/0000_50_cluster-monitoring-operator_00_0prometheus-custom-resource-definition.yaml 4.10/0000_50_cluster-monitoring-operator_00_0thanosruler-custom-resource-definition.yaml 4.10/0000_50_olm_00-clusterserviceversions.crd.yaml 4.7/0000_50_olm_00-clusterserviceversions.crd.yaml 4.8/0000_50_olm_00-clusterserviceversions.crd.yaml 4.9/0000_50_olm_00-clusterserviceversions.crd.yaml And we have sound CRD reconciliation. So at the moment, the only manifests asking for seccompProfile where we'd be worried about a lack of CVO reconciliation are 4.11 and later. In this commit, I'm adding seccompProfile reconciliation. There are quite a few other properties within (Pod)SecurityContext that the CVO is still ignoring, and I'm unclear on why we have been using per-property merging instead of DeepEqual. But I'm going to punt on a broader move to DeepEqual for now, to make the narrow fix that should unstick 4.10-to-4.11-to-4.12 updates. [1]: https://testgrid.k8s.io/redhat-openshift-ocp-release-4.12-informing#periodic-ci-openshift-release-master-ci-4.12-upgrade-from-stable-4.11-from-stable-4.10-e2e-aws-sdn-upgrade [2]: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.12-upgrade-from-stable-4.11-from-stable-4.10-e2e-aws-sdn-upgrade/1562409104817786880 [3]: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.12-upgrade-from-stable-4.11-from-stable-4.10-e2e-aws-sdn-upgrade/1561684150702837760 [4]: https://issues.redhat.com/browse/OCPBUGS-575 [5]: openshift/operator-framework-olm#367
237a2ca to
aae14d2
Compare
|
/lgtm |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: jottofar, wking The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
@wking: Overrode contexts on behalf of wking: ci/prow/e2e-agnostic-upgrade-into-change, ci/prow/e2e-agnostic-upgrade-out-of-change DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
|
@wking: All pull requests linked via external trackers have merged: Jira Issue OCPBUGS-575 has been moved to the MODIFIED state. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
|
@wking: all tests passed! Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
|
/cherrypick release-4.11 |
|
@wking: #830 failed to apply on top of branch "release-4.11": DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
This recently started breaking 4.10-to-4.11-to-4.12 updates with the first failing run like:
while the previous run had:
The breaking change was an OLM namespace touch:
which went out with openshift/operator-framework-olm#367, and will not be backported to 4.11.
The relevant
seccompProfilechange landed in 4.11 (bot not in 4.10), which is why born-in-4.11 clusters are not impacted by the change:The property itself is from Kubernetes 1.19:
So the fact that the CVO was not reconciling it would technically be a bug since OpenShift 4.6, which shipped Kubernetes 1.19. But auditing the most recent named release in each minor branch:
And we have sound CRD reconciliation. So at the moment, the only manifests asking for
seccompProfilewhere we'd be worried about a lack of CVO reconciliation are 4.11 and later.In this commit, I'm adding
seccompProfilereconciliation. There are quite a few other properties within (Pod)SecurityContext that the CVO is still ignoring, and I'm unclear on why we have been using per-property merging instead ofDeepEqual. But I'm going to punt on a broader move to DeepEqual for now, to make the narrow fix that should unstick 4.10-to-4.11-to-4.12 updates.