Bug 1951339: lib/resourcemerge/core: Clear env and envFrom if unset in manifest #549
Conversation
Seen in a customer cluster [1]:
$ yaml2json <namespaces/openshift-marketplace/apps/deployments.yaml | jq -r '.items[].status.conditions'
[
{
"lastTransitionTime": "2021-04-12T22:04:41Z",
"lastUpdateTime": "2021-04-12T22:04:41Z",
"message": "Deployment has minimum availability.",
"reason": "MinimumReplicasAvailable",
"status": "True",
"type": "Available"
},
{
"lastTransitionTime": "2021-04-19T17:33:30Z",
"lastUpdateTime": "2021-04-19T17:33:30Z",
"message": "ReplicaSet \"marketplace-operator-f7cc88d59\" has timed out progressing.",
"reason": "ProgressDeadlineExceeded",
"status": "False",
"type": "Progressing"
}
]
Previous touch for this logic was 0442094
(lib/resourcebuilder/apps: Only error on Deployment Available=False
*and* Progressing=False, 2020-08-06, openshift#430), where I said:
Available=True, Progressing=False is the happy, steady state.
This commit tightens that back down a bit, to account for cases where
the Deployment controller claims the Deployment is still available,
but has given up on rolling out a requested update [2]. We definitely
want to complain about Deployments like that, because admins may need
to investigate the sticking point, and possibly delete or otherwise
poke the Deployment so that we reconcile it and the Deployment
controller gives the rollout another attempt.
[1]: https://bugzilla.redhat.com/show_bug.cgi?id=1951339#c0
[2]: https://kubernetes.io/docs/concepts/workloads/controllers/deployment/#failed-deployment
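The tightened check behaves roughly like the following sketch. It assumes the k8s.io/api types; the actual helper in lib/resourcebuilder/apps has a different signature and more surrounding context, and checkDeploymentHealth is an illustrative name, not the real function.

// Minimal sketch of the condition check described above.
package main

import (
	"fmt"

	appsv1 "k8s.io/api/apps/v1"
	corev1 "k8s.io/api/core/v1"
)

// checkDeploymentHealth returns an error when the Deployment controller
// reports Available=False, or when it has given up rolling out a requested
// update (Progressing=False with reason ProgressDeadlineExceeded), even if
// the old ReplicaSet still keeps the workload Available=True.
func checkDeploymentHealth(d *appsv1.Deployment) error {
	var available, progressing *appsv1.DeploymentCondition
	for i := range d.Status.Conditions {
		switch d.Status.Conditions[i].Type {
		case appsv1.DeploymentAvailable:
			available = &d.Status.Conditions[i]
		case appsv1.DeploymentProgressing:
			progressing = &d.Status.Conditions[i]
		}
	}

	if available != nil && available.Status == corev1.ConditionFalse {
		return fmt.Errorf("deployment %s is not available: %s", d.Name, available.Message)
	}
	if progressing != nil && progressing.Status == corev1.ConditionFalse &&
		progressing.Reason == "ProgressDeadlineExceeded" {
		return fmt.Errorf("deployment %s has stopped progressing: %s", d.Name, progressing.Message)
	}
	return nil
}

func main() {
	// Conditions mirroring the customer cluster above: available, but stuck.
	d := &appsv1.Deployment{}
	d.Name = "marketplace-operator"
	d.Status.Conditions = []appsv1.DeploymentCondition{
		{Type: appsv1.DeploymentAvailable, Status: corev1.ConditionTrue},
		{Type: appsv1.DeploymentProgressing, Status: corev1.ConditionFalse, Reason: "ProgressDeadlineExceeded"},
	}
	fmt.Println(checkDeploymentHealth(d))
}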
Even if the manifest authors state no opinions, these are not
properties that we want to allow cluster admins to manipulate. For
example, a customer cluster recently stuck a Deployment by inserting a
reference to a non-existent secret [1]:
$ yaml2json <namespaces/openshift-marketplace/apps/deployments.yaml | jq -r '.items[].spec.template.spec.containers[].envFrom[]'
{
"secretRef": {
"name": "openshift-reg"
}
}
$ yaml2json <namespaces/openshift-marketplace/pods/marketplace-operator-f7cc88d59-hhh75/marketplace-operator-f7cc88d59-hhh75.yaml | jq -r '.status.containerStatuses[].state'
{
"waiting": {
"message": "secret \"openshift-reg\" not found",
"reason": "CreateContainerConfigError"
}
}
The outgoing logic dates back to the beginning of reconciling these
properties in 14fab0b (add generic 2-way merge handler for random
types, 2018-09-27, openshift#26), and this commit's tightening follows on a
number of reconciliation tightenings like 29b92d2
(lib/resourcemerge/core: Clear livenessProbe and readinessProbe if nil
in required, 2020-01-16, openshift#298).
[1]: https://bugzilla.redhat.com/show_bug.cgi?id=1951339#c0
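The tightened merge behaves roughly like this sketch. It assumes the k8s.io/api/core/v1 types; ensureEnv is an illustrative stand-in for the corresponding logic in lib/resourcemerge/core's container merge, not the actual function.

// Minimal sketch of clearing env and envFrom when the manifest is silent.
package main

import (
	"fmt"
	"reflect"

	corev1 "k8s.io/api/core/v1"
)

// ensureEnv overwrites the in-cluster container's env and envFrom with the
// manifest's values. When the manifest leaves them unset, they are cleared,
// so cluster-admin injections (like the bogus envFrom secretRef above) do
// not stick.
func ensureEnv(modified *bool, existing *corev1.Container, required corev1.Container) {
	if !reflect.DeepEqual(existing.Env, required.Env) {
		*modified = true
		existing.Env = required.Env
	}
	if !reflect.DeepEqual(existing.EnvFrom, required.EnvFrom) {
		*modified = true
		existing.EnvFrom = required.EnvFrom
	}
}

func main() {
	existing := corev1.Container{
		Name: "marketplace-operator",
		EnvFrom: []corev1.EnvFromSource{{
			SecretRef: &corev1.SecretEnvSource{
				LocalObjectReference: corev1.LocalObjectReference{Name: "openshift-reg"},
			},
		}},
	}
	required := corev1.Container{Name: "marketplace-operator"} // manifest states no opinion

	modified := false
	ensureEnv(&modified, &existing, required)
	fmt.Println(modified, existing.EnvFrom) // true []
}

The key difference from the outgoing behavior is that an unset required.Env or required.EnvFrom now overwrites, rather than preserves, whatever is in the cluster.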
@wking: This pull request references Bugzilla bug 1951339, which is valid. The bug has been moved to the POST state and has been updated to refer to the pull request using the external bug tracker. 3 validations were run on this bug. Requesting review from QA contact.
/retest
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: jottofar, wking.
@wking: All pull requests linked via external trackers have merged: Bugzilla bug 1951339 has been moved to the MODIFIED state.
Since this package was created in d9f6718 (lib: add lib for applying objects, 2018-08-14, openshift#7), the volume(mount) merge logic has required manifest entries to exist, but has allowed in-cluster entries to persist without removal. That hasn't been a problem until [1]:
1. In 4.3, the autoscaler asked for a ca-cert volume mount, based on the cluster-autoscaler-operator-ca config map.
2. In 4.4, the autoscaler dropped those manifest entries [2].
3. In 4.9, the autoscaler asked the CVO to remove the config map [3].
That led some born-in-4.3 clusters to have crashlooping autoscalers, because the mount attempts kept failing on the missing config map. We couldn't think of a plausible reason why cluster admins would want to inject additional volume mounts into a CVO-managed pod configuration, so this commit removes that ability and begins clearing away any volume(mount) configuration that is not present in the reconciling manifest. Cluster administrators who do need to add additional mounts in an emergency are free to use ClusterVersion's spec.overrides to take control of a particular CVO-managed resource. This joins a series of similar previous tightenings, including 02bb9ba (lib/resourcemerge/core: Clear env and envFrom if unset in manifest, 2021-04-20, openshift#549) and ca299b8 (lib/resourcemerge: remove ports which are no longer required, 2020-02-13, openshift#322).
[1]: https://bugzilla.redhat.com/show_bug.cgi?id=2002834
[2]: openshift/cluster-autoscaler-operator@f08589d#diff-547486373183980619528df695869ed32b80c18383bc16b57a5ee931bf0edd39L89
[3]: openshift/cluster-autoscaler-operator@9a7b3be#diff-d0cf785e044c611986a4d9bdd65bb373c86f9eb1c97bd3f105062184342a872dR4
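A minimal sketch of that clearing behavior, assuming the k8s.io/api/core/v1 types; removeUnrequiredVolumeMounts is an illustrative name rather than the actual helper in lib/resourcemerge/core, and the mount paths in the example are made up.

// Minimal sketch: keep only the volume mounts the manifest still asks for.
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
)

// removeUnrequiredVolumeMounts drops any in-cluster volume mount that the
// reconciling manifest no longer lists, so stale mounts (like the 4.3-era
// ca-cert mount) stop crashlooping pods once their backing config map is gone.
func removeUnrequiredVolumeMounts(modified *bool, existing *[]corev1.VolumeMount, required []corev1.VolumeMount) {
	requiredNames := map[string]struct{}{}
	for _, m := range required {
		requiredNames[m.Name] = struct{}{}
	}

	kept := (*existing)[:0]
	for _, m := range *existing {
		if _, ok := requiredNames[m.Name]; ok {
			kept = append(kept, m)
		} else {
			*modified = true
		}
	}
	*existing = kept
}

func main() {
	existing := []corev1.VolumeMount{
		{Name: "ca-cert", MountPath: "/etc/ca-cert"}, // no longer in the manifest
		{Name: "cert", MountPath: "/etc/tls"},
	}
	required := []corev1.VolumeMount{
		{Name: "cert", MountPath: "/etc/tls"},
	}

	modified := false
	removeUnrequiredVolumeMounts(&modified, &existing, required)
	fmt.Println(modified, existing) // true, only the "cert" mount remains
}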
This commit changes the reconciliation logic to handle securityContext changes differently. Since d9f6718, if a securityContext is not explicitly specified in the manifest, the resource's securityContext remains unchanged and the resource continues to use whatever securityContext the currently running resource already has (if any). We're not sure of the exact reason the logic was originally developed in this manner, but this change joins a series of similar previous tightenings, including openshift@02bb9ba (lib/resourcemerge/core: Clear env and envFrom if unset in manifest, 2021-04-20, openshift#549) and openshift@ca299b8 (lib/resourcemerge: remove ports which are no longer required, 2020-02-13, openshift#322).
Reconciliation has been changed such that the entire securityContext structure, or any sub-field of it, is cleared if not specified in the manifest. This change affects Deployments, Jobs, and DaemonSets, and applies to the securityContext found in both a PodSpec and a Container. Since the functions setInt64Ptr and setBoolPtr have been changed, the impact is wider, also affecting ServiceAccounts, the PodSpec fields ShareProcessNamespace and TerminationGracePeriodSeconds, and the Job fields ActiveDeadlineSeconds and ManualSelector.
For example, prior to this change, assume Deployment machine-api-operator is running on the cluster with the following:
securityContext:
  runAsNonRoot: true
  runAsUser: 65534
and during an upgrade the Deployment machine-api-operator no longer specifies a securityContext. The resulting upgraded Deployment will still have the original securityContext:
securityContext:
  runAsNonRoot: true
  runAsUser: 65534
Similarly, there is no way to remove, or clear, a securityContext field such as runAsUser; you can only modify it. After this change, the above scenario correctly results in the Deployment machine-api-operator not specifying a securityContext upon upgrade completion. The changes apply to both the SecurityContext within a Container and the PodSecurityContext within a PodSpec.
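A minimal sketch of the pointer-clearing behavior described above. The helper names mirror the setBoolPtr and setInt64Ptr functions mentioned in the text, but their exact signatures and call sites in lib/resourcemerge/core may differ.

// Minimal sketch: clear pointer fields when the manifest states no opinion.
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
)

// setBoolPtr now clears the in-cluster value when the manifest leaves it nil,
// instead of preserving whatever the running resource already had.
func setBoolPtr(modified *bool, existing **bool, required *bool) {
	if required == nil && *existing != nil {
		*modified = true
		*existing = nil
		return
	}
	if required != nil && (*existing == nil || **existing != *required) {
		*modified = true
		v := *required
		*existing = &v
	}
}

// setInt64Ptr follows the same rule for int64 pointer fields such as runAsUser.
func setInt64Ptr(modified *bool, existing **int64, required *int64) {
	if required == nil && *existing != nil {
		*modified = true
		*existing = nil
		return
	}
	if required != nil && (*existing == nil || **existing != *required) {
		*modified = true
		v := *required
		*existing = &v
	}
}

func main() {
	// In-cluster pod securityContext, as in the machine-api-operator example.
	runAsNonRoot := true
	runAsUser := int64(65534)
	existing := &corev1.PodSecurityContext{RunAsNonRoot: &runAsNonRoot, RunAsUser: &runAsUser}

	// The upgraded manifest states no opinion on securityContext.
	required := &corev1.PodSecurityContext{}

	modified := false
	setBoolPtr(&modified, &existing.RunAsNonRoot, required.RunAsNonRoot)
	setInt64Ptr(&modified, &existing.RunAsUser, required.RunAsUser)
	fmt.Println(modified, existing.RunAsNonRoot, existing.RunAsUser) // true <nil> <nil>
}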
And also fail on Deployments whose Progressing condition reports
ProgressDeadlineExceeded. More background is in the commit messages, but these two changes should address both points from rhbz#1951339.