
Conversation

@wking (Member) commented Apr 20, 2021

And also fail on Deployments that set ProgressDeadlineExceeded. More background in the commit messages, but these two changes should address both points from rhbz#1951339.

wking added 2 commits April 20, 2021 10:32
Seen in a customer cluster [1]:

  $ yaml2json <namespaces/openshift-marketplace/apps/deployments.yaml | jq -r '.items[].status.conditions'
  [
    {
      "lastTransitionTime": "2021-04-12T22:04:41Z",
      "lastUpdateTime": "2021-04-12T22:04:41Z",
      "message": "Deployment has minimum availability.",
      "reason": "MinimumReplicasAvailable",
      "status": "True",
      "type": "Available"
    },
    {
      "lastTransitionTime": "2021-04-19T17:33:30Z",
      "lastUpdateTime": "2021-04-19T17:33:30Z",
      "message": "ReplicaSet \"marketplace-operator-f7cc88d59\" has timed out progressing.",
      "reason": "ProgressDeadlineExceeded",
      "status": "False",
      "type": "Progressing"
    }
  ]

Previous touch for this logic was 0442094
(lib/resourcebuilder/apps: Only error on Deployment Available=False
*and* Progressing=False, 2020-08-06, openshift#430), where I said:

  Available=True, Progressing=False is the happy, steady state.

This commit tightens that back down a bit, to account for cases where
the Deployment controller claims the Deployment is still available
but has given up on rolling out a requested update [2].  We definitely
want to complain about Deployments like that, because admins might
want to investigate whatever is blocking the rollout, and possibly
delete or otherwise poke the Deployment so that we reconcile it again
and the Deployment controller gives the rollout another attempt.

[1]: https://bugzilla.redhat.com/show_bug.cgi?id=1951339#c0
[2]: https://kubernetes.io/docs/concepts/workloads/controllers/deployment/#failed-deployment
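
As a rough sketch of the check described above (a hedged illustration against the upstream k8s.io/api types; the CVO's actual lib/resourcebuilder/apps code differs in detail), the tightened logic amounts to failing on either a false Available condition or a Progressing condition whose controller has given up:

  package sketch

  import (
      "fmt"

      appsv1 "k8s.io/api/apps/v1"
      corev1 "k8s.io/api/core/v1"
  )

  // checkDeploymentHealth errors when the Deployment reports Available=False,
  // or when the controller has given up on a rollout (Progressing=False with
  // reason ProgressDeadlineExceeded), even if the old ReplicaSet keeps the
  // workload Available=True.
  func checkDeploymentHealth(d *appsv1.Deployment) error {
      for _, c := range d.Status.Conditions {
          switch c.Type {
          case appsv1.DeploymentAvailable:
              if c.Status == corev1.ConditionFalse {
                  return fmt.Errorf("deployment %s is not available: %s: %s", d.Name, c.Reason, c.Message)
              }
          case appsv1.DeploymentProgressing:
              if c.Status == corev1.ConditionFalse && c.Reason == "ProgressDeadlineExceeded" {
                  return fmt.Errorf("deployment %s has given up progressing: %s", d.Name, c.Message)
              }
          }
      }
      return nil
  }
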
Even if the manifest authors state no opinions, these are not
properties that we want to allow cluster admins to manipulate.  For
example, a Deployment in a customer cluster was recently stuck by an
injected reference to a non-existent secret [1]:

  $ yaml2json <namespaces/openshift-marketplace/apps/deployments.yaml | jq -r '.items[].spec.template.spec.containers[].envFrom[]'
  {
    "secretRef": {
      "name": "openshift-reg"
    }
  }
  $ yaml2json <namespaces/openshift-marketplace/pods/marketplace-operator-f7cc88d59-hhh75/marketplace-operator-f7cc88d59-hhh75.yaml | jq -r '.status.containerStatuses[].state'
  {
    "waiting": {
      "message": "secret \"openshift-reg\" not found",
      "reason": "CreateContainerConfigError"
    }
  }

The outgoing logic dates back to the beginning of reconciling these
properties in 14fab0b (add generic 2-way merge handler for random
types, 2018-09-27, openshift#26), and this commit's tightening follows a
number of earlier reconciliation tightenings such as 29b92d2
(lib/resourcemerge/core: Clear livenessProbe and readinessProbe if nil
in required, 2020-01-16, openshift#298).

[1]: https://bugzilla.redhat.com/show_bug.cgi?id=1951339#c0
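
A hedged sketch of the tightened merge direction (the helper name and shape here are illustrative, not the CVO's exact resourcemerge code): the manifest's env and envFrom always win, even when they are empty, so the injected secretRef above gets cleared on the next reconcile:

  package sketch

  import (
      "reflect"

      corev1 "k8s.io/api/core/v1"
  )

  // ensureContainerEnv copies env and envFrom from the manifest ("required")
  // onto the in-cluster container even when the manifest leaves them unset,
  // so injected entries like the "openshift-reg" secretRef above get cleared.
  func ensureContainerEnv(modified *bool, existing *corev1.Container, required corev1.Container) {
      if !reflect.DeepEqual(existing.Env, required.Env) {
          *modified = true
          existing.Env = required.Env
      }
      if !reflect.DeepEqual(existing.EnvFrom, required.EnvFrom) {
          *modified = true
          existing.EnvFrom = required.EnvFrom
      }
  }
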
@openshift-ci-robot (Contributor)

@wking: This pull request references Bugzilla bug 1951339, which is valid. The bug has been moved to the POST state. The bug has been updated to refer to the pull request using the external bug tracker.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target release (4.8.0) matches configured target release for branch (4.8.0)
  • bug is in the state ASSIGNED, which is one of the valid states (NEW, ASSIGNED, ON_DEV, POST, POST)

Requesting review from QA contact:
/cc @jianlinliu

In response to this:

Bug 1951339: lib/resourcemerge/core: Clear env and envFrom if unset in manifest

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci-robot openshift-ci-robot added bugzilla/severity-medium Referenced Bugzilla bug's severity is medium for the branch this PR is targeting. bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. labels Apr 20, 2021
@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 20, 2021
@jottofar (Contributor)

/retest

@jottofar (Contributor)

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Apr 21, 2021
@openshift-ci-robot (Contributor)

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jottofar, wking

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-robot openshift-merge-robot merged commit ed0250c into openshift:master Apr 22, 2021
@openshift-ci-robot (Contributor)

@wking: All pull requests linked via external trackers have merged:

Bugzilla bug 1951339 has been moved to the MODIFIED state.

In response to this:

Bug 1951339: lib/resourcemerge/core: Clear env and envFrom if unset in manifest

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@wking wking deleted the always-reconcile-container-environment-variables branch April 23, 2021 23:44
wking added a commit to wking/cluster-version-operator that referenced this pull request Sep 14, 2021
Since this package was created in d9f6718 (lib: add lib for
applying objects, 2018-08-14, openshift#7), the volume(mount) merge logic has
required manifest entries to exist, but has allowed in-cluster entries
to persist without removal.  That hasn't been a problem until [1]:

1. In 4.3, the autoscaler asked for a ca-cert volume mount, based on
   the cluster-autoscaler-operator-ca config map.
2. In 4.4, the autoscaler dropped those manifest entries [2].
3. In 4.9, the autoscaler asked the CVO to remove the config map [3].

That led some born-in-4.3 clusters to have crashlooping autoscalers,
because the mount attempts kept failing on the missing config map.

We couldn't think of a plausible reason why cluster admins would want
to inject additional volume mounts in a CVO-managed pod configuration,
so this commit removes that ability and begins clearing away any
volume(mount) configuration that is not present in the reconciling
manifest.  Cluster administrators who do need to add additional mounts
in an emergency are free to use ClusterVersion's spec.overrides to
take control of a particular CVO-managed resource.

This joins a series of similar previous tightenings, including
02bb9ba (lib/resourcemerge/core: Clear env and envFrom if unset in
manifest, 2021-04-20, openshift#549) and ca299b8 (lib/resourcemerge: remove
ports which are no longer required, 2020-02-13, openshift#322).

[1]: https://bugzilla.redhat.com/show_bug.cgi?id=2002834
[2]: openshift/cluster-autoscaler-operator@f08589d#diff-547486373183980619528df695869ed32b80c18383bc16b57a5ee931bf0edd39L89
[3]: openshift/cluster-autoscaler-operator@9a7b3be#diff-d0cf785e044c611986a4d9bdd65bb373c86f9eb1c97bd3f105062184342a872dR4
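
The same idea, sketched for the volume case this commit describes (illustrative names, not the exact CVO helpers): the manifest's volumes and volumeMounts replace whatever is in the cluster instead of only being merged in when present:

  package sketch

  import (
      "reflect"

      corev1 "k8s.io/api/core/v1"
  )

  // ensureVolumes overwrites pod volumes and per-container volumeMounts from
  // the manifest ("required"), clearing stale in-cluster entries such as the
  // old ca-cert mount.
  func ensureVolumes(modified *bool, existing *corev1.PodSpec, required corev1.PodSpec) {
      if !reflect.DeepEqual(existing.Volumes, required.Volumes) {
          *modified = true
          existing.Volumes = required.Volumes
      }
      for i := range existing.Containers {
          for j := range required.Containers {
              if existing.Containers[i].Name != required.Containers[j].Name {
                  continue
              }
              if !reflect.DeepEqual(existing.Containers[i].VolumeMounts, required.Containers[j].VolumeMounts) {
                  *modified = true
                  existing.Containers[i].VolumeMounts = required.Containers[j].VolumeMounts
              }
          }
      }
  }

Admins who genuinely need extra mounts can still take control of a resource via ClusterVersion's spec.overrides, as the commit message notes.
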
openshift-cherrypick-robot pushed a commit to openshift-cherrypick-robot/cluster-version-operator that referenced this pull request Sep 16, 2021
jottofar added a commit to jottofar/cluster-version-operator that referenced this pull request Jul 25, 2022
to handle securityContext changes differently. Since d9f6718, if a
securityContext is not explicitly specified in the manifest, the
resource's securityContext will remain unchanged, and it will
continue to use the securityContext setting of the currently running
resource (if there is one). We're not sure why the logic was
originally developed in this manner, but this change joins
a series of similar previous tightenings, including
openshift@02bb9ba
(lib/resourcemerge/core: Clear env and envFrom if unset in
manifest, 2021-04-20, openshift#549) and
openshift@ca299b8
(lib/resourcemerge: remove ports which are no longer required,
2020-02-13, openshift#322).

Reconciliation of PodSpec and Container securityContext changes
has been updated such that the entire securityContext structure,
or any sub-field of it, will be cleared if not specified in the manifest.

For example, prior to this change assume Deployment machine-api-operator
is running on the cluster with the following:

securityContext:
    runAsNonRoot: true
    runAsUser: 65534

and during an upgrade the Deployment machine-api-operator no longer
specifies a securityContext. The resulting upgraded Deployment
machine-api-operator will still have the original securityContext:

securityContext:
    runAsNonRoot: true
    runAsUser: 65534

Similarly, there is no way to remove, or clear, a securityContext
field such as runAsUser. You can only modify it.

After this change the above scenario will correctly result in the
Deployment machine-api-operator not specifying securityContext
upon upgrade completion.

The changes apply to both the SecurityContext within a Container
and the PodSecurityContext within a PodSpec.
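
A hedged sketch of the whole-structure behaviour described above (the helper name is illustrative; the actual change lives in lib/resourcemerge): a securityContext that is absent from the manifest now clears the in-cluster one rather than leaving it untouched:

  package sketch

  import (
      "reflect"

      corev1 "k8s.io/api/core/v1"
  )

  // ensurePodSecurityContext clears the in-cluster securityContext when the
  // manifest ("required") does not specify one, and otherwise overwrites it.
  func ensurePodSecurityContext(modified *bool, existing **corev1.PodSecurityContext, required *corev1.PodSecurityContext) {
      if required == nil {
          if *existing != nil {
              *modified = true
              *existing = nil
          }
          return
      }
      if !reflect.DeepEqual(*existing, required) {
          *modified = true
          *existing = required.DeepCopy()
      }
  }
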
jottofar added a commit to jottofar/cluster-version-operator that referenced this pull request Jul 26, 2022
to handle securityContext changes differently. Since d9f6718, if a
securityContext is not explicitly specified in the manifest, the
resource's securityContext will remain unchanged, and it will
continue to use the securityContext setting of the currently running
resource (if there is one). We're not sure why the logic was
originally developed in this manner, but this change joins
a series of similar previous tightenings, including
openshift@02bb9ba
(lib/resourcemerge/core: Clear env and envFrom if unset in
manifest, 2021-04-20, openshift#549) and
openshift@ca299b8
(lib/resourcemerge: remove ports which are no longer required,
2020-02-13, openshift#322).

Reconciliation has been changed such that the entire securityContext
structure, or any sub-field of it, will be cleared if not specified
in the manifest. This change affects Deployments, Jobs, and
DaemonSets. It affects the securityContext found in both a PodSpec
and a Container. Since the functions setInt64Ptr and setBoolPtr
have been changed, the impact is wider, also affecting ServiceAccounts,
the PodSpec fields ShareProcessNamespace and
TerminationGracePeriodSeconds, and the Job fields
ActiveDeadlineSeconds and ManualSelector.

For example, prior to this change assume Deployment machine-api-operator
is running on the cluster with the following:

securityContext:
    runAsNonRoot: true
    runAsUser: 65534

and during an upgrade the Deployment machine-api-operator no longer
specifies a securityContext. The resulting upgraded Deployment
machine-api-operator will still have the original securityContext:

securityContext:
    runAsNonRoot: true
    runAsUser: 65534

Similarly, there is no way to remove, or clear, a securityContext
field such as runAsUser. You can only modify it.

After this change the above scenario will correctly result in the
Deployment machine-api-operator not specifying securityContext
upon upgrade completion.

The changes apply to both the SecurityContext within a Container
and the PodSecurityContext within a PodSpec.
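
For the per-field helpers named above (setInt64Ptr and setBoolPtr are the names quoted in the commit message; the body below is a hedged sketch, not the exact implementation), the new behaviour is that a nil pointer in the manifest clears the in-cluster value instead of leaving it alone:

  package sketch

  // setInt64Ptr reconciles a *int64 field: a nil "required" now clears the
  // in-cluster value, while a non-nil "required" overwrites it when it differs.
  func setInt64Ptr(modified *bool, existing **int64, required *int64) {
      if required == nil {
          if *existing != nil {
              *modified = true
              *existing = nil
          }
          return
      }
      if *existing == nil || **existing != *required {
          *modified = true
          v := *required
          *existing = &v
      }
  }

Presumably setBoolPtr follows the same pattern for *bool fields.
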
petr-muller pushed a commit to petr-muller/cluster-version-operator that referenced this pull request Jun 30, 2023
petr-muller pushed a commit to petr-muller/cluster-version-operator that referenced this pull request Jul 10, 2023