@wking
Member

@wking wking commented Nov 13, 2020

"Registry" and "repository" are both generic names for portions of container image pullspecs. But graphDataImage is another pullspec in the UpdateService spec. Make the semantics more obvious, and remove the artificial splitting, by collapsing the two properties into a single 'releases', named after the meaning of the value instead of the syntax of the value.

I made the Go changes, and then grepped around for old-style references to update. Then I regenerated the CRD with:

$ controller-gen crd:trivialVersions=true rbac:roleName=updateservice-operator webhook paths="./..." output:crd:artifacts:config=config/crd/bases

See the previous commits, especially 3990db3008 (#83), for more on that `controller-gen` usage.
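
As a rough illustration of the collapse described above (a sketch, not the operator's Go code): assuming the old-style registry and repository values were meant to be joined with a slash, a hypothetical helper `releases_from_legacy` shows how a legacy pair would map to the single `releases` value. The helper name and the example values are assumptions for illustration only.

```shell
#!/bin/sh
# Hypothetical helper (not part of the operator): join an old-style
# registry and repository into the single pullspec-style value that
# the new 'releases' property would hold.  Assumes the two parts were
# meant to be joined with a slash.
releases_from_legacy() {
  printf '%s/%s\n' "$1" "$2"
}

# Example values are illustrative.
releases_from_legacy quay.io openshift-release-dev/ocp-release
```

Naming the property after what the value means (where releases live) rather than how the string is split also means a user can paste one value instead of having to work out where to cut it.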

Kubebuilder config is from d9f361a (Migrate operator from v0.18.2
to v0.19.3, 2020-09-30, openshift#65) and ce9a447 (Rename operator Update
Service and change version to v1, 2020-10-04, openshift#65):

  $ git blame origin/master controllers/updateservice_controller.go | grep kubebuilder
  ce9a447 controllers/updateservice_controller.go            (Jack Ottofaro     2020-10-04 11:10:37 -0400 125) // +kubebuilder:rbac:groups="",namespace="updateservice-operator",resources=pods;services;services/finalizers;endpoints;persistentvolumeclaims;events;configmaps;secrets,verbs=create;delete;get;list;patch;update;watch
  ce9a447 controllers/updateservice_controller.go            (Jack Ottofaro     2020-10-04 11:10:37 -0400 126) // +kubebuilder:rbac:groups="apps",namespace="updateservice-operator",resources=deployments;daemonsets;replicasets;statefulsets,verbs=create;delete;get;list;patch;update;watch
  ce9a447 controllers/updateservice_controller.go            (Jack Ottofaro     2020-10-04 11:10:37 -0400 127) // +kubebuilder:rbac:groups="monitoring.coreos.com",namespace="updateservice-operator",resources=servicemonitors,verbs=create;get
  ce9a447 controllers/updateservice_controller.go            (Jack Ottofaro     2020-10-04 11:10:37 -0400 128) // +kubebuilder:rbac:groups="apps",namespace="updateservice-operator",resourceNames=updateservice-operator,resources=deployments/finalizers,verbs=update
  ce9a447 controllers/updateservice_controller.go            (Jack Ottofaro     2020-10-04 11:10:37 -0400 129) // +kubebuilder:rbac:groups="",namespace="updateservice-operator",resources=pods,verbs=get
  ce9a447 controllers/updateservice_controller.go            (Jack Ottofaro     2020-10-04 11:10:37 -0400 130) // +kubebuilder:rbac:groups="apps",namespace="updateservice-operator",resources=replicasets;deployments,verbs=get
  ce9a447 controllers/updateservice_controller.go            (Jack Ottofaro     2020-10-04 11:10:37 -0400 131) // +kubebuilder:rbac:groups="policy",namespace="updateservice-operator",resources=poddisruptionbudgets,verbs=create;delete;get;list;patch;update;watch
  ce9a447 controllers/updateservice_controller.go            (Jack Ottofaro     2020-10-04 11:10:37 -0400 132) // +kubebuilder:rbac:groups=updateservice.operator.openshift.io,namespace="updateservice-operator",resources=*,verbs=create;delete;get;list;patch;update;watch
  d9f361a controllers/cincinnati_controller.go               (Jack Ottofaro     2020-09-30 16:08:24 -0400 133) // +kubebuilder:rbac:groups=config.openshift.io,resources=images,verbs=get;list;watch
  d9f361a controllers/cincinnati_controller.go               (Jack Ottofaro     2020-09-30 16:08:24 -0400 134) // +kubebuilder:rbac:groups=route.openshift.io,resources=routes,verbs=create;get;list;patch;update;watch
  d9f361a controllers/cincinnati_controller.go               (Jack Ottofaro     2020-09-30 16:08:24 -0400 135) // +kubebuilder:rbac:groups="",resources=pods;services;services/finalizers;endpoints;persistentvolumeclaims;events;configmaps;secrets,verbs=create;delete;get;list;patch;update;watch
  d9f361a controllers/cincinnati_controller.go               (Jack Ottofaro     2020-09-30 16:08:24 -0400 136) // +kubebuilder:rbac:groups="apps",resources=deployments;daemonsets;replicasets;statefulsets,verbs=create;delete;get;list;patch;update;watch
  d9f361a controllers/cincinnati_controller.go               (Jack Ottofaro     2020-09-30 16:08:24 -0400 137) // +kubebuilder:rbac:groups="apps",resources=replicasets;deployments,verbs=get
  d9f361a controllers/cincinnati_controller.go               (Jack Ottofaro     2020-09-30 16:08:24 -0400 138) // +kubebuilder:rbac:groups="",resources=pods,verbs=get
  d9f361a controllers/cincinnati_controller.go               (Jack Ottofaro     2020-09-30 16:08:24 -0400 139) // +kubebuilder:rbac:groups="monitoring.coreos.com",resources=servicemonitors,verbs=create;get
  ce9a447 controllers/updateservice_controller.go            (Jack Ottofaro     2020-10-04 11:10:37 -0400 140) // +kubebuilder:rbac:groups="apps",resourceNames=updateservice-operator,resources=deployments/finalizers,verbs=update
  d9f361a controllers/cincinnati_controller.go               (Jack Ottofaro     2020-09-30 16:08:24 -0400 141) // +kubebuilder:rbac:groups="policy",resources=poddisruptionbudgets,verbs=create;delete;get;list;patch;update;watch
  ce9a447 controllers/updateservice_controller.go            (Jack Ottofaro     2020-10-04 11:10:37 -0400 142) // +kubebuilder:rbac:groups=updateservice.operator.openshift.io,resources=*,verbs=create;delete;get;list;patch;update;watch

But a bare SDK 1.0.1 scaffold contains no namespace properties in
those kubebuilder fields:

  $ operator-sdk-v1.0.1 init --domain openshift.io --repo github.com/openshift/cincinnati-operator
  $ operator-sdk-v1.0.1 create api --group updateservice.operator --version v1 --kind UpdateService --resource --controller
  $ grep -r kubebuilder:rbac controllers
  controllers/updateservice_controller.go:// +kubebuilder:rbac:groups=updateservice.operator.openshift.io,resources=updateservices,verbs=get;list;watch;create;update;patch;delete
  controllers/updateservice_controller.go:// +kubebuilder:rbac:groups=updateservice.operator.openshift.io,resources=updateservices/status,verbs=get;update;patch

The SDK scaffolding also tells us how to consume the kubebuilder
declarations:

  $ grep CRD_OPTIONS Makefile
  CRD_OPTIONS ?= "crd:trivialVersions=true"
    $(CONTROLLER_GEN) $(CRD_OPTIONS) rbac:roleName=manager-role webhook paths="./..." output:crd:artifacts:config=config/crd/bases

This commit consolidates around that pattern with:

  $ emacs controllers/updateservice_controller.go # kubebuilder:rbac, drop namespace, sort alphabetically, remove duplicate lines
  $ controller-gen crd:trivialVersions=true rbac:roleName=updateservice-operator webhook paths="./..." output:crd:artifacts:config=config/crd/bases
  $ git rm config/crd/bases/updateservice.operator.openshift.io_updateservices_crd.yaml # remove the old name; I'm not sure how to generate with this name
  $ sed -i 's/updateservice.operator.openshift.io_updateservices_crd.yaml/updateservice.operator.openshift.io_updateservices.yaml/' $(git grep -l updateservice.operator.openshift.io_updateservices_crd.yaml)
  $ git add config controllers hack

using:

  $ controller-gen --version
  Version: v0.3.0
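
The manual emacs step above (drop the namespace, sort, remove duplicates) can be approximated mechanically. A minimal sketch, assuming the marker lines are fed in one per line on stdin; `consolidate_markers` is a hypothetical name, and the actual edit in this commit was done by hand:

```shell
#!/bin/sh
# Hypothetical helper: strip the namespace="..." property from each
# kubebuilder RBAC marker, then sort and drop duplicate lines.
# LC_ALL=C keeps the sort order byte-wise and locale-independent.
consolidate_markers() {
  sed 's/,namespace="[^"]*"//' | LC_ALL=C sort -u
}

# Three sample markers; the first and third become identical once
# the namespace property is removed, so only two lines are printed.
printf '%s\n' \
  '// +kubebuilder:rbac:groups="",namespace="updateservice-operator",resources=pods,verbs=get' \
  '// +kubebuilder:rbac:groups="apps",resources=replicasets;deployments,verbs=get' \
  '// +kubebuilder:rbac:groups="",resources=pods,verbs=get' |
  consolidate_markers
```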

The rbac:roleName=updateservice-operator argument passed to controller-gen needs to match:

  $ git --no-pager blame config/manager/manager.yaml | grep serviceAccountName
  ce9a447 config/manager/manager.yaml (Jack Ottofaro     2020-10-04 11:10:37 -0400 15)       serviceAccountName: updateservice-operator

to avoid:

  Failed to get Pod{...} is forbidden: User "system:serviceaccount:openshift-updateservice:updateservice-operator" cannot get resource "pods" in API group "" in the namespace "openshift-updateservice": RBAC: [clusterrole.rbac.authorization.k8s.io "updateservice-operator" not found, role.rbac.authorization.k8s.io "updateservice-operator" not found
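
The name-matching requirement described above could be checked before deploying. A minimal sketch, assuming the manifest contains a single `serviceAccountName:` line; `role_matches_service_account` is a hypothetical helper, not something in the repository:

```shell
#!/bin/sh
# Hypothetical check: succeed only when the role name handed to
# controller-gen (rbac:roleName=...) equals the serviceAccountName
# found in the given manager manifest.
role_matches_service_account() {
  role_name="$1"
  manifest="$2"
  # Print whatever follows "serviceAccountName:" on its line.
  account="$(sed -n 's/^[[:space:]]*serviceAccountName:[[:space:]]*//p' "${manifest}")"
  [ "${account}" = "${role_name}" ]
}
```

It could be called as `role_matches_service_account updateservice-operator config/manager/manager.yaml`, and exits non-zero on a mismatch.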
So users don't have to figure out the "${NAME}-policy-engine-route"
route naming on their own.
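
For reference, the naming convention users previously had to reconstruct themselves looks roughly like this. A sketch only: `policy_engine_route_name` is a hypothetical helper, and treating NAME as the UpdateService object's name is an assumption.

```shell
#!/bin/sh
# Hypothetical helper: derive the route name from the pattern
# "${NAME}-policy-engine-route" quoted above.
policy_engine_route_name() {
  printf '%s-policy-engine-route\n' "$1"
}

# "example" is an illustrative name, not one from the repository.
policy_engine_route_name example
```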

Generated with:

  $ emacs api controllers docs functests # manual changes
  $ go get github.com/openshift/library-go@093ad3cf66000cb994f8c8010da43a71ba147671
  go: github.com/openshift/library-go 093ad3cf66000cb994f8c8010da43a71ba147671 => v0.0.0-20201109112824-093ad3cf6600
  $ go mod tidy
  $ go mod vendor
  $ controller-gen crd:trivialVersions=true rbac:roleName=updateservice-operator webhook paths="./..." output:crd:artifacts:config=config/crd/bases
  $ git add -A api config controllers docs functests go.* vendor

using:

  $ go version
  go version go1.15.2 linux/amd64
  $ controller-gen --version
  Version: v0.3.0

The controller-gen command from the previous commit.
"Registry" and "repository" are both generic names for portions of
container image pullspecs.  But graphDataImage is another pullspec in
the UpdateService spec.  Make the semantics more obvious, and remove
the artificial splitting, by collapsing the two properties into a single
'releases', named after the meaning of the value instead of the syntax
of the value.

I made the Go changes, and then grepped around for old-style
references to update.  Then I regenerated the CRD with:

  $ controller-gen crd:trivialVersions=true rbac:roleName=updateservice-operator webhook paths="./..." output:crd:artifacts:config=config/crd/bases

See the previous commits, especially 3990db3 (controllers:
Consolidate kubebuilder declarations, 2020-11-12, openshift#83), for more on
that controller-gen usage.
@openshift-ci-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: wking

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 13, 2020
@wking
Member Author

wking commented Nov 13, 2020

images:

error: unable to parse image registry.build01.ci.openshift.org/ci-op-kl0yj6ti/stable@sha256:0443c407d0a63658687ae89b5bea9e6b7eb9a3bd665547d1bd913efec0ec5d8f: cannot retrieve image configuration for manifest sha256:0443c407d0a63658687ae89b5bea9e6b7eb9a3bd665547d1bd913efec0ec5d8f: received unexpected HTTP status: 500 Internal Server Error 

So that's a CI-build-infra flake, not an issue with the operator itself.

/retest

@wking
Member Author

wking commented Nov 13, 2020

/test images
/test operator-e2e

@jottofar
Contributor

/lgtm

@wking
Member Author

wking commented Nov 16, 2020

/override ci/prow/operator-e2e

No need to test this again; it has passed two times before on the same commit, and the target master branch hasn't changed in the meantime.

@openshift-ci-robot

@wking: Overrode contexts on behalf of wking: ci/prow/operator-e2e

Details

In response to this:

/override ci/prow/operator-e2e

No need to test this again on the same commit, it has passed two times before on the same commit, and the target master branch hasn't changed in the meantime.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-merge-robot openshift-merge-robot merged commit 3a68312 into openshift:master Nov 16, 2020
@wking wking deleted the consolidate-release-image-repository branch November 16, 2020 22:27
kalexand-rh pushed a commit to wking/openshift-docs that referenced this pull request Feb 5, 2021
Mixing in some precedent from logging/, the Update Service blog post
[1], the GitHub docs [2], and some more recent operator CRD changes
like [3,4].

Kathryn didn't want us asking the user to poll [5], so I'm using
POSIX-shell 'while' loops to poll on the user's behalf.
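
A polling loop of the kind mentioned above might look like the following. This is a generic sketch, not the loop from the docs change; `wait_for` and the marker-file demo are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: re-run the given command once a second until
# it succeeds, so the user does not have to poll by hand.
wait_for() {
  while ! "$@"; do
    sleep 1
  done
}

# Demo: a background job creates a marker file after two seconds,
# and wait_for blocks until the file exists.
marker="${TMPDIR:-/tmp}/wait-for-demo.$$"
(sleep 2; : > "${marker}") &
wait_for test -e "${marker}"
echo "condition met"
rm -f "${marker}"
```

The loop has no timeout, so a condition that never becomes true will block forever; a real procedure may want an upper bound.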

[1]: https://www.openshift.com/blog/openshift-update-service-update-manager-for-your-cluster
[2]: https://github.com/openshift/cincinnati-operator/blob/2df239a8486d2ba3aa0d9925e5d505105ab36afe/docs/disconnected-cincinnati-operator.md
[3]: openshift/cincinnati-operator#66
[4]: openshift/cincinnati-operator#85
[5]: openshift#26219 (comment)
PratikMahajan pushed a commit to PratikMahajan/cincinnati-operator that referenced this pull request Mar 17, 2021
Baked in edges:

  $ oc adm release info quay.io/openshift-release-dev/ocp-release:4.3.0-rc.0-x86_64 | grep Upgrades
    Upgrades: 4.2.13
  $ oc adm release info quay.io/openshift-release-dev/ocp-release:4.3.0-rc.3-x86_64 | grep Upgrades
    Upgrades: 4.2.16, 4.3.0-rc.0, 4.3.0-rc.1, 4.3.0-rc.2

The wide 'from' regexp was appropriate for 4.3.0-rc.0, which had no
4.3 update sources.  But rc.3 does have update sources, and we want to
allow 4.3.0-rc.0 -> 4.3.0-rc.3, because it is not impacted by the
4.2->4.3 GCP update bug.  The overly-strict regexp was from 6d3db09
(Blocking edges to candidate 4.3.0-rc.3, 2020-01-23, openshift#34).
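
To make the difference between a wide and a narrower 'from' regexp concrete, here is a small sketch. Both patterns are hypothetical stand-ins rather than the ones in the repository, and matching the whole version string (via `grep -x`) is an assumption about how such a pattern is applied.

```shell
#!/bin/sh
# Hypothetical helper: succeed when the source version ($2) is
# matched in full by the 'from' regexp ($1), meaning the edge from
# that version would be blocked.
blocks_from() {
  printf '%s\n' "$2" | grep -Eqx -- "$1"
}

# A wide pattern blocks every source, including 4.3.0-rc.0.
blocks_from '.*' 4.3.0-rc.0 && echo "wide: 4.3.0-rc.0 blocked"

# A pattern limited to 4.2 sources still blocks 4.2.16 ...
blocks_from '4\.2\..*' 4.2.16 && echo "narrow: 4.2.16 blocked"

# ... but lets 4.3.0-rc.0 -> 4.3.0-rc.3 through.
blocks_from '4\.2\..*' 4.3.0-rc.0 || echo "narrow: 4.3.0-rc.0 allowed"
```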

Also expand the referenced bugs for the blocked 4.2 -> 4.3 edges:

* Update hangs with [1]:

    Working towards 4.3.0...: 13% complete

  and machine-config going Degraded=True with RequiredPoolsFailed:

    Unable to apply 4.3.0-...: timed out waiting for the condition
    during syncRequiredMachineConfigPools: pool master has not
    progressed to latest configuration: controller version mismatch
    for rendered-master-6c22... expected 23a6... has d780... retrying

  Fixed in 4.2 with MCO 31fed93 [2] and in 4.3 with MCO 25bb6ae [3].

    $ oc adm release info --commits registry.svc.ci.openshift.org/ocp/release:4.2.14 | grep machine-config
      machine-config-operator                       https://github.com/openshift/machine-config-operator                       d780d197a9c5848ba786982c0c4aaa7487297046
    $ oc adm release info --commits registry.svc.ci.openshift.org/ocp/release:4.2.16 | grep machine-config
      machine-config-operator                       https://github.com/openshift/machine-config-operator                       31fed93186c9f84708f5cdfd0227ffe4f79b31cd

  So the 4.2 fix was in 4.2.16.

    $ oc adm release info --commits registry.svc.ci.openshift.org/ocp/release:4.3.0-rc.0 | grep machine-config
      machine-config-operator                       https://github.com/openshift/machine-config-operator                       23a6e6fb37e73501bc3216183ef5e6ebb15efc7a
    $ oc adm release info --commits registry.svc.ci.openshift.org/ocp/release:4.3.0-rc.3 | grep machine-config
      machine-config-operator                       https://github.com/openshift/machine-config-operator                       25bb6aeb58135c38a667e849edf5244871be4992

  So the 4.3 fix was new in rc.3.

* Updates hang with FailedCreatePodSandBox events in the
  openshift-ingress namespace like [4]:

    pod/router-default-...: Failed create pod sandbox: rpc error: code
    = Unknown desc = failed to create pod network sandbox
    k8s_router-default-..._openshift-ingress_...(...): Multus: error
    adding pod to network "openshift-sdn": delegateAdd: error invoking
    DelegateAdd - "openshift-sdn": error in getting result from
    AddNetwork: CNI request failed with status 400: 'failed to run
    IPAM for ...: failed to run CNI IPAM ADD: failed to allocate for
    range 0: no IP addresses available in range set: <ip1>-<ip2>

  Fixed in 4.2 with MCO 9366460 [5] and in 4.3 with MCO 311a01e [6].

    $ git --no-pager log --first-parent --oneline -4 origin/release-4.2
    6e0df82c (origin/release-4.2) Merge pull request #1347 from openshift-cherrypick-robot/cherry-pick-1285-to-release-4.2
    93664600 Merge pull request #1362 from rphillips/fixes/1787581_4.2
    bd358bb7 Merge pull request #1323 from openshift-cherrypick-robot/cherry-pick-1320-to-release-4.2
    31fed931 Merge pull request #1358 from runcom/osimageurl-race-42

  so the 4.2 fix was after 4.2.16's 31fed93186.

    $ git --no-pager log --first-parent --oneline -8 origin/release-4.3
    3ad3a836 (origin/release-4.3) Merge pull request #1399 from celebdor/haproxy-v4v6
    25503eee Merge pull request #1353 from russellb/1211-4.3-backport
    67ab306b Merge pull request #1426 from mandre/ssc43
    d74f56fe Merge pull request #1410 from retroflexer/manual-cherry-pick-from-master
    207cc171 Merge pull request #1406 from openshift-cherrypick-robot/cherry-pick-1396-to-release-4.3
    25bb6aeb Merge pull request #1359 from runcom/osimageurl-race-43
    311a01e8 Merge pull request #1361 from rphillips/fixes/1787581_4.3
    23a6e6fb Merge pull request #1348 from openshift-cherrypick-robot/cherry-pick-1285-to-release-4.3

  So the 4.3 fix was between rc.0's 23a6e6fb37 and rc.3's 25bb6aeb58
  (see 'release info' calls in the previous list entry for those
  commit hashes).

* Update CI fails with [7,8]:

    Could not reach HTTP service through <ip>:80 after 2m0s

  and authentication going Degraded=True with RouteHealthDegradedFailedGet:

    RouteHealthDegraded: failed to GET route: dial tcp <ip>:443:
    connect: connection refused

  Fixed in 4.2 with SDN 677b3a8 [9] and in 4.3 with SDN 74a8aee [10].

    $ oc adm release info --commits registry.svc.ci.openshift.org/ocp/release:4.2.16 | grep ' node '
      node                                          https://github.com/openshift/sdn                                           770cb7bf922a721bc6c62af5490439d6174036fe
    $ oc adm release info --commits registry.svc.ci.openshift.org/ocp/release:4.2.14 | grep ' node '
      node                                          https://github.com/openshift/sdn                                           770cb7bf922a721bc6c62af5490439d6174036fe
    $ git --no-pager log --first-parent --oneline -4 origin/release-4.2
    098a6410 (origin/release-4.2) Merge pull request openshift#95 from danwinship/fork-k8s-client-go-4.2
    9955a65b Merge pull request openshift#72 from juanluisvaladas/too_many_dns_queries_42
    677b3a80 Merge pull request openshift#90 from openshift-cherrypick-robot/cherry-pick-81-to-release-4.2
    770cb7bf Merge pull request openshift#73 from danwinship/egressip-cleanup-4.2

  So the fix landed after 4.2.16's 770cb7bf.

    $ oc adm release info --commits registry.svc.ci.openshift.org/ocp/release:4.3.0-rc.0 | grep ' sdn '
      sdn                                           https://github.com/openshift/sdn                                           d4e36d5019ef0e130e0d246581508821a7322753
    $ git --no-pager log --first-parent --oneline -5 origin/release-4.3
    490a574e (origin/release-4.3) Merge pull request openshift#98 from openshift-cherrypick-robot/cherry-pick-96-to-release-4.3
    85ab1033 Merge pull request openshift#78 from openshift-cherrypick-robot/cherry-pick-57-to-release-4.3
    d4e36d50 Merge pull request openshift#85 from openshift-cherrypick-robot/cherry-pick-84-to-release-4.3
    dabc4ef5 Merge pull request openshift#83 from dougbtv/backport-build-use-host-local
    74a8aee3 Merge pull request openshift#81 from openshift-cherrypick-robot/cherry-pick-79-to-release-4.3

  So the fix landed before rc.0's d4e36d50.

* GCP update CI fails with [11]:

    Could not reach HTTP service through <ip>:80 after 2m0s

  in 4.2.16 -> 4.3.0-rc.0 [12], 4.2.16 -> 4.3.0-rc.3 [13,14,15], and
  4.2.18 -> 4.3.1 [16].  This doesn't happen every time though; at
  least one 4.2.16 -> 4.3.0-rc.3 has passed on GCP [17].  We don't
  have a root-cause yet, but the final failure matches [8] discussed
  above.

[1]: https://bugzilla.redhat.com/show_bug.cgi?id=1786993
[2]: openshift/machine-config-operator#1358 (comment)
[3]: openshift/machine-config-operator#1359 (comment)
[4]: https://bugzilla.redhat.com/show_bug.cgi?id=1787635
[5]: openshift/machine-config-operator#1362 (comment)
[6]: openshift/machine-config-operator#1361 (comment)
[7]: https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-gcp-upgrade/214#1:build-log.txt%3A414
[8]: https://bugzilla.redhat.com/show_bug.cgi?id=1781763
[9]: openshift/sdn#90 (comment)
[10]: openshift/sdn#81 (comment)
[11]: https://bugzilla.redhat.com/show_bug.cgi?id=1785457
[12]: https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-gcp-upgrade/216
[13]: https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-gcp-upgrade/232
[14]: https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-gcp-upgrade/233
[15]: https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-gcp-upgrade/234
[16]: https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-gcp-upgrade/286
[17]: https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-gcp-upgrade/230
PratikMahajan pushed a commit to PratikMahajan/cincinnati-operator that referenced this pull request Mar 17, 2021

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged.


4 participants