Conversation

@sosiouxme

No description provided.
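The PR itself carries no description, but promotion PRs against cincinnati-graph-data generally just append the new release to the relevant channel file. A minimal sketch of what the diff here presumably looks like, assuming the usual channels/candidate-4.4.yaml layout and showing only the versions discussed below:

$ cat channels/candidate-4.4.yaml
name: candidate-4.4
versions:
# ... earlier 4.3.z releases and 4.4 candidates elided ...
- 4.3.5
- 4.4.0-rc.0
- 4.4.0-rc.1  # the release this PR proposes to add (assumed from the discussion below)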

@wking commented Mar 13, 2020

No need for a new 4.3 release, because we already have 4.3.5 in this channel, and:

$ oc adm release info quay.io/openshift-release-dev/ocp-release:4.4.0-rc.1-x86_64 | grep Upgrades
  Upgrades: 4.3.5, 4.4.0-rc.0

The single AWS rc.0 -> rc.1 update passed. Looking at the 4.3.5 -> rc.1 jobs, three AWS jobs passed, one failed on setup (throttling), and one had brief (<2m) unreachable-during-disruption issues. None of those are candidate-promotion blockers.

I'll launch some CI jobs on GCP and Azure...
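For the record, the releases already being served in candidate-4.4 can be double-checked against the policy engine directly; a sketch, assuming the public api.openshift.com graph endpoint and jq on the path:

$ curl -sH 'Accept: application/json' 'https://api.openshift.com/api/upgrades_info/v1/graph?channel=candidate-4.4' | jq -r '.nodes[].version' | sort -V

The .edges array in the same response holds index pairs into .nodes, so it also shows which update paths are currently being served.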

@wking commented Mar 14, 2020

New results:

  • 4.3.5 -> 4.4.0-rc.1 GCP failed with some unreachable-during-disruption issues, all less than 4m, which we don't consider edge-blocking.
  • 4.3.5 -> 4.4.0-rc.1 GCP failed with Cluster did not complete upgrade: timed out waiting for the condition: Working towards 4.4.0-rc.1: 12% complete. Needs more investigation.
  • 4.3.5 -> 4.4.0-rc.1 Azure failed with failed to acquire lease: status 503 Service Unavailable, which is pre-update, so no impact on edge stability.
  • 4.3.5 -> 4.4.0-rc.1 Azure failed with Cluster did not complete upgrade: timed out waiting for the condition: Working towards 4.4.0-rc.1: 18% complete. Needs more investigation.
  • 4.4.0-rc.0 -> 4.4.0-rc.1 GCP failed with failed to acquire lease: status 503 Service Unavailable.
  • 4.4.0-rc.0 -> 4.4.0-rc.1 Azure had some unreachable during disruption, all less than 2m, and was counted as a success.

Launching replacements for the two Boskos 503s...

@wking commented Mar 14, 2020

The 4.3.5 -> 4.4.0-rc.1 GCP timeout job has:

$ curl -s https://storage.googleapis.com/origin-ci-test/logs/release-openshift-origin-installer-e2e-gcp-upgrade/443/artifacts/e2e-gcp-upgrade/must-gather/quay-io-openshift-release-dev-ocp-v4-0-art-dev-sha256-f1ffcdcbd684afd61ff1874b47e1c61a0f7adab93b7a21123a1e29b041d3dabf/namespaces/openshift-cluster-version/pods/cluster-version-operator-5fffc549d9-shbf9/cluster-version-operator/cluster-version-operator/logs/current.log | grep 'Running sync.*in state\|Result of work' | tail -n4
2020-03-13T22:46:45.34504345Z I0313 22:46:45.345010       1 task_graph.go:596] Result of work: [Cluster operator etcd is reporting a failure: EtcdMemberIPMigratorDegraded: etcdserver: Peer URLs already exists]
2020-03-13T22:49:55.347779833Z I0313 22:49:55.347709       1 sync_worker.go:471] Running sync registry.svc.ci.openshift.org/ocp/release@sha256:6fa3e6520d6668737d29a68ef7d7189642b07dba9b17511316210f336e9492b0 (force=true) on generation 2 in state Updating at attempt 8
2020-03-13T22:55:40.40044101Z I0313 22:55:40.399557       1 task_graph.go:596] Result of work: [Cluster operator etcd is reporting a failure: EtcdMemberIPMigratorDegraded: etcdserver: Peer URLs already exists]
2020-03-13T22:58:57.58632797Z I0313 22:58:57.586215       1 sync_worker.go:471] Running sync registry.svc.ci.openshift.org/ocp/release@sha256:6fa3e6520d6668737d29a68ef7d7189642b07dba9b17511316210f336e9492b0 (force=true) on generation 2 in state Updating at attempt 9

The Azure job has:

$ curl -s https://storage.googleapis.com/origin-ci-test/logs/release-openshift-origin-installer-e2e-azure-upgrade/92/artifacts/e2e-azure-upgrade/must-gather/quay-io-openshift-release-dev-ocp-v4-0-art-dev-sha256-69d0187ff513f67a3dbe37c843e318a59ff27699f3c38dcd2d79df13bc176def/namespaces/openshift-cluster-version/pods/cluster-version-operator-5fffc549d9-kmq5k/cluster-version-operator/cluster-version-operator/logs/current.log | grep 'Running sync.*in state\|Result of work' | tail -n3
2020-03-13T23:01:42.3198065Z I0313 23:01:42.319768       1 task_graph.go:596] Result of work: [Cluster operator kube-controller-manager is reporting a failure: NodeInstallerDegraded: 1 nodes are failing on revision 346:
2020-03-13T23:04:40.0548178Z I0313 23:04:40.054748       1 sync_worker.go:471] Running sync registry.svc.ci.openshift.org/ocp/release@sha256:6fa3e6520d6668737d29a68ef7d7189642b07dba9b17511316210f336e9492b0 (force=true) on generation 2 in state Updating at attempt 9
2020-03-13T23:10:25.1064177Z I0313 23:10:25.106409       1 task_graph.go:596] Result of work: [Cluster operator kube-controller-manager is reporting a failure: NodeInstallerDegraded: 1 nodes are failing on revision 393:

I'll hunt around for 4.3 -> 4.4 bugs mentioning EtcdMemberIPMigratorDegraded from etcd or NodeInstallerDegraded from kube-controller-manager.
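For anyone reproducing this, the Degraded conditions the CVO keeps retrying over can be read straight off the ClusterOperators; a sketch, assuming access to a live cluster (the same YAML also sits under cluster-scoped-resources/config.openshift.io/clusteroperators/ in the must-gather):

$ oc get clusteroperators etcd kube-controller-manager
$ oc get clusteroperator etcd -o jsonpath='{.status.conditions[?(@.type=="Degraded")].message}{"\n"}'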

@wking commented Mar 14, 2020

EtcdMemberIPMigratorDegraded: etcdserver: Peer URLs already exists is rhbz#1811706. I've asked the etcd team for an impact statement.

@wking commented Mar 14, 2020

Created rhbz#1813512 for the NodeInstallerDegraded.

@wking commented Mar 14, 2020

4.3.5 -> 4.4.0-rc.1 Azure died in setup with ReferencedResourceNotProvisioned, rhbz#1813513. I've launched a replacement.

@wking commented Mar 14, 2020

4.4.0-rc.0 -> 4.4.0-rc.1 GCP had some unreachable during disruption, all less than 2m, and was counted as a success.

@wking commented Mar 14, 2020

4.3.5 -> 4.4.0-rc.1 Azure failed with Cluster did not complete upgrade: timed out waiting for the condition: Working towards 4.4.0-rc.1: 12% complete, hitting the EtcdMemberIPMigratorDegraded: etcdserver: Peer URLs already exists issue from rhbz#1811706. We may want to get that triaged before we add rc.1 to candidate-4.4. Or should we block the 4.3 -> 4.4.0-rc.1 edges instead?

@wking commented Mar 16, 2020

My EtcdMemberIPMigratorDegraded bug was closed as a dup of rhbz#1812584, which is VERIFIED today. So I'd guess the next RC will have the fix. I'm agnostic about whether we pull 4.3 -> 4.4 edges for the current 4.4 RCs from candidate-4.4 or not.

@wking commented Mar 17, 2020

I haven't mentioned the NodeInstallerDegraded job in the past few comments, but preliminary noises from @tnozicka make it sound like a potential upgrade blocker as well. It's still not clear whether it's common enough to call for excluding rc.1 from candidate-4.4.

@eparis commented Mar 17, 2020

What do we think about merging this, then pulling the edges, with a note referencing 1812584, when the next RC comes along?

@sdodson commented Mar 18, 2020

What do we think about merging this, then pulling the edges, with a note referencing 1812584, when the next RC comes along?

That's fine, I was hoping that there'd be a new RC first thing this morning but there isn't.

@LalatenduMohanty commented Mar 18, 2020

Right, we should be fine to merge this, as we know what the issues are around this RC.
/lgtm

@openshift-ci-robot added the lgtm label Mar 18, 2020
@openshift-ci-robot

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: LalatenduMohanty, sosiouxme
To complete the pull request process, please assign smarterclayton
You can assign the PR to them by writing /assign @smarterclayton in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@LalatenduMohanty commented Mar 18, 2020

/hold as I want to put up a PR blocking the edges to 4.4 first

These are the UpgradeBlockers

@openshift-ci-robot added the do-not-merge/hold label Mar 18, 2020
@LalatenduMohanty commented Mar 18, 2020

Created #123

Once we merge #123 we can remove the hold and merge this PR
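An edge-blocking entry of the kind #123 presumably adds is just a small YAML file naming the target release and a regex over the source releases; a sketch, assuming the current cincinnati-graph-data blocked-edges convention (file name hypothetical):

$ cat blocked-edges/4.4.0-rc.1-etcd-peer-urls.yaml
to: 4.4.0-rc.1
from: 4\.3\..*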

@wking commented Mar 18, 2020

/hold

rc.1 may be impacted by #125. Let's keep it out of channels until we know for sure. See also the bugs linked from #123.

@wking commented Mar 18, 2020

Tombstoned in #127.

/close

@openshift-ci-robot

@wking: Closed this PR.

In response to this:

Tombstoned in #127.

/close


@LalatenduMohanty

/reopen
We do not hold candidate-channel PRs over upgrade blockers.

@openshift-ci-robot

@LalatenduMohanty: Reopened this PR.

In response to this:

/reopen
We do not hold candidate-channel PRs over upgrade blockers.


@LalatenduMohanty

@sosiouxme Can you reopen the PR please?

@wking commented Mar 19, 2020

/close

Getting handled in #127 (although that PR is now adding rc.1 to candidate, with a different motivation).

@openshift-ci-robot

@wking: Closed this PR.

In response to this:

/close

Getting handled in #127 (although that PR is now adding rc.1 to candidate, with a different motivation).
