Skip to content

Conversation

@kikisdeliveryservice
Copy link
Contributor

@kikisdeliveryservice kikisdeliveryservice commented Jun 11, 2020

Made the following changes as settings were not getting
picked up:

  • move failure threshold/timeout to nest under readinessProbe
  • move and correct initialDelaySeconds and periodSeconds
  • reorder container sub-elements to nest image under name

@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 11, 2020
@kikisdeliveryservice kikisdeliveryservice changed the title etcd-quorum-guard: move failure threshold/timeout to under readiness … etcd-quorum-guard: move failure threshold/timeout to under readiness Jun 11, 2020
@ironcladlou
Copy link
Contributor

/retitle Bug 1829923: etcd-quorum-guard: move failure threshold/timeout to under readiness

@openshift-ci-robot openshift-ci-robot changed the title etcd-quorum-guard: move failure threshold/timeout to under readiness Bug 1829923: etcd-quorum-guard: move failure threshold/timeout to under readiness Jun 11, 2020
@openshift-ci-robot openshift-ci-robot added bugzilla/severity-medium Referenced Bugzilla bug's severity is medium for the branch this PR is targeting. bugzilla/invalid-bug Indicates that a referenced Bugzilla bug is invalid for the branch this PR is targeting. labels Jun 11, 2020
@openshift-ci-robot
Copy link
Contributor

@kikisdeliveryservice: This pull request references Bugzilla bug 1829923, which is invalid:

  • expected the bug to be in one of the following states: NEW, ASSIGNED, ON_DEV, POST, POST, but it is ON_QA instead

Comment /bugzilla refresh to re-evaluate validity if changes to the Bugzilla bug are made, or edit the title of this pull request to link to a different bug.

Details

In response to this:

Bug 1829923: etcd-quorum-guard: move failure threshold/timeout to under readiness

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

1 similar comment
@openshift-ci-robot
Copy link
Contributor

@kikisdeliveryservice: This pull request references Bugzilla bug 1829923, which is invalid:

  • expected the bug to be in one of the following states: NEW, ASSIGNED, ON_DEV, POST, POST, but it is ON_QA instead

Comment /bugzilla refresh to re-evaluate validity if changes to the Bugzilla bug are made, or edit the title of this pull request to link to a different bug.

Details

In response to this:

Bug 1829923: etcd-quorum-guard: move failure threshold/timeout to under readiness

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@ironcladlou
Copy link
Contributor

/bugzilla refresh

@openshift-ci-robot openshift-ci-robot added the bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. label Jun 11, 2020
@openshift-ci-robot
Copy link
Contributor

@ironcladlou: This pull request references Bugzilla bug 1829923, which is valid. The bug has been updated to refer to the pull request using the external bug tracker.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target release (4.6.0) matches configured target release for branch (4.6.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, ON_DEV, POST, POST)
Details

In response to this:

/bugzilla refresh

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci-robot openshift-ci-robot removed the bugzilla/invalid-bug Indicates that a referenced Bugzilla bug is invalid for the branch this PR is targeting. label Jun 11, 2020
@kikisdeliveryservice
Copy link
Contributor Author

cc: @wking

@kikisdeliveryservice kikisdeliveryservice changed the title Bug 1829923: etcd-quorum-guard: move failure threshold/timeout to under readiness Bug 1829923: etcd-quorum-guard: move failure threshold/timeout to readinessProbe Jun 11, 2020
@openshift-ci-robot
Copy link
Contributor

@kikisdeliveryservice: This pull request references Bugzilla bug 1829923, which is valid.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target release (4.6.0) matches configured target release for branch (4.6.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, ON_DEV, POST, POST)
Details

In response to this:

Bug 1829923: etcd-quorum-guard: move failure threshold/timeout to readinessProbe

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@hexfusion
Copy link
Contributor

/test e2e-aws

@kikisdeliveryservice
Copy link
Contributor Author

e2e-aws seems to have some issues..

level=error msg="When applying changes to module.vpc.aws_vpc.new_vpc[0], provider" level=error msg="\"registry.terraform.io/-/aws\" produced an unexpected new value for was" level=error msg="present, but now absent."

/test e2e-aws

@kikisdeliveryservice kikisdeliveryservice requested review from hexfusion, ironcladlou and wking and removed request for cgwalters and yuqi-zhang June 11, 2020 19:02
@hexfusion
Copy link
Contributor

LGTM in principle, let's confirm we see expected results.

@kikisdeliveryservice
Copy link
Contributor Author

LGTM in principle, let's confirm we see expected results.

cool after this is confirmed we can cherrypick to 4.5

@ironcladlou
Copy link
Contributor

@kikisdeliveryservice
Copy link
Contributor Author

i think the containers section is off maybe?

@kikisdeliveryservice
Copy link
Contributor Author

other platforms are doing it in this format:

Made the following changes as settings were not getting
picked up:
- move failure threshold/timeout to nest under readinessProbe
- move and correct initialDelaySeconds and periodSeconds
- reorder container sub-elements to nest image under name
@wking
Copy link
Member

wking commented Jun 11, 2020

/lgtm
/hold

Pull the hold when the results look good :)

@openshift-ci-robot openshift-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 11, 2020
@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Jun 11, 2020
@openshift-ci-robot openshift-ci-robot removed the lgtm Indicates that a PR is ready to be merged. label Jun 11, 2020
@openshift-ci-robot
Copy link
Contributor

@kikisdeliveryservice: This pull request references Bugzilla bug 1829923, which is valid.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target release (4.6.0) matches configured target release for branch (4.6.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, ON_DEV, POST, POST)
Details

In response to this:

Bug 1829923: etcd-quorum-guard: move failure threshold/timeout to readinessProbe

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@kikisdeliveryservice kikisdeliveryservice changed the title Bug 1829923: etcd-quorum-guard: move failure threshold/timeout to readinessProbe Bug 1829923: etcd-quorum-guard: update containers section in deployment.yaml Jun 11, 2020
@wking
Copy link
Member

wking commented Jun 11, 2020

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Jun 11, 2020
@openshift-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: kikisdeliveryservice, wking

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:
  • OWNERS [kikisdeliveryservice]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot
Copy link
Contributor

openshift-ci-robot commented Jun 11, 2020

@kikisdeliveryservice: The following tests failed, say /retest to rerun all failed tests:

Test name Commit Details Rerun command
ci/prow/e2e-aws-scaleup-rhel7 9d62f94 link /test e2e-aws-scaleup-rhel7
ci/prow/e2e-metal-ipi 9d62f94 link /test e2e-metal-ipi

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@kikisdeliveryservice
Copy link
Contributor Author

kikisdeliveryservice commented Jun 12, 2020

reconfirming in this e2e-aws run:
Expect deployment values from PR ftr:
initialDelaySeconds: 5
periodSeconds: 5
failureThreshold: 3
timeoutSeconds: 3

e2e-aws deployments.json check:
$ curl -s https://storage.googleapis.com/origin-ci-test/pr-logs/pull/openshift_machine-config-operator/1818/pull-ci-openshift-machine-config-operator-master-e2e-aws/1271220314893717504/artifacts/e2e-aws/gather-extra/deployments.json | zcat | jq -r '.items[] | select(.metadata.name | contains("quorum-guard")).spec.template.spec.containers[] | select(.name == "guard").readinessProbe.failureThreshold'
3
$ curl -s https://storage.googleapis.com/origin-ci-test/pr-logs/pull/openshift_machine-config-operator/1818/pull-ci-openshift-machine-config-operator-master-e2e-aws/1271220314893717504/artifacts/e2e-aws/gather-extra/deployments.json | zcat | jq -r '.items[] | select(.metadata.name | contains("quorum-guard")).spec.template.spec.containers[] | select(.name == "guard").readinessProbe.initialDelaySeconds'
5
$ curl -s https://storage.googleapis.com/origin-ci-test/pr-logs/pull/openshift_machine-config-operator/1818/pull-ci-openshift-machine-config-operator-master-e2e-aws/1271220314893717504/artifacts/e2e-aws/gather-extra/deployments.json | zcat | jq -r '.items[] | select(.metadata.name | contains("quorum-guard")).spec.template.spec.containers[] | select(.name == "guard").readinessProbe.periodSeconds'
5
$ curl -s https://storage.googleapis.com/origin-ci-test/pr-logs/pull/openshift_machine-config-operator/1818/pull-ci-openshift-machine-config-operator-master-e2e-aws/1271220314893717504/artifacts/e2e-aws/gather-extra/deployments.json | zcat | jq -r '.items[] | select(.metadata.name | contains("quorum-guard")).spec.template.spec.containers[] | select(.name == "guard").readinessProbe.timeoutSeconds'
3

And also:
e2e-aws pods.json check:

$ curl -s https://storage.googleapis.com/origin-ci-test/pr-logs/pull/openshift_machine-config-operator/1818/pull-ci-openshift-machine-config-operator-master-e2e-aws/1271220314893717504/artifacts/e2e-aws/gather-extra/pods.json | jq -r '.items[] | select(.metadata.name | contains("quorum-guard")).spec.containers[] | select(.name == "guard").readinessProbe | {initialDelaySeconds, periodSeconds, failureThreshold, timeoutSeconds} | tostring'
{"initialDelaySeconds":5,"periodSeconds":5,"failureThreshold":3,"timeoutSeconds":3}
{"initialDelaySeconds":5,"periodSeconds":5,"failureThreshold":3,"timeoutSeconds":3}
{"initialDelaySeconds":5,"periodSeconds":5,"failureThreshold":3,"timeoutSeconds":3}

e2e-gcp-op pods.json check:
$ curl -s https://storage.googleapis.com/origin-ci-test/pr-logs/pull/openshift_machine-config-operator/1818/pull-ci-openshift-machine-config-operator-master-e2e-gcp-op/1271220325798907904/artifacts/e2e-gcp-op/pods.json | jq -r '.items[] | select(.metadata.name | contains("quorum-guard")).spec.containers[] | select(.name == "guard").readinessProbe | {initialDelaySeconds, periodSeconds, failureThreshold, timeoutSeconds} | tostring'
{"initialDelaySeconds":5,"periodSeconds":5,"failureThreshold":3,"timeoutSeconds":3}
{"initialDelaySeconds":5,"periodSeconds":5,"failureThreshold":3,"timeoutSeconds":3}
{"initialDelaySeconds":5,"periodSeconds":5,"failureThreshold":3,"timeoutSeconds":3}

@kikisdeliveryservice
Copy link
Contributor Author

rhel7 and metal-ipi seem to have RBAC issues

/skip

@kikisdeliveryservice
Copy link
Contributor Author

Since I've confirmed this 2x am lifting hold to merge

/hold cancel

@openshift-ci-robot openshift-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 12, 2020
@openshift-merge-robot openshift-merge-robot merged commit 22cc559 into openshift:master Jun 12, 2020
@openshift-ci-robot
Copy link
Contributor

@kikisdeliveryservice: All pull requests linked via external trackers have merged: openshift/machine-config-operator#1818, openshift/machine-config-operator#1797. Bugzilla bug 1829923 has been moved to the MODIFIED state.

Details

In response to this:

Bug 1829923: etcd-quorum-guard: update containers section in deployment.yaml

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@kikisdeliveryservice
Copy link
Contributor Author

/cherry-pick release-4.5

@openshift-cherrypick-robot

@kikisdeliveryservice: #1818 failed to apply on top of branch "release-4.5":

error: Failed to merge in the changes.
Using index info to reconstruct a base tree...
M	install/0000_80_machine-config-operator_07_etcdquorumguard_deployment.yaml
Falling back to patching base and 3-way merge...
Auto-merging install/0000_80_machine-config-operator_07_etcdquorumguard_deployment.yaml
CONFLICT (content): Merge conflict in install/0000_80_machine-config-operator_07_etcdquorumguard_deployment.yaml
Patch failed at 0001 etcd-quorum-guard: update containers section in deployment.yaml

Details

In response to this:

/cherry-pick release-4.5

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@kikisdeliveryservice
Copy link
Contributor Author

Backports for 4.4 & 4.5:
#1819
#1820

Also fixed the BZs to link correctly to the open PRs 😄

@ironcladlou @hexfusion PTAL at the above backports

@ironcladlou
Copy link
Contributor

@wking
Copy link
Member

wking commented Jun 15, 2020

CVO code looks like it is missing some properties. Not sure yet if that is intentional or not, but seems like a CVO bug to me.

@ironcladlou
Copy link
Contributor

@wking

CVO code looks like it is missing some properties. Not sure yet if that is intentional or not, but seems like a CVO bug to me.

Interesting... that could explain the upgrade results. Still waiting on my Y upgrade run to confirm the issue.

@ironcladlou
Copy link
Contributor

ironcladlou commented Jun 15, 2020

Wanted to test a Y upgrade but cluster-bot didn't like that this PR was merged so I went with #1819:

test upgrade 4.4 openshift/machine-config-operator#1819

https://storage.googleapis.com/origin-ci-test/logs/release-openshift-origin-installer-launch-aws/1272513653010075648/artifacts/launch/pods.json

"failureThreshold": 3,
"periodSeconds": 10,
"successThreshold": 1,
"timeoutSeconds": 1

@ironcladlou
Copy link
Contributor

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. bugzilla/severity-medium Referenced Bugzilla bug's severity is medium for the branch this PR is targeting. bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. lgtm Indicates that a PR is ready to be merged.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants