@ironcladlou
Contributor

Before this change, only the initialDelaySeconds field of probes could be
updated. This patch expands the set of supported fields to include all the
other int32 fields of probes so that CVO will roll out such changes.

@openshift-ci-robot openshift-ci-robot added the bugzilla/severity-medium Referenced Bugzilla bug's severity is medium for the branch this PR is targeting. label Jun 15, 2020
@openshift-ci-robot
Contributor

@ironcladlou: This pull request references Bugzilla bug 1829923, which is valid. The bug has been moved to the POST state. The bug has been updated to refer to the pull request using the external bug tracker.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target release (4.6.0) matches configured target release for branch (4.6.0)
  • bug is in the state ASSIGNED, which is one of the valid states (NEW, ASSIGNED, ON_DEV, POST, POST)
Details

In response to this:

Bug 1829923: Expand supported set of probe field mutations

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci-robot openshift-ci-robot added the bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. label Jun 15, 2020
@ironcladlou
Contributor Author

@wking discovered this while we were trying to understand why openshift/machine-config-operator#1818 was not effective during upgrades.

@ironcladlou
Contributor Author

test upgrade 4.5 openshift/cluster-version-operator#383

https://storage.googleapis.com/origin-ci-test/logs/release-openshift-origin-installer-launch-aws/1272545200471805952/artifacts/launch/pods.json

No good... openshift/machine-config-operator#1818 merged before the latest nightly and CI promotions, so maybe I'm doing something wrong

@ironcladlou
Contributor Author

Looks like I forgot the diff logic

@ironcladlou
Contributor Author

Actually, I did not forget the diff logic, so I'm still not sure what's wrong

@wking
Member

wking commented Jun 15, 2020

I think this PR should get a new "CVO does not manage probe timeoutSeconds, etc." bug, so we can backport the fix independently.

setInt32(modified, &existing.TimeoutSeconds, required.TimeoutSeconds)
setInt32(modified, &existing.PeriodSeconds, required.PeriodSeconds)
setInt32(modified, &existing.SuccessThreshold, required.SuccessThreshold)
setInt32(modified, &existing.FailureThreshold, required.FailureThreshold)
Member

I double-checked the Probe type, and with these four additions we are now completely covering the type. The current master logic is unchanged since probe handling landed in d9f6718 (#7), and that commit doesn't provide motivation for ignoring the properties you're adding here.
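For readers following along, here is a minimal sketch of the field-by-field merge this review comment describes. The `Probe` struct is a simplified, hypothetical stand-in for the int32 fields of the Kubernetes probe type, the body of `setInt32` is an assumption inferred from the call sites quoted above, and `ensureProbe` is an illustrative name rather than a confirmed one.

```go
package main

import "fmt"

// Probe is a simplified, hypothetical stand-in that carries only the int32
// fields discussed in this thread; the real type also holds the handler.
type Probe struct {
	InitialDelaySeconds int32
	TimeoutSeconds      int32
	PeriodSeconds       int32
	SuccessThreshold    int32
	FailureThreshold    int32
}

// setInt32 overwrites *existing with required when they differ and records
// that a change happened. The signature matches the quoted calls; the body
// is an assumption.
func setInt32(modified *bool, existing *int32, required int32) {
	if *existing != required {
		*existing = required
		*modified = true
	}
}

// ensureProbe (illustrative name) reconciles every int32 field, not just
// InitialDelaySeconds.
func ensureProbe(modified *bool, existing *Probe, required Probe) {
	setInt32(modified, &existing.InitialDelaySeconds, required.InitialDelaySeconds)
	setInt32(modified, &existing.TimeoutSeconds, required.TimeoutSeconds)
	setInt32(modified, &existing.PeriodSeconds, required.PeriodSeconds)
	setInt32(modified, &existing.SuccessThreshold, required.SuccessThreshold)
	setInt32(modified, &existing.FailureThreshold, required.FailureThreshold)
}

func main() {
	// Sample values only: an in-cluster probe and a manifest that disagree.
	existing := Probe{PeriodSeconds: 10, SuccessThreshold: 1, FailureThreshold: 3, TimeoutSeconds: 1}
	required := Probe{InitialDelaySeconds: 5, PeriodSeconds: 5, SuccessThreshold: 1, FailureThreshold: 3, TimeoutSeconds: 3}

	modified := false
	ensureProbe(&modified, &existing, required)
	fmt.Printf("modified=%v probe=%+v\n", modified, existing)
}
```

With only the `InitialDelaySeconds` call in place, a manifest change to `timeoutSeconds` or `periodSeconds` would leave `modified` false, which fits the behavior described in the PR summary.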

@wking
Member

wking commented Jun 15, 2020

No good... openshift/machine-config-operator#1818 merged before the latest nightly and CI promotions, so maybe I'm doing something wrong

The update CI job from this PR:

$ curl -s https://storage.googleapis.com/origin-ci-test/pr-logs/pull/openshift_cluster-version-operator/383/pull-ci-openshift-cluster-version-operator-master-e2e-aws-upgrade/1272544737227706368/artifacts/e2e-aws-upgrade/pods.json | jq -r '.items[] | select(.metadata.name | contains("quorum-guard")) | .spec.containers[] | select(.name ==  "guard").readinessProbe | {initialDelaySeconds, periodSeconds, failureThreshold, timeoutSeconds}'
{
  "initialDelaySeconds": 5,
  "periodSeconds": 5,
  "failureThreshold": 3,
  "timeoutSeconds": 3
}
{
  "initialDelaySeconds": 5,
  "periodSeconds": 5,
  "failureThreshold": 3,
  "timeoutSeconds": 3
}
{
  "initialDelaySeconds": 5,
  "periodSeconds": 5,
  "failureThreshold": 3,
  "timeoutSeconds": 3
}

matches. But that was launched from a 4.6/master release which included the fixed manifest, so it wouldn't cover "does an update from broken manifests to fixed manifests fix the cluster?".
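The `jq` filter above can also be expressed as a small Go program, which may be handy when the check needs to run inside a test. This is a sketch under stated assumptions: the struct definitions cover only the fields the filter touches, the sample document in `main` is invented for illustration, and unlike `jq` the function skips containers that have no readiness probe instead of emitting nulls.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// probe holds the four fields the jq filter selects. Pointers keep the
// difference between an absent field (null) and an explicit zero.
type probe struct {
	InitialDelaySeconds *int32 `json:"initialDelaySeconds"`
	PeriodSeconds       *int32 `json:"periodSeconds"`
	FailureThreshold    *int32 `json:"failureThreshold"`
	TimeoutSeconds      *int32 `json:"timeoutSeconds"`
}

// podList models only the parts of a pod list that the filter reads.
type podList struct {
	Items []struct {
		Metadata struct {
			Name string `json:"name"`
		} `json:"metadata"`
		Spec struct {
			Containers []struct {
				Name           string `json:"name"`
				ReadinessProbe *probe `json:"readinessProbe"`
			} `json:"containers"`
		} `json:"spec"`
	} `json:"items"`
}

// guardProbes returns the readiness probe of every "guard" container in
// pods whose name contains "quorum-guard".
func guardProbes(data []byte) ([]probe, error) {
	var pods podList
	if err := json.Unmarshal(data, &pods); err != nil {
		return nil, err
	}
	var out []probe
	for _, pod := range pods.Items {
		if !strings.Contains(pod.Metadata.Name, "quorum-guard") {
			continue
		}
		for _, c := range pod.Spec.Containers {
			if c.Name == "guard" && c.ReadinessProbe != nil {
				out = append(out, *c.ReadinessProbe)
			}
		}
	}
	return out, nil
}

func main() {
	// Invented sample data; a real run would read the pods.json artifact.
	sample := `{"items":[{"metadata":{"name":"etcd-quorum-guard-example"},
	  "spec":{"containers":[{"name":"guard","readinessProbe":
	  {"initialDelaySeconds":5,"periodSeconds":5,"failureThreshold":3,"timeoutSeconds":3}}]}}]}`

	probes, err := guardProbes([]byte(sample))
	if err != nil {
		panic(err)
	}
	for _, p := range probes {
		b, _ := json.Marshal(p)
		fmt.Println(string(b))
	}
}
```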

Looking at your cluster-bot run, ClusterVersion is empty, which means something really bad happened (I'm not clear what). I've launched a replacement job, and we'll see how that goes...

@wking
Member

wking commented Jun 15, 2020

Replacement 4.5->PR job passed, but did not fix the probe:

$ curl -s https://storage.googleapis.com/origin-ci-test/logs/release-openshift-origin-installer-launch-aws/1272578262370881536/artifacts/launch/pods.json | jq -r '.items[] | select(.metadata.name | contains("quorum-guard")) | .spec.containers[] | select(.name ==  "guard").readinessProbe | {initialDelaySeconds, periodSeconds, failureThreshold, timeoutSeconds}'
{
  "initialDelaySeconds": null,
  "periodSeconds": 10,
  "failureThreshold": 3,
  "timeoutSeconds": 1
}
{
  "initialDelaySeconds": null,
  "periodSeconds": 10,
  "failureThreshold": 3,
  "timeoutSeconds": 1
}
{
  "initialDelaySeconds": null,
  "periodSeconds": 10,
  "failureThreshold": 3,
  "timeoutSeconds": 1
}

Not clear on why not yet.

@wking
Member

wking commented Jun 16, 2020

/hold

So nobody swoops in and merges until we sort out why this doesn't fix 4.5 -> PR CI.

@openshift-ci-robot openshift-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 16, 2020
@ironcladlou ironcladlou changed the title Bug 1829923: Expand supported set of probe field mutations Bug 1847672: Expand supported set of probe field mutations Jun 16, 2020
@openshift-ci-robot openshift-ci-robot added bugzilla/severity-high Referenced Bugzilla bug's severity is high for the branch this PR is targeting. and removed bugzilla/severity-medium Referenced Bugzilla bug's severity is medium for the branch this PR is targeting. bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. labels Jun 16, 2020
@openshift-ci-robot
Contributor

@ironcladlou: This pull request references Bugzilla bug 1847672, which is invalid:

  • expected the bug to target the "4.6.0" release, but it targets "4.5.0" instead

Comment /bugzilla refresh to re-evaluate validity if changes to the Bugzilla bug are made, or edit the title of this pull request to link to a different bug.

Details

In response to this:

Bug 1847672: Expand supported set of probe field mutations

@openshift-ci-robot openshift-ci-robot added the bugzilla/invalid-bug Indicates that a referenced Bugzilla bug is invalid for the branch this PR is targeting. label Jun 16, 2020
@ironcladlou
Contributor Author

/bugzilla refresh

@openshift-ci-robot openshift-ci-robot added the bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. label Jun 16, 2020
@openshift-ci-robot
Contributor

@ironcladlou: This pull request references Bugzilla bug 1847672, which is valid. The bug has been moved to the POST state. The bug has been updated to refer to the pull request using the external bug tracker.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target release (4.6.0) matches configured target release for branch (4.6.0)
  • bug is in the state NEW, which is one of the valid states (NEW, ASSIGNED, ON_DEV, POST, POST)
Details

In response to this:

/bugzilla refresh

@openshift-ci-robot openshift-ci-robot removed the bugzilla/invalid-bug Indicates that a referenced Bugzilla bug is invalid for the branch this PR is targeting. label Jun 16, 2020
@ironcladlou
Contributor Author

Since we don't know why this doesn't fix the problem yet...

/hold

@ironcladlou
Contributor Author

Created https://bugzilla.redhat.com/show_bug.cgi?id=1847672 to track this

@wking
Member

wking commented Jun 17, 2020

Ahh, issue is that the "latest" release didn't actually have the manifest changes yet:

$ curl -s https://storage.googleapis.com/origin-ci-test/logs/release-openshift-origin-installer-launch-aws/1272578262370881536/artifacts/release-latest/release-payload-latest/0000_80_machine-config-operator_07_etcdquorumguard_deployment.yaml | yaml2json | jq '.spec.template.spec.containers[].readinessProbe'
{
  "exec": {
    "periodSecond": "5",
    "command": [
      "/bin/sh",
      "-c",
      "declare -r croot=/mnt/kube\ndeclare -r health_endpoint=\"https://127.0.0.1:2379/health\"\ndeclare -r cert=\"$(find $croot -name 'system:etcd-peer*.crt' -print -quit)\"\ndeclare -r key=\"${cert%.crt}.key\"\ndeclare -r cacert=\"$croot/ca.crt\"\nexport NSS_SDB_USE_CACHE=no\n[[ -z $cert || -z $key ]] && exit 1\ncurl --max-time 2 --silent --cert \"${cert//:/\\:}\" --key \"$key\" --cacert \"$cacert\" \"$health_endpoint\" |grep '{ *\"health\" *: *\"true\" *}'\n"
    ],
    "initialDelaySecond": "5"
  }
}
$ curl -s https://storage.googleapis.com/origin-ci-test/logs/release-openshift-origin-installer-launch-aws/1272578262370881536/artifacts/release-latest/release-payload-latest/image-references | jq -r '.spec.tags[] | select(.name == "machine-config-operator").annotations["io.openshift.build.commit.id"]'
908117045fe9ef32662554ed9ed557b3c1e1a965

And openshift/machine-config-operator@908117045fe9ef326 is in release-4.5. So probably cluster-bot's PR-target logic is "use the PR branch for resulting images, but use the initial release for all the other images" or something. I'll test again once openshift/machine-config-operator#1819 lands.
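The manifest output above shows `periodSecond` and `initialDelaySecond` nested under `exec`, where they are not probe fields. One plausible reading, sketched below with Go's standard `encoding/json`, is that a lenient decoder drops such keys without complaint, so the probe falls back to defaults, while a strict decoder reports them. The struct definitions and the shortened command are assumptions made for the illustration; this is not the decoding path the cluster actually uses.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// execAction and probe are trimmed-down, hypothetical versions of the real
// types, keeping only what the example needs.
type execAction struct {
	Command []string `json:"command"`
}

type probe struct {
	Exec                *execAction `json:"exec"`
	InitialDelaySeconds int32       `json:"initialDelaySeconds"`
	PeriodSeconds       int32       `json:"periodSeconds"`
}

// decode parses a probe; in strict mode unknown keys are an error.
func decode(data []byte, strict bool) (probe, error) {
	dec := json.NewDecoder(bytes.NewReader(data))
	if strict {
		dec.DisallowUnknownFields()
	}
	var p probe
	err := dec.Decode(&p)
	return p, err
}

func main() {
	// Same shape as the manifest output above, with the command shortened.
	manifest := []byte(`{"exec":{"periodSecond":"5","command":["/bin/sh","-c","true"],"initialDelaySecond":"5"}}`)

	// Lenient: the misplaced keys vanish and both fields stay at zero.
	lenient, err := decode(manifest, false)
	fmt.Println(lenient.PeriodSeconds, lenient.InitialDelaySeconds, err)

	// Strict: decoding fails on the first unknown key under exec.
	_, err = decode(manifest, true)
	fmt.Println("strict error:", err)
}
```

Zero values left for the API server to default would be consistent with the `periodSeconds: 10` and `timeoutSeconds: 1` seen in the earlier 4.5->PR output.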

@wking
Member

wking commented Jun 18, 2020

Bah, I'm just going to approve this, and we can test in cluster-bot once it's landed ;).

/lgtm

@openshift-ci-robot openshift-ci-robot added lgtm Indicates that a PR is ready to be merged. approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Jun 18, 2020
@wking
Member

wking commented Jun 18, 2020

/hold cancel

@openshift-ci-robot openshift-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 18, 2020
@ironcladlou
Contributor Author

/retest

Before this change, only the initialDelaySeconds field of probes could be
updated. This patch expands the set of supported fields to include all the
other int32 fields of probes so that CVO will roll out such changes.
@openshift-ci-robot openshift-ci-robot removed the lgtm Indicates that a PR is ready to be merged. label Jun 18, 2020
@ironcladlou
Contributor Author

@wking had to undo your tag to fix the newlines, wanna try again? Thanks for all your help

Member

@wking wking left a comment

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Jun 18, 2020
@openshift-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ironcladlou, wking

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@wking
Member

wking commented Jun 18, 2020

/test images

@openshift-merge-robot openshift-merge-robot merged commit 10f768c into openshift:master Jun 18, 2020
@openshift-ci-robot
Contributor

@ironcladlou: All pull requests linked via external trackers have merged: openshift/cluster-version-operator#383. Bugzilla bug 1847672 has been moved to the MODIFIED state.

Details

In response to this:

Bug 1847672: Expand supported set of probe field mutations

@wking
Member

wking commented Jun 18, 2020

/cherrypick release-4.5

@openshift-cherrypick-robot

@wking: new pull request created: #389

Details

In response to this:

/cherrypick release-4.5

@wking
Member

wking commented Jun 18, 2020

I checked 4.5.0-rc.1 -> 4.6-CI and this worked.

@ironcladlou
Contributor Author

/cherrypick release-4.4

@openshift-cherrypick-robot

@ironcladlou: new pull request created: #391

Details

In response to this:

/cherrypick release-4.4
