-
Notifications
You must be signed in to change notification settings - Fork 212
Bug 1847672: Expand supported set of probe field mutations #383
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bug 1847672: Expand supported set of probe field mutations #383
Conversation
|
@ironcladlou: This pull request references Bugzilla bug 1829923, which is valid. The bug has been moved to the POST state. The bug has been updated to refer to the pull request using the external bug tracker. 3 validation(s) were run on this bug
DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
|
@wking discovered this while we were trying to understand why openshift/machine-config-operator#1818 was not effective during upgrades. |
40fca88 to
62f6d55
Compare
No good... openshift/machine-config-operator#1818 merged before the latest nightly and CI promotions, so maybe I'm doing something wrong |
|
Looks like I forgot the diff logic |
|
Actually, I did not forget the diff logic, so I'm still not sure what's wrong |
|
I think this PR should get a new "CVO does not manage probe timeoutSeconds, etc." bug, so we can backport the fix independently. |
| setInt32(modified, &existing.TimeoutSeconds, required.TimeoutSeconds) | ||
| setInt32(modified, &existing.PeriodSeconds, required.PeriodSeconds) | ||
| setInt32(modified, &existing.SuccessThreshold, required.SuccessThreshold) | ||
| setInt32(modified, &existing.FailureThreshold, required.FailureThreshold) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I double-checked the Probe type, and with these four additions we are now completely covering the type. The current master logic is unchanged since probe handling landed in d9f6718 (#7), and that commit doesn't provide motivation for ignoring the properties you're adding here.
The update CI job from this PR: $ curl -s https://storage.googleapis.com/origin-ci-test/pr-logs/pull/openshift_cluster-version-operator/383/pull-ci-openshift-cluster-version-operator-master-e2e-aws-upgrade/1272544737227706368/artifacts/e2e-aws-upgrade/pods.json | jq -r '.items[] | select(.metadata.name | contains("quorum-guard")) | .spec.containers[] | select(.name == "guard").readinessProbe | {initialDelaySeconds, periodSeconds, failureThreshold, timeoutSeconds}'
{
"initialDelaySeconds": 5,
"periodSeconds": 5,
"failureThreshold": 3,
"timeoutSeconds": 3
}
{
"initialDelaySeconds": 5,
"periodSeconds": 5,
"failureThreshold": 3,
"timeoutSeconds": 3
}
{
"initialDelaySeconds": 5,
"periodSeconds": 5,
"failureThreshold": 3,
"timeoutSeconds": 3
}matches. But that was launched from a 4.6/master release which included the fixed manifest, so it wouldn't cover "does an update from broken manifests to fixed manifests fix the cluster?". Looking at your cluster-bot run, ClusterVersion is empty, which means something really bad happened (I'm not clear what). I've launched a replacement job, and we'll see how that goes... |
|
Replacement 4.5->PR job passed, but did not fix the probe: $ curl -s https://storage.googleapis.com/origin-ci-test/logs/release-openshift-origin-installer-launch-aws/1272578262370881536/artifacts/launch/pods.json | jq -r '.items[] | select(.metadata.name | contains("quorum-guard")) | .spec.containers[] | select(.name == "guard").readinessProbe | {initialDelaySeconds, periodSeconds, failureThreshold, timeoutSeconds}'{
"initialDelaySeconds": null,
"periodSeconds": 10,
"failureThreshold": 3,
"timeoutSeconds": 1
}
{
"initialDelaySeconds": null,
"periodSeconds": 10,
"failureThreshold": 3,
"timeoutSeconds": 1
}
{
"initialDelaySeconds": null,
"periodSeconds": 10,
"failureThreshold": 3,
"timeoutSeconds": 1
}Not clear on why not yet. |
|
@ironcladlou: This pull request references Bugzilla bug 1847672, which is invalid:
Comment DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
|
/bugzilla refresh |
|
@ironcladlou: This pull request references Bugzilla bug 1847672, which is valid. The bug has been moved to the POST state. The bug has been updated to refer to the pull request using the external bug tracker. 3 validation(s) were run on this bug
DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
|
Since we don't know why this doesn't fix the problem yet... /hold |
|
Created https://bugzilla.redhat.com/show_bug.cgi?id=1847672 to track this |
|
Ahh, issue is that the "latest" release didn't actually have the manifest changes yet: $ curl -s https://storage.googleapis.com/origin-ci-test/logs/release-openshift-origin-installer-launch-aws/1272578262370881536/artifacts/release-latest/release-payload-latest/0000_80_machine-config-operator_07_etcdquorumguard_deployment.yaml | yaml2json | jq '.spec.template.spec.containers[].readinessProbe'
{
"exec": {
"periodSecond": "5",
"command": [
"/bin/sh",
"-c",
"declare -r croot=/mnt/kube\ndeclare -r health_endpoint=\"https://127.0.0.1:2379/health\"\ndeclare -r cert=\"$(find $croot -name 'system:etcd-peer*.crt' -print -quit)\"\ndeclare -r key=\"${cert%.crt}.key\"\ndeclare -r cacert=\"$croot/ca.crt\"\nexport NSS_SDB_USE_CACHE=no\n[[ -z $cert || -z $key ]] && exit 1\ncurl --max-time 2 --silent --cert \"${cert//:/\\:}\" --key \"$key\" --cacert \"$cacert\" \"$health_endpoint\" |grep '{ *\"health\" *: *\"true\" *}'\n"
],
"initialDelaySecond": "5"
}
}
$ curl -s https://storage.googleapis.com/origin-ci-test/logs/release-openshift-origin-installer-launch-aws/1272578262370881536/artifacts/release-latest/release-payload-latest/image-references | jq -r '.spec.tags[] | select(.name == "machine-config-operator").annotations["io.openshift.build.commit.id"]'
908117045fe9ef32662554ed9ed557b3c1e1a965And openshift/machine-config-operator@908117045fe9ef326 is in release-4.5. So probably cluster-bot's PR-target logic is "use the PR branch for resulting images, but use the initial release for all the other images" or something. I'll test again once openshift/machine-config-operator#1819 lands. |
|
Bah, I'm just going to approve this, and we can test in cluster-bot once it's landed ;). /lgtm |
|
/hold cancel |
|
/retest |
Before this change, only the initialDelaySeconds field of probes could be updated. This patch expands the set of supported fields to include all the other int32 fields of probes so that CVO will roll out such changes.
62f6d55 to
8186626
Compare
|
@wking had to undo your tag to fix the newlines, wanna try again? Thanks for all your help |
wking
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: ironcladlou, wking The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
/test images |
|
@ironcladlou: All pull requests linked via external trackers have merged: openshift/cluster-version-operator#383. Bugzilla bug 1847672 has been moved to the MODIFIED state. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
|
/cherrypick release-4.5 |
|
@wking: new pull request created: #389 DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
|
I checked 4.5.0-rc.1 -> 4.6-CI and this worked. |
|
/cherrypick release-4.4 |
|
@ironcladlou: new pull request created: #391 DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
Before this change, only the initialDelaySeconds field of probes could be
updated. This patch expands the set of supported fields to include all the
other int32 fields of probes so that CVO will roll out such changes.