@mfojtik

@mfojtik mfojtik commented Mar 2, 2020

The ClusterOperatorDown[1] alert is described as:

Cluster Operator XYZ has not been available for 10 mins

However, the metric this alert fires on is computed as "available" AND NOT "degraded", which means we also report "ClusterOperatorDown" for operators that are degraded but still available.
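To make the mismatch concrete, here is a minimal sketch of the two possible definitions of the "up" value. It is not the cluster-version-operator code: `OperatorConditions`, `up_current`, `up_available_only`, and `alert_fires` are hypothetical names used only for illustration, and the 10-minute hold period of the alert is deliberately left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OperatorConditions:
    """Hypothetical snapshot of a cluster operator's status conditions."""
    available: bool
    degraded: bool


def up_current(c: OperatorConditions) -> int:
    """Behaviour described in this comment: up = available AND NOT degraded."""
    return int(c.available and not c.degraded)


def up_available_only(c: OperatorConditions) -> int:
    """Assumed alternative: up tracks the Available condition only."""
    return int(c.available)


def alert_fires(up_value: int) -> bool:
    """The alert is modelled as firing whenever the up value is 0.

    The real alert also requires the condition to hold for 10 minutes;
    that duration is not modelled in this sketch.
    """
    return up_value == 0


# An operator that is still available but reports Degraded.
degraded_but_available = OperatorConditions(available=True, degraded=True)

print(alert_fires(up_current(degraded_but_available)))         # True
print(alert_fires(up_available_only(degraded_but_available)))  # False
```

With the first definition, a degraded-but-available operator is reported as "down", which contradicts the alert text; with the second, only a genuinely unavailable operator triggers it.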

An alternative to this PR would be to fix the description of this alert, but I don't see an operator being degraded for longer than 10 minutes as something we need to fire a critical alert on.

[1] https://github.com/openshift/cluster-version-operator/blob/master/install/0000_90_cluster-version-operator_02_servicemonitor.yaml#L47

@openshift-ci-robot openshift-ci-robot added the size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. label Mar 2, 2020
@mfojtik mfojtik force-pushed the fix-cluster-up-metric branch from b8459dd to bef55c7 Compare March 2, 2020 13:26
@sttts
Contributor

sttts commented Mar 5, 2020

/lgtm

@openshift-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: mfojtik, sttts
To complete the pull request process, please assign vrutkovs
You can assign the PR to them by writing /assign @vrutkovs in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot
Contributor

@sttts: changing LGTM is restricted to collaborators

In response to this:

/lgtm

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@mfojtik
Author

mfojtik commented Mar 5, 2020

@eparis @smarterclayton why is LGTM restricted to collaborators only?

@smarterclayton
Contributor

/hold

This is a fundamental definition. Needs more discussion.

@openshift-ci-robot openshift-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 18, 2020
@smarterclayton
Contributor

This ties into the product SLO. Operators being degraded means things are bad. Only go degraded when things are bad.

What inputs triggered this?

@openshift-ci-robot
Contributor

openshift-ci-robot commented Jul 9, 2020

@mfojtik: The following tests failed, say /retest to rerun all failed tests:

| Test name | Commit | Details | Rerun command |
| --- | --- | --- | --- |
| ci/prow/e2e-gcp-upgrade | bef55c7 | link | /test e2e-gcp-upgrade |
| ci/prow/e2e-upgrade | bef55c7 | link | /test e2e-upgrade |

Full PR test history. Your PR dashboard.

@openshift-merge-robot
Contributor

@mfojtik: The following tests failed, say /retest to rerun all failed tests:

| Test name | Commit | Details | Rerun command |
| --- | --- | --- | --- |
| ci/prow/e2e-agnostic-upgrade | bef55c7 | link | /test e2e-agnostic-upgrade |
| ci/prow/e2e-agnostic | bef55c7 | link | /test e2e-agnostic |
| ci/prow/e2e-agnostic-operator | bef55c7 | link | /test e2e-agnostic-operator |

Full PR test history. Your PR dashboard.

@openshift-bot
Contributor

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci-robot openshift-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Mar 10, 2021
@openshift-bot
Contributor

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

@openshift-ci-robot openshift-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Apr 9, 2021
@openshift-bot
Contributor

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

@openshift-ci
Contributor

openshift-ci bot commented May 9, 2021

@mfojtik: PR needs rebase.

@openshift-ci openshift-ci bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 9, 2021
@openshift-ci
Contributor

openshift-ci bot commented May 9, 2021

@openshift-bot: Closed this PR.

@openshift-ci openshift-ci bot closed this May 9, 2021
@wking
Member

wking commented May 9, 2021

Ended up happening, more or less, via #550.
