cvo: report only unavailable operators with cluster_operator_up #334

mfojtik · 2020-03-02T12:56:51Z

The ClusterOperatorDown[1] alert is described as:

Cluster Operator XYZ has not been available for 10 mins

However, the `cluster_operator_up` metric this alert fires on is based on "available" AND NOT "degraded", which means we also report "ClusterOperatorDown" for operators that are degraded (but still available).
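The difference between the current and proposed definitions can be sketched in Go, the CVO's language. This is a minimal illustration, not the CVO's actual code: `operatorStatus`, `upBefore` and `upAfter` are hypothetical names, and the two booleans stand in for the Available and Degraded conditions on a ClusterOperator.

```go
package main

import "fmt"

// operatorStatus is a hypothetical, simplified view of a ClusterOperator:
// the real API exposes these as entries in a list of status conditions.
type operatorStatus struct {
	Name      string
	Available bool
	Degraded  bool
}

// upBefore models the current behavior described above: the metric is 1
// only when the operator is Available AND NOT Degraded.
func upBefore(s operatorStatus) float64 {
	if s.Available && !s.Degraded {
		return 1
	}
	return 0
}

// upAfter models the behavior this PR proposes: the metric tracks
// Available only, so a degraded-but-available operator still reports 1.
func upAfter(s operatorStatus) float64 {
	if s.Available {
		return 1
	}
	return 0
}

func main() {
	ops := []operatorStatus{
		{Name: "healthy", Available: true, Degraded: false},
		{Name: "degraded", Available: true, Degraded: true},
		{Name: "down", Available: false, Degraded: true},
	}
	for _, op := range ops {
		fmt.Printf("%-8s before=%v after=%v\n", op.Name, upBefore(op), upAfter(op))
	}
}
```

Only the "degraded" row changes between the two definitions, which is exactly the case where an alert named "ClusterOperatorDown" fires for an operator that is still available.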

An alternative to this PR would be to fix the alert's description, but is an operator being degraded for longer than 10 minutes really something we need to fire a critical alert on?

[1] https://github.com/openshift/cluster-version-operator/blob/master/install/0000_90_cluster-version-operator_02_servicemonitor.yaml#L47

sttts · 2020-03-05T09:23:43Z

/lgtm

openshift-ci-robot · 2020-03-05T09:23:49Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: mfojtik, sttts
To complete the pull request process, please assign vrutkovs
You can assign the PR to them by writing /assign @vrutkovs in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

openshift-ci-robot · 2020-03-05T09:23:50Z

@sttts: changing LGTM is restricted to collaborators

In response to this:

/lgtm

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

mfojtik · 2020-03-05T09:59:18Z

@eparis @smarterclayton Why is LGTM restricted to collaborators only?

smarterclayton · 2020-03-18T13:49:21Z

/hold

This is a fundamental definition. Needs more discussion.

smarterclayton · 2020-03-18T13:51:05Z

This ties into the product SLO. Operators being degraded means things are bad. Only go degraded when things are bad.

What inputs triggered this?

openshift-ci-robot · 2020-07-09T19:04:32Z

@mfojtik: The following tests failed, say /retest to rerun all failed tests:

Test name	Commit	Details	Rerun command
ci/prow/e2e-gcp-upgrade	`bef55c7`	link	`/test e2e-gcp-upgrade`
ci/prow/e2e-upgrade	`bef55c7`	link	`/test e2e-upgrade`

Full PR test history. Your PR dashboard.

openshift-merge-robot · 2020-12-09T19:36:10Z

@mfojtik: The following tests failed, say /retest to rerun all failed tests:

Test name	Commit	Details	Rerun command
ci/prow/e2e-agnostic-upgrade	`bef55c7`	link	`/test e2e-agnostic-upgrade`
ci/prow/e2e-agnostic	`bef55c7`	link	`/test e2e-agnostic`
ci/prow/e2e-agnostic-operator	`bef55c7`	link	`/test e2e-agnostic-operator`

Full PR test history. Your PR dashboard.

openshift-bot · 2021-03-10T08:06:44Z

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

openshift-bot · 2021-04-09T10:01:01Z

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

openshift-bot · 2021-05-09T13:02:00Z

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

openshift-ci · 2021-05-09T13:02:06Z

@mfojtik: PR needs rebase.

openshift-ci · 2021-05-09T13:02:14Z

@openshift-bot: Closed this PR.

wking · 2021-05-09T20:56:19Z

Ended up happening, more or less, via #550.

openshift-ci-robot requested review from abhinavdahiya and wking March 2, 2020 12:56

openshift-ci-robot added the size/XS label Mar 2, 2020 (size/XS: denotes a PR that changes 0-9 lines, ignoring generated files)

cvo: report only unavailable operators with cluster_operator_up

bef55c7

mfojtik force-pushed the fix-cluster-up-metric branch from b8459dd to bef55c7 March 2, 2020 13:26

openshift-ci-robot added the do-not-merge/hold label Mar 18, 2020 (do-not-merge/hold: indicates that a PR should not merge because someone has issued a /hold command)

openshift-ci-robot added the lifecycle/stale label Mar 10, 2021 (lifecycle/stale: denotes an issue or PR that has remained open with no activity and has become stale)

openshift-ci-robot added the lifecycle/rotten label and removed the lifecycle/stale label Apr 9, 2021 (lifecycle/rotten: denotes an issue or PR that has aged beyond stale and will be auto-closed)

openshift-ci bot added the needs-rebase label May 9, 2021 (needs-rebase: indicates a PR cannot be merged because it has merge conflicts with HEAD)

openshift-ci bot closed this May 9, 2021