@petr-muller
Member

  • skip alerts without required labels
  • add context on why we show the insight (started firing during update or is known to affect updates)
  • skip alerts with info level
  • explicitly mention the alert does not have a runbook
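
For illustration, the filtering rules listed above could be sketched as follows. This is a minimal sketch and not the code from this PR: the `Alert` type, the `skipAlert` and `runbookNote` helpers, the set of required labels, and the `runbook_url` annotation key are all assumptions made for the example.

```go
package main

import "fmt"

// Alert is a hypothetical, simplified alert shape (not the type used in the PR).
type Alert struct {
	Labels      map[string]string
	Annotations map[string]string
}

// requiredLabels is an assumption; the PR does not say which labels are required.
var requiredLabels = []string{"alertname", "severity"}

// skipAlert reports whether an alert should not produce an insight.
func skipAlert(a Alert) bool {
	for _, name := range requiredLabels {
		if a.Labels[name] == "" {
			return true // a required label is missing
		}
	}
	return a.Labels["severity"] == "info" // info-level alerts are skipped
}

// runbookNote says explicitly when an alert has no runbook.
// The "runbook_url" annotation key is an assumption based on common convention.
func runbookNote(a Alert) string {
	if url := a.Annotations["runbook_url"]; url != "" {
		return "Runbook: " + url
	}
	return "The alert has no runbook."
}

func main() {
	// Hypothetical alerts: one reportable, one info-level, one without a name.
	alerts := []Alert{
		{Labels: map[string]string{"alertname": "ExampleWarning", "severity": "warning"}},
		{Labels: map[string]string{"alertname": "ExampleInfo", "severity": "info"}},
		{Labels: map[string]string{"severity": "critical"}},
	}
	for _, a := range alerts {
		if skipAlert(a) {
			continue
		}
		fmt.Println(a.Labels["alertname"]+":", runbookNote(a))
	}
}
```

Keeping the skip decision in one predicate makes it easy to cover each rule with a fixture.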

@openshift-ci-robot added labels May 27, 2024: jira/severity-important (referenced Jira bug's severity is important for the branch this PR is targeting), jira/valid-reference (this PR references a valid Jira ticket of any type), jira/valid-bug (a referenced Jira bug is valid for the branch this PR is targeting)
@openshift-ci-robot

@petr-muller: This pull request references Jira Issue OCPBUGS-33896, which is valid.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.17.0) matches configured target version for branch (4.17.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact:
/cc @evakhoni

The bug has been updated to refer to the pull request using the external bug tracker.

@openshift-ci bot requested a review from evakhoni May 27, 2024 18:19
@petr-muller force-pushed the ocpbugs-33896-alerts-without-runbooks branch from 4a964fd to 937c941 May 27, 2024 18:20
@openshift-ci bot requested review from deads2k and mfojtik May 27, 2024 18:21
@openshift-ci bot added the approved label (indicates that a PR has been approved by an approver from all required OWNERS files) May 27, 2024
@petr-muller force-pushed the ocpbugs-33896-alerts-without-runbooks branch from 937c941 to 5e4f462 May 27, 2024 18:22
@petr-muller
Member Author

/retest

@petr-muller
Member Author

/uncc @deads2k @mfojtik
/cc @wking @Davoska

@openshift-ci bot requested review from DavidHurta and wking and removed the request for deads2k and mfojtik May 28, 2024 18:32
@petr-muller
Member Author

/test e2e-aws-ovn-serial

var description string
switch {
case startedAt.After(alert.ActiveAt) && !allowedAlerts.Contains(alertName):
	continue

Member

nit: possibly shift this to a catch-all `default: continue` at the end, so we don't have to wonder whether we've missed a case and left an empty description without a `continue`?

Member Author

I refactored this a bit and introduced named conditions, which hopefully help with readability.

if alert.Annotations.Description == "" {
	description += " The alert has no description."
} else {
	description += " The alert description is: " + alert.Annotations.Description

Member

In the deep past (e.g. openshift/cluster-version-operator#547), alerts used a single `message` annotation instead of the split `summary`/`description`. No worries if you don't want to address that in this pull. Checking the alert fixture data, maybe no relevant alerts still do things the old way, in which case there is no need to handle it at all.

Member Author

Good catch! The 4.16.0-rc.2 payload shows we still have a few manifests that use `message`, so I added handling for it in the code. Some of our alerts use all three fields (`message`, `description`, `summary`), so I try to include them all when present. I also added fixtures to exercise this case.

$ rg 'message: ' *prom*yaml
0000_90_cluster-authentication-operator_03_prometheusrule.yaml
17:            message: >-

0000_90_ingress-operator_03_prometheusrules.yaml
25:          message: "HAProxy reloads are failing on {{ $labels.pod }}. Router is not respecting recently created or modified routes"
34:          message: "HAProxy metrics are reporting that HAProxy is down on pod {{ $labels.namespace }} / {{ $labels.pod }}"
43:          message: |
54:          message: |
78:            message: "Ingress {{ $labels.namespace }}/{{ $labels.name }} is missing the IngressClassName for 1 day."
87:            message: "Route {{ $labels.namespace }}/{{ $labels.name }} is owned by an unmanaged Ingress."

0000_50_cluster-storage-operator_12_prometheusrules.yaml
28:          message: "StorageClass count check is failing (there should not be more than one default StorageClass)"
55:            Events of the Pods should contain exact error message: "oc describe pod -n <pod namespace> <pod name>".
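
As a rough sketch of what "include them all, when present" could look like: the `Annotations` struct, the `alertText` helper, the field order, and the ` | ` separator are assumptions for this example, and only the two sentence fragments are taken from the snippet under review.

```go
package main

import (
	"fmt"
	"strings"
)

// Annotations is a hypothetical struct holding the three text annotations
// discussed in this thread.
type Annotations struct {
	Summary     string
	Description string
	Message     string
}

// alertText joins every non-empty annotation into one sentence.
// The order and the " | " separator are assumptions for this sketch.
func alertText(a Annotations) string {
	var parts []string
	for _, p := range []string{a.Summary, a.Description, a.Message} {
		if p = strings.TrimSpace(p); p != "" {
			parts = append(parts, p)
		}
	}
	if len(parts) == 0 {
		return "The alert has no description."
	}
	return "The alert description is: " + strings.Join(parts, " | ")
}

func main() {
	fmt.Println(alertText(Annotations{}))
	fmt.Println(alertText(Annotations{Message: "HAProxy is down"}))
	fmt.Println(alertText(Annotations{Summary: "s", Description: "d", Message: "m"}))
}
```

Trimming whitespace first means that folded YAML scalars (`>-`, `|`) with stray newlines do not count as content when they are otherwise empty.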

@petr-muller force-pushed the ocpbugs-33896-alerts-without-runbooks branch 2 times, most recently from 7dc7a60 to ba9dd40 May 31, 2024 14:45
- skip alerts without required labels
- add context on why we show the insight (started firing during update or is known to affect updates)
- skip alerts with info level
- explicitly mention the alert does not have a runbook
- fix `shortDuration` for more cases, including `now`, and add tests
- also handle the `message` annotation on alerts
@petr-muller force-pushed the ocpbugs-33896-alerts-without-runbooks branch from ba9dd40 to 7515161 May 31, 2024 14:59
Member

@wking left a comment

/lgtm

@openshift-ci bot added the lgtm label (indicates that a PR is ready to be merged) May 31, 2024
@openshift-ci
Contributor

openshift-ci bot commented May 31, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: petr-muller, wking

@openshift-ci-robot

/retest-required

Remaining retests: 0 against base HEAD 672cf68 and 2 for PR HEAD 7515161 in total

@openshift-ci
Contributor

openshift-ci bot commented May 31, 2024

@petr-muller: all tests passed!

@openshift-merge-bot bot merged commit 0bea059 into openshift:master May 31, 2024
@openshift-ci-robot

@petr-muller: Jira Issue OCPBUGS-33896: All pull requests linked via external trackers have merged:

Jira Issue OCPBUGS-33896 has been moved to the MODIFIED state.

@openshift-bot
Contributor

[ART PR BUILD NOTIFIER]

This PR has been included in build openshift-enterprise-cli-container-v4.17.0-202406010114.p0.g0bea059.assembly.stream.el9 for distgit openshift-enterprise-cli.
All builds following this will include this PR.
