@juzhao
Contributor

@juzhao juzhao commented Sep 25, 2025

Per OCPBUGS-62227, the test case "Alerts shouldn't exceed the series limit of total series sent via telemetry from each cluster" failed on the e2e-aws-ovn-techpreview job.
This PR bumps the limit to 1000 to tolerate more series being added to telemetry in the future.

        averageSeriesLimit = 850
    default:
-       averageSeriesLimit = 780
+       averageSeriesLimit = 1000
Contributor

@machine424 machine424 Sep 25, 2025

Let's revert 52c9e9a and go back to a single limit of 1000 on the avg_over_time value for all clusters.


cc @stbenjam, was the change in 52c9e9a made to allow more fine-grained (potentially lower) limits for managed clusters in the future?

We’re suggesting setting the limit to 1000 across all clusters. This would act as a good safeguard in case the average bursts a lot all at once (currently +200 series).
Incrementally raising the limit isn’t sustainable for us (we did exactly that 3 months ago, see #29975 (comment)). While we can point out which metrics started emitting more series, we can’t really judge whether the increase is acceptable, so we always end up raising the limit.

We can also debate the usefulness of the test itself, but that’s a separate discussion.
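
For readers following the thread, here is a minimal Go sketch of the kind of check being discussed: average a set of sampled series counts and compare the result against one limit that applies to every cluster. It is not the code from the origin repository. The helper names `averageSeries` and `exceedsLimit`, the sample values, and the strict `>` comparison are all assumptions made for illustration; only the name `averageSeriesLimit` and the value 1000 come from this PR.

```go
package main

import "fmt"

// averageSeriesLimit is the single limit proposed in this thread.
const averageSeriesLimit = 1000

// averageSeries returns the arithmetic mean of the sampled series counts.
// Hypothetical helper: a plain-Go stand-in for what avg_over_time computes
// over a window of samples.
func averageSeries(samples []float64) float64 {
	if len(samples) == 0 {
		return 0
	}
	var sum float64
	for _, s := range samples {
		sum += s
	}
	return sum / float64(len(samples))
}

// exceedsLimit reports whether the average of the samples is strictly above
// the limit. Whether the real test uses > or >= is an assumption here.
func exceedsLimit(samples []float64, limit float64) bool {
	return averageSeries(samples) > limit
}

func main() {
	// Made-up sample values, not measurements from any cluster.
	samples := []float64{780, 850, 980}
	fmt.Printf("average=%.1f limit=%d exceeded=%t\n",
		averageSeries(samples), averageSeriesLimit,
		exceedsLimit(samples, averageSeriesLimit))
}
```

With one shared limit there is no per-cluster-type switch to maintain, which is the simplification the revert of 52c9e9a aims for.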

Contributor Author

reverted 52c9e9a

@juzhao
Contributor Author

juzhao commented Sep 26, 2025

/retest-required

@openshift-trt

openshift-trt bot commented Sep 26, 2025

Job Failure Risk Analysis for sha: f993b85

| Job Name | Failure Risk |
| --- | --- |
| pull-ci-openshift-origin-main-e2e-aws-ovn-microshift-serial | IncompleteTests |
Tests for this run (22) are below the historical average (595): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)

Risk analysis has seen new tests most likely introduced by this PR.
Please ensure that new tests meet guidelines for naming and stability.

New Test Risks for sha: f993b85

| Job Name | New Test Risk |
| --- | --- |
| pull-ci-openshift-origin-main-e2e-aws-ovn-microshift | High - "install should succeed: MicroShift rebase" is a new test that was not present in all runs against the current commit. |

New tests seen in this PR at sha: f993b85

  • "install should succeed: MicroShift rebase" [Total: 2, Pass: 2, Fail: 0, Flake: 0]

@juzhao
Contributor Author

juzhao commented Sep 26, 2025

e2e-aws-ovn-microshift-serial failed

install should succeed: infrastructure (0s)
{  Failed to create MicroShift VM}

Since this failure is not related to the telemetry series limit, I think we can skip the job.

@machine424
Contributor

thanks!
/lgtm

@openshift-ci openshift-ci bot added the lgtm label (indicates that a PR is ready to be merged) Sep 26, 2025
@openshift-ci
Contributor

openshift-ci bot commented Sep 26, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: juzhao, machine424

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved label (indicates a PR has been approved by an approver from all required OWNERS files) Sep 26, 2025
@juzhao
Contributor Author

juzhao commented Sep 26, 2025

/verified by @juzhao

@openshift-ci-robot openshift-ci-robot added the verified label (signifies that the PR passed pre-merge verification criteria) Sep 26, 2025
@openshift-ci-robot

@juzhao: This PR has been marked as verified by @juzhao.

In response to this:

/verified by @juzhao

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@juzhao
Contributor Author

juzhao commented Sep 26, 2025

/retitle OCPBUGS-62227: bump telemetry series limit to 1000

@openshift-ci openshift-ci bot changed the title from "bump telemetry series limit to 1000" to "OCPBUGS-62227: bump telemetry series limit to 1000" Sep 26, 2025
@openshift-ci-robot openshift-ci-robot added the following labels Sep 26, 2025:
  • jira/severity-moderate: Referenced Jira bug's severity is moderate for the branch this PR is targeting.
  • jira/valid-reference: Indicates that this PR references a valid Jira ticket of any type.
  • jira/invalid-bug: Indicates that a referenced Jira bug is invalid for the branch this PR is targeting.
@openshift-ci-robot

@juzhao: This pull request references Jira Issue OCPBUGS-62227, which is invalid:

  • expected the bug to target the "4.21.0" version, but no target version was set

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.

@juzhao
Contributor Author

juzhao commented Sep 26, 2025

/jira refresh

@openshift-ci-robot openshift-ci-robot added the jira/valid-bug label (indicates that a referenced Jira bug is valid for the branch this PR is targeting) Sep 26, 2025
@openshift-ci-robot

@juzhao: This pull request references Jira Issue OCPBUGS-62227, which is valid. The bug has been moved to the POST state.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.21.0) matches configured target version for branch (4.21.0)
  • bug is in the state New, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact:
/cc @juzhao

@openshift-ci-robot openshift-ci-robot removed the jira/invalid-bug label (indicates that a referenced Jira bug is invalid for the branch this PR is targeting) Sep 26, 2025
@openshift-ci
Contributor

openshift-ci bot commented Sep 26, 2025

@openshift-ci-robot: GitHub didn't allow me to request PR reviews from the following users: juzhao.

Note that only openshift members and repo collaborators can review this PR, and authors cannot review their own PRs.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@simonpasquier
Contributor

/skip
/retest-required

@juzhao
Contributor Author

juzhao commented Sep 26, 2025

/override ci/prow/e2e-aws-ovn-microshift-serial

@openshift-ci
Contributor

openshift-ci bot commented Sep 26, 2025

@juzhao: juzhao unauthorized: /override is restricted to Repo administrators, approvers in top level OWNERS file, and the following github teams:openshift: openshift-release-oversight openshift-staff-engineers openshift-sustaining-engineers.

@openshift-ci-robot

/retest-required

Remaining retests: 0 against base HEAD da8a7f3 and 2 for PR HEAD 7c50620 in total

@simonpasquier
Contributor

/test e2e-aws-ovn-microshift-serial

@machine424
Contributor

/skip

@machine424
Contributor

/retest-required

@simonpasquier
Contributor

/retest-required

@openshift-ci-robot

/retest-required

Remaining retests: 0 against base HEAD 506d6c4 and 1 for PR HEAD 7c50620 in total

@openshift-ci
Contributor

openshift-ci bot commented Sep 27, 2025

@juzhao: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/e2e-aws-ovn-single-node | 7c50620 | link | false | /test e2e-aws-ovn-single-node |
| ci/prow/e2e-aws-ovn-single-node-upgrade | 7c50620 | link | false | /test e2e-aws-ovn-single-node-upgrade |
| ci/prow/e2e-aws-ovn-edge-zones | 7c50620 | link | false | /test e2e-aws-ovn-edge-zones |
| ci/prow/okd-scos-e2e-aws-ovn | 7c50620 | link | false | /test okd-scos-e2e-aws-ovn |
| ci/prow/e2e-openstack-ovn | 7c50620 | link | false | /test e2e-openstack-ovn |

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@openshift-trt

openshift-trt bot commented Sep 27, 2025

Job Failure Risk Analysis for sha: 7c50620

| Job Name | Failure Risk |
| --- | --- |
| pull-ci-openshift-origin-main-e2e-aws-ovn-edge-zones | Low |
Job run should complete before timeout
This test has passed 66.67% of 6 runs on release 4.21 [Architecture:amd64 FeatureSet:default Installer:ipi JobTier:standard Network:ovn NetworkStack:ipv4 Owner:eng Platform:aws Procedure:none SecurityMode:default Topology:ha Upgrade:none] in the last week.

@openshift-merge-bot openshift-merge-bot bot merged commit 44be851 into openshift:main Sep 27, 2025
29 of 30 checks passed
@openshift-ci-robot

@juzhao: Jira Issue Verification Checks: Jira Issue OCPBUGS-62227
✔️ This pull request was pre-merge verified.
✔️ All associated pull requests have merged.
✔️ All associated, merged pull requests were pre-merge verified.

Jira Issue OCPBUGS-62227 has been moved to the MODIFIED state and will move to the VERIFIED state when the change is available in an accepted nightly payload. 🕓

@openshift-merge-robot
Contributor

Fix included in accepted release 4.21.0-0.nightly-2025-09-27-154726

@juzhao juzhao deleted the bump_limit branch November 11, 2025 06:54

Labels

  • approved: Indicates a PR has been approved by an approver from all required OWNERS files.
  • jira/severity-moderate: Referenced Jira bug's severity is moderate for the branch this PR is targeting.
  • jira/valid-bug: Indicates that a referenced Jira bug is valid for the branch this PR is targeting.
  • jira/valid-reference: Indicates that this PR references a valid Jira ticket of any type.
  • lgtm: Indicates that a PR is ready to be merged.
  • verified: Signifies that the PR passed pre-merge verification criteria

5 participants