Bug 1808568: must-gather: when openshift-apiserver restarts, ignore wrong prefix #24641

sallyom · 2020-03-05T14:48:44Z

example failure logs where this flake occurs:

ift:discovery\" to Group \"system:authenticated\""}}
{"kind":"Event","apiVersion":"audit.k8s.io/v1","level":"Metadata","auditID":"2d7c58d5-f230-4097-bd5e-620e9ca3ec24","stage":"ResponseComplete","requestURI":"/openapi/v2","verb":"get","user":{"username":"system:aggregator","groups":["system:authenticated"]},"sourceIPs":["10.128.0.1"],"responseStatus":{"metadata":{},"code":304},"requestReceivedTimestamp":"2020-03-04T19:04:57.786208Z","stageTimestamp":"2020-03-04T19:04:57.786270Z","annotations":{"authorization.k8s.io/decision":"allow","authorization.k8s.io/reason":"RBAC: allowed by ClusterRoleBinding \"system:openshift:discovery\" of ClusterRole \"system:openshift:discovery\" to Group \"system:authenticated\""}}
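
For illustration, here is a minimal Python sketch of the kind of strict check that a cut-off line like the first one above would fail. It assumes the test requires every audit log line to begin with `{"kind":"Event",`. The prefix constant and the helper name `find_bad_prefix_lines` are hypothetical and are not taken from the origin test code.

```python
# Assumed prefix of a complete audit event line (hypothetical constant).
EXPECTED_PREFIX = '{"kind":"Event",'


def find_bad_prefix_lines(lines):
    """Return (line_number, text) for every non-empty line lacking the expected prefix."""
    bad = []
    for number, text in enumerate(lines, start=1):
        if text.strip() and not text.startswith(EXPECTED_PREFIX):
            bad.append((number, text))
    return bad


# A truncated tail of one event followed by a complete (shortened) event.
sample = [
    'ift:discovery\\" to Group \\"system:authenticated\\""}}',
    '{"kind":"Event","apiVersion":"audit.k8s.io/v1","level":"Metadata","verb":"get"}',
]

bad_lines = find_bad_prefix_lines(sample)
print(len(bad_lines), "line(s) with a wrong prefix")
```

Under these assumptions only the truncated first line is reported, which is the shape of failure described in this PR's title.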


openshift-ci-robot · 2020-03-05T14:48:46Z

@sallyom: This pull request references Bugzilla bug 1808568, which is invalid:

expected the bug to be open, but it isn't
expected the bug to be in one of the following states: NEW, ASSIGNED, ON_DEV, POST, POST, but it is CLOSED (NOTABUG) instead

Comment /bugzilla refresh to re-evaluate validity if changes to the Bugzilla bug are made, or edit the title of this pull request to link to a different bug.

Details

In response to this:

Bug 1808568: must-gather: when openshift-apiserver restarts, ignore wrong prefix

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

openshift-ci-robot · 2020-03-05T14:55:13Z

@sallyom: No Bugzilla bug is referenced in the title of this pull request.
To reference a bug, add 'Bug XXX:' to the title of this pull request and request another bug refresh with /bugzilla refresh.

Details

In response to this:

must-gather: when openshift-apiserver restarts, ignore wrong prefix


openshift-ci-robot · 2020-03-05T16:10:14Z

@sallyom: This pull request references Bugzilla bug 1808568, which is valid. The bug has been moved to the POST state. The bug has been updated to refer to the pull request using the external bug tracker.

3 validation(s) were run on this bug

bug is open, matching expected state (open)
bug target release (4.5.0) matches configured target release for branch (4.5.0)
bug is in the state NEW, which is one of the valid states (NEW, ASSIGNED, ON_DEV, POST, POST)

Details

In response to this:

Bug 1808568: must-gather: when openshift-apiserver restarts, ignore wrong prefix


sallyom · 2020-03-05T16:57:39Z

/retest

sallyom · 2020-03-05T18:53:14Z

/test e2e-aws-serial

deads2k · 2020-03-05T20:54:15Z

/hold

This is a symptom of the openshift-apiserver not correctly flushing its audit log to disk or must-gather not correctly retrieving content. Either way that this happened, do we know why it suddenly became a problem?

soltysh · 2020-03-09T11:43:15Z

> This is a symptom of the openshift-apiserver not correctly flushing its audit log to disk or must-gather not correctly retrieving content. Either way that this happened, do we know why it suddenly became a problem?

From my investigation, this was happening at the time when openshift-apiserver was restarting, but back then it wasn't such a big issue. I'm not sure why this e2e flake has bubbled up this high recently. I agree that we should investigate the root cause, but at the same time, I'd prefer to make this test less strict. We can open a separate BZ about openshift-apiserver that will need further investigation and make the test more resilient in parallel. @deads2k objections?
@sallyom can you make a separate BZ for that audit issue?
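
One way to read "less strict" is sketched below in Python: tolerate a bounded number of wrong-prefix lines per file instead of failing on the first one. The threshold `max_bad_lines`, the prefix, and the function name are assumptions made for illustration. The actual change proposed in this PR is not reproduced here.

```python
# Assumed prefix of a complete audit event line (hypothetical constant).
EXPECTED_PREFIX = '{"kind":"Event",'


def audit_log_is_acceptable(lines, max_bad_lines=1):
    """Accept a log when at most max_bad_lines non-empty lines lack the prefix
    and at least one well-formed event line is present."""
    good = 0
    bad = 0
    for text in lines:
        if not text.strip():
            continue  # ignore blank lines
        if text.startswith(EXPECTED_PREFIX):
            good += 1
        else:
            bad += 1
    return good > 0 and bad <= max_bad_lines


# One cut-off line (as after a restart) followed by one complete event.
restart_log = ['tail of a cut-off event"}}', '{"kind":"Event","verb":"get"}']
print(audit_log_is_acceptable(restart_log))                   # True
print(audit_log_is_acceptable(restart_log, max_bad_lines=0))  # False
```

A tolerance like this keeps the test green across a restart, but it hides truncated lines rather than explaining them, so it does not replace tracking down why the lines are cut off.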

soltysh

/lgtm
/approve
I'll let @deads2k remove the hold if he agrees with my previous statement

openshift-ci-robot · 2020-03-09T11:44:24Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: sallyom, soltysh

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details

Needs approval from an approver in each of these files:

~~test/extended/OWNERS~~ [soltysh]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

sallyom · 2020-03-09T16:32:35Z

I've opened https://bugzilla.redhat.com/show_bug.cgi?id=1811737 to track root cause

deads2k · 2020-03-10T12:11:08Z

> From my investigation, this was happening at the time when openshift-apiserver was restarting, but back then it wasn't such a big issue. I'm not sure why this e2e flake has bubbled up this high recently. I agree that we should investigate the root cause, but at the same time, I'd prefer to make this test less strict. We can open a separate BZ about openshift-apiserver that will need further investigation and make the test more resilient in parallel. @deads2k objections?

Yeah, this exposed a critical data corruption problem with the audit log that we fixed with openshift/cluster-openshift-apiserver-operator#331. Have you seen it since then?

deads2k · 2020-03-10T12:11:44Z

/lgtm cancel

sallyom · 2020-03-10T15:02:34Z

This is no longer required, AFAICT, since openshift/cluster-openshift-apiserver-operator#331 resolved the root cause.

Bug 1808568: must-gather: when openshift-apiserver restarts, ignore wrong prefix

73bc200

openshift-ci-robot assigned soltysh Mar 5, 2020

openshift-ci-robot requested a review from smarterclayton March 5, 2020 14:48

openshift-ci-robot added bugzilla/invalid-bug Indicates that a referenced Bugzilla bug is invalid for the branch this PR is targeting. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Mar 5, 2020

sallyom changed the title ~~Bug 1808568: must-gather: when openshift-apiserver restarts, ignore wrong prefix~~ must-gather: when openshift-apiserver restarts, ignore wrong prefix Mar 5, 2020

openshift-ci-robot removed the bugzilla/invalid-bug Indicates that a referenced Bugzilla bug is invalid for the branch this PR is targeting. label Mar 5, 2020

sallyom changed the title ~~must-gather: when openshift-apiserver restarts, ignore wrong prefix~~ Bug 1808568: must-gather: when openshift-apiserver restarts, ignore wrong prefix Mar 5, 2020

openshift-ci-robot added the bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. label Mar 5, 2020

openshift-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 5, 2020

soltysh approved these changes Mar 9, 2020


openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Mar 9, 2020

openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 9, 2020

openshift-ci-robot assigned deads2k Mar 10, 2020

openshift-ci-robot removed the lgtm Indicates that a PR is ready to be merged. label Mar 10, 2020

sallyom closed this Mar 10, 2020
