
@sallyom
Contributor

sallyom commented Mar 5, 2020

/assign @soltysh
/cc @smarterclayton

Example failure logs where this flake occurs (note that line 572 begins mid-record, with the wrong prefix, instead of with an opening brace):

572 ift:discovery\" to Group \"system:authenticated\""}}
573 {"kind":"Event","apiVersion":"audit.k8s.io/v1","level":"Metadata","auditID":"2d7c58d5-f230-4097-bd5e-620e9ca3ec24","stage":"ResponseComplete","requestURI":"/openapi/v2","verb":"get","user":{"username":"system:aggregator","groups":["system:authenticated"]},"sourceIPs":["10.128.0.1"],"responseStatus":{"metadata":{},"code":304},"requestReceivedTimestamp":"2020-03-04T19:04:57.786208Z","stageTimestamp":"2020-03-04T19:04:57.786270Z","annotations":{"authorization.k8s.io/decision":"allow","authorization.k8s.io/reason":"RBAC: allowed by ClusterRoleBinding \"system:openshift:discovery\" of ClusterRole \"system:openshift:discovery\" to Group \"system:authenticated\""}}

@openshift-ci-robot

@sallyom: This pull request references Bugzilla bug 1808568, which is invalid:

  • expected the bug to be open, but it isn't
  • expected the bug to be in one of the following states: NEW, ASSIGNED, ON_DEV, POST, POST, but it is CLOSED (NOTABUG) instead

Comment /bugzilla refresh to re-evaluate validity if changes to the Bugzilla bug are made, or edit the title of this pull request to link to a different bug.

In response to this:

Bug 1808568: must-gather: when openshift-apiserver restarts, ignore wrong prefix

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci-robot openshift-ci-robot added bugzilla/invalid-bug Indicates that a referenced Bugzilla bug is invalid for the branch this PR is targeting. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Mar 5, 2020
@sallyom sallyom changed the title Bug 1808568: must-gather: when openshift-apiserver restarts, ignore wrong prefix must-gather: when openshift-apiserver restarts, ignore wrong prefix Mar 5, 2020
@openshift-ci-robot

@sallyom: No Bugzilla bug is referenced in the title of this pull request.
To reference a bug, add 'Bug XXX:' to the title of this pull request and request another bug refresh with /bugzilla refresh.

In response to this:

must-gather: when openshift-apiserver restarts, ignore wrong prefix

@openshift-ci-robot openshift-ci-robot removed the bugzilla/invalid-bug Indicates that a referenced Bugzilla bug is invalid for the branch this PR is targeting. label Mar 5, 2020
@sallyom sallyom changed the title must-gather: when openshift-apiserver restarts, ignore wrong prefix Bug 1808568: must-gather: when openshift-apiserver restarts, ignore wrong prefix Mar 5, 2020
@openshift-ci-robot openshift-ci-robot added the bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. label Mar 5, 2020
@openshift-ci-robot

@sallyom: This pull request references Bugzilla bug 1808568, which is valid. The bug has been moved to the POST state. The bug has been updated to refer to the pull request using the external bug tracker.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target release (4.5.0) matches configured target release for branch (4.5.0)
  • bug is in the state NEW, which is one of the valid states (NEW, ASSIGNED, ON_DEV, POST, POST)

In response to this:

Bug 1808568: must-gather: when openshift-apiserver restarts, ignore wrong prefix

@sallyom
Contributor Author

sallyom commented Mar 5, 2020

/retest

@sallyom
Contributor Author

sallyom commented Mar 5, 2020

/test e2e-aws-serial

@deads2k
Contributor

deads2k commented Mar 5, 2020

/hold

This is a symptom of the openshift-apiserver not correctly flushing its audit log to disk or must-gather not correctly retrieving content. Either way that this happened, do we know why it suddenly became a problem?

@openshift-ci-robot openshift-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 5, 2020
@soltysh
Contributor

soltysh commented Mar 9, 2020

> This is a symptom of the openshift-apiserver not correctly flushing its audit log to disk or must-gather not correctly retrieving content. Either way that this happened, do we know why it suddenly became a problem?

From my investigation, this was happening at the time when openshift-apiserver was restarting, but back then it wasn't such a big issue. Not sure why this e2e flake has recently bubbled up this high. I agree that we should investigate the root cause, but at the same time I'd prefer to make this test less strict. We can open a separate BZ about openshift-apiserver for further investigation and make the test more resilient in parallel. @deads2k, objections?
@sallyom can you make a separate BZ for that audit issue?

Contributor

@soltysh left a comment

/lgtm
/approve
I'll let @deads2k remove the hold if he agrees with my previous statement

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Mar 9, 2020
@openshift-ci-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: sallyom, soltysh

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 9, 2020
@sallyom
Contributor Author

sallyom commented Mar 9, 2020

I've opened https://bugzilla.redhat.com/show_bug.cgi?id=1811737 to track the root cause.

@deads2k
Contributor

deads2k commented Mar 10, 2020

> From my investigation, this was happening at the time when openshift-apiserver was restarting, but back then it wasn't such a big issue. Not sure why this e2e flake has recently bubbled up this high. I agree that we should investigate the root cause, but at the same time I'd prefer to make this test less strict. We can open a separate BZ about openshift-apiserver for further investigation and make the test more resilient in parallel. @deads2k, objections?

Yeah, this exposed a critical data corruption problem with the audit log that we fixed with openshift/cluster-openshift-apiserver-operator#331. Have you seen it since then?

@deads2k
Contributor

deads2k commented Mar 10, 2020

/lgtm cancel

@openshift-ci-robot openshift-ci-robot removed the lgtm Indicates that a PR is ready to be merged. label Mar 10, 2020
@sallyom
Contributor Author

sallyom commented Mar 10, 2020

This is no longer required, AFAICT, since openshift/cluster-openshift-apiserver-operator#331 resolved the root cause.

@sallyom sallyom closed this Mar 10, 2020

Labels

approved: Indicates a PR has been approved by an approver from all required OWNERS files.
bugzilla/valid-bug: Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting.
do-not-merge/hold: Indicates that a PR should not merge because someone has issued a /hold command.
size/XS: Denotes a PR that changes 0-9 lines, ignoring generated files.

4 participants