allow longer upgrade times to run tests, but continue to fail junit at 75 minutes #25411

Merged

deads2k merged 1 commit into openshift:master from deads2k:upgrade-longer

Aug 13, 2020

Contributor

deads2k commented Aug 12, 2020

We think we're a few minutes slow on aws. This will let us work it out, while still having a clear indication of how slow we are.
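The idea of letting a slow upgrade keep running while still flagging it in junit can be sketched roughly as below. This is an illustrative Python sketch, not the Go code from this PR: the function name `classify_upgrade`, the result strings, and the hard limit value are all assumptions made for the example. Only the 75-minute threshold comes from the PR title.

```python
from datetime import timedelta

# 75 minutes comes from the PR title. The hard limit is not stated in this
# thread, so the value below is purely hypothetical.
SOFT_LIMIT = timedelta(minutes=75)
HYPOTHETICAL_HARD_LIMIT = timedelta(minutes=90)


def classify_upgrade(elapsed, soft=SOFT_LIMIT, hard=HYPOTHETICAL_HARD_LIMIT):
    """Classify an upgrade by how long it took.

    Returns one of three hypothetical labels:
      "pass"      - finished within the soft limit
      "soft-fail" - finished, so later tests can still run, but junit
                    records a failure because it was too slow
      "timeout"   - exceeded the hard limit, so the run is abandoned
    """
    if elapsed > hard:
        return "timeout"
    if elapsed > soft:
        return "soft-fail"
    return "pass"


if __name__ == "__main__":
    for minutes in (60, 80, 100):
        print(minutes, classify_upgrade(timedelta(minutes=minutes)))
```

Keeping the two thresholds separate is what preserves the signal: a run that lands between them still produces test results, and the junit failure shows how far over 75 minutes it went.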


          allow longer upgrade times to run tests, but continue to fail at 75 minutes

4447a19

openshift-ci-robot requested review from bparees and smarterclayton

August 12, 2020 21:44

openshift-ci-robot added the approved label

Contributor

smarterclayton commented Aug 12, 2020

/lgtm

openshift-ci-robot assigned smarterclayton

openshift-ci-robot added the lgtm label

openshift-ci-robot commented Aug 12, 2020

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: deads2k, smarterclayton

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~test/OWNERS~~ [deads2k,smarterclayton]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Contributor

openshift-bot commented Aug 13, 2020

/retest

Please review the full test history for this PR and help us cut down flakes.

4 similar comments

openshift-ci-robot commented Aug 13, 2020

@deads2k: The following tests failed, say /retest to rerun all failed tests:

| Test name | Commit | Details | Rerun command |
| --- | --- | --- | --- |
| ci/prow/e2e-aws-serial | `4447a19` | link | `/test e2e-aws-serial` |
| ci/prow/e2e-aws-fips | `4447a19` | link | `/test e2e-aws-fips` |

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

Contributor Author

deads2k commented Aug 13, 2020

only touches upgrade tests, they pass. merging to improve aws signal

deads2k merged commit 74a4179 into openshift:master

wking added a commit to wking/origin that referenced this pull request


          test/e2e/upgrade: Relax 'too long' soft timeout for rollback jobs

a23e3ea

durationToSoftFailure was added in 4447a19 (allow longer upgrade
times to run tests, but continue to fail at 75 minutes, 2020-08-12, openshift#25411),
but didn't get the 2x on rollbacks we've been adding to maximumDuration
since a53efd5 (Support --options on upgrade tests to abort in
progress, 2019-04-29, openshift#22726).  That's recently been causing the
cluster-version operator's A->B->A rollback CI jobs to time out [1].
This commit catches durationToSoftFailure up with the "2x on
rollbacks" approach, and also mentions "aborted" in messages for those
types of tests, to help remind folks what's going on.

An alternative approach would be to teach clusterUpgrade to treat
rollbacks as two separate hops (one for A->B, and another for B->A).
But that would be a more involved restructuring, and since we already
had the 2x maximumDuration precedent in place, I haven't gone in that
direction.

[1]: openshift/cluster-version-operator#514 (comment)
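
The "2x on rollbacks" approach described in this commit message can be illustrated with a small sketch. It is written in Python for brevity and is not the Go code in test/e2e/upgrade. The function names `upgrade_limits` and `too_long_message`, the message wording, and the 90-minute hard limit used in the usage lines are hypothetical. The 75-minute soft limit and the doubling for rollback (A->B->A) jobs are taken from the thread.

```python
from datetime import timedelta


def upgrade_limits(base_soft, base_hard, rollback=False):
    """Return (soft, hard) limits, doubling both for rollback jobs.

    A rollback job performs two hops (A->B, then B->A), so both the soft
    failure threshold and the maximum duration are scaled by 2.
    """
    factor = 2 if rollback else 1
    return base_soft * factor, base_hard * factor


def too_long_message(elapsed, soft, rollback=False):
    """Build a hypothetical 'too long' message that mentions aborted upgrades."""
    kind = "aborted upgrade" if rollback else "upgrade"
    minutes = elapsed.total_seconds() / 60
    limit = soft.total_seconds() / 60
    return f"{kind} took too long: {minutes:.0f} minutes (soft limit {limit:.0f})"


if __name__ == "__main__":
    # 75 minutes is from the thread; 90 minutes is an assumed hard limit.
    soft, hard = upgrade_limits(
        timedelta(minutes=75), timedelta(minutes=90), rollback=True
    )
    print(soft, hard)
    print(too_long_message(timedelta(minutes=160), soft, rollback=True))
```

Scaling both limits by the same factor keeps the gap between "flagged as slow" and "abandoned" proportional for rollback jobs, which is the simpler of the two options the commit message weighs against treating each hop separately.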

wking mentioned this pull request

test/e2e/upgrade: Relax 'too long' soft timeout for rollback jobs #25977

Merged

DavidHurta pushed a commit to DavidHurta/origin that referenced this pull request


          test/e2e/upgrade: Relax 'too long' soft timeout for rollback jobs

7e5a42a

DavidHurta pushed a commit to DavidHurta/origin that referenced this pull request


          test/e2e/upgrade: Relax 'too long' soft timeout for rollback jobs

1b35a9f

DavidHurta pushed a commit to DavidHurta/origin that referenced this pull request


          test/e2e/upgrade: Relax 'too long' soft timeout for rollback jobs

395a299
