@bparees
Contributor

@bparees bparees commented May 24, 2019

@openshift-ci-robot openshift-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 24, 2019
@bparees bparees changed the title add long lived cluster management job [WIP] add long lived cluster management job May 24, 2019
@openshift-ci-robot openshift-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label May 24, 2019
@openshift-ci-robot openshift-ci-robot added the size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. label May 24, 2019
@openshift-ci-robot openshift-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 24, 2019
@openshift-ci-robot openshift-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 10, 2019
@bparees bparees force-pushed the longlived branch 2 times, most recently from b4f6455 to 1fa1e42 Compare June 12, 2019 21:14
@openshift-ci-robot openshift-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 12, 2019
@bparees bparees force-pushed the longlived branch 2 times, most recently from 5f42d39 to 279279e Compare June 12, 2019 21:24
@bparees
Contributor Author

bparees commented Jun 13, 2019

status update: i think the only thing left to do on this is to switch to using the llc (long-lived cluster) AWS creds.

once that's done this PR will introduce a job that will on a daily basis:

  1. check if the existing long lived cluster is healthy
  2. if not, perform must-gather and attempt to tear it down
  3. if the cluster wasn't healthy, install a new cluster

the job will report failed if the cluster was not healthy.
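
The daily flow described above can be sketched roughly as follows. This is only an illustration of the control flow, not the job's actual script: `ClusterOps`, its four hooks, and `manage_long_lived_cluster` are hypothetical names, and treating a failed teardown as non-fatal is an assumption read into "attempt to tear it down".

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ClusterOps:
    """Hypothetical hooks; the real job's commands are not shown in this thread."""
    is_healthy: Callable[[], bool]
    must_gather: Callable[[], None]
    teardown: Callable[[], None]
    install: Callable[[], None]


def manage_long_lived_cluster(ops: ClusterOps) -> bool:
    """Run one daily pass; return True if the job should report success."""
    if ops.is_healthy():
        return True  # healthy cluster: leave it alone
    ops.must_gather()  # collect diagnostics before destroying anything
    try:
        ops.teardown()
    except Exception as err:  # assumption: teardown is best effort
        print(f"teardown failed, continuing with install: {err}")
    ops.install()  # replace the unhealthy cluster
    return False  # the cluster was unhealthy, so the job reports failed
```

Passing the steps in as callables keeps the ordering easy to check without touching a real cluster.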

next immediate steps (additional PRs) will be to run e2e against the cluster periodically and to introduce a long-lived workload with healthchecking.

@bparees bparees force-pushed the longlived branch 2 times, most recently from 29d3691 to 3588f46 Compare June 14, 2019 20:40
@bparees bparees changed the title [WIP] add long lived cluster management job [WIP] add long lived cluster management job and e2e test Jun 14, 2019
@openshift-merge-robot
Contributor

/test core-valid
/test core-dry

@bparees bparees force-pushed the longlived branch 2 times, most recently from 62891d4 to 60e3337 Compare November 12, 2019 17:59
@bparees bparees changed the title [WIP] add long lived cluster management job and e2e test add long lived cluster management job and e2e test Nov 12, 2019
@openshift-ci-robot openshift-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Nov 12, 2019
@bparees bparees force-pushed the longlived branch 2 times, most recently from 7b0fba2 to 707b92a Compare November 12, 2019 18:11
@openshift-ci-robot openshift-ci-robot added the sig/azure Categorizes item related to Azure jobs label Nov 12, 2019
@openshift-ci-robot openshift-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Nov 12, 2019
@bparees
Contributor Author

bparees commented Nov 12, 2019

@smarterclayton @stevekuznetsov i think this is ready to go, for 4.1 at least. once i see it working for 4.1 i'll introduce 4.2 and 4.3 versions. Can you review?

@bparees bparees force-pushed the longlived branch 5 times, most recently from d7982ad to ea0d505 Compare November 13, 2019 20:39
Contributor

Fix this formatting

Contributor Author

I can't. Make jobs does this.

Contributor Author

context: ef3cd1d#r35370176

(you've argued with @stevekuznetsov about this before, it appears to have gone unresolved)

Contributor

Make jobs is a turd if it breaks human entered text. Can you open a card for DPTP? Needs to get fixed or disabled for these jobs.

Contributor Author

submitted a ticket to openshift-ci-requests

@bparees
Contributor Author

bparees commented Nov 14, 2019

@smarterclayton comments addressed except for the formatting thing that i don't think i can fix.

@smarterclayton
Contributor

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Nov 14, 2019
@openshift-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: bparees, smarterclayton

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 14, 2019
@openshift-bot
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-ci-robot
Contributor

@bparees: The following test failed, say /retest to rerun them all:

| Test name | Commit | Details | Rerun command |
| --- | --- | --- | --- |
| ci/prow/pj-rehearse | 39e69e2 | link | /test pj-rehearse |

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@openshift-merge-robot openshift-merge-robot merged commit 91e63bd into openshift:master Nov 14, 2019
@openshift-ci-robot
Contributor

@bparees: Updated the job-config-4.1 configmap in namespace ci at cluster default using the following files:

  • key openshift-release-release-4.1-periodics.yaml using file ci-operator/jobs/openshift/release/openshift-release-release-4.1-periodics.yaml
Details

In response to this:

related to: https://jira.coreos.com/browse/DPP-1338
blocked by: https://jira.coreos.com/browse/DPP-2164

@bparees bparees deleted the longlived branch November 18, 2019 16:07
wking added a commit to wking/openshift-release that referenced this pull request Jan 2, 2020
For 4.2->4.3 and 4.3->4.4.  I've left off 4.1->4.2, since 4.1 is
pretty old and stable.  I've left off 4.4->4.5, because we haven't
built a 4.5 nightly yet [1].  This should help catch breakage like the
ephemeral-storage request that broke 4.2 -> * updates [2], but didn't
turn up in CI because we don't have any jobs testing nightly ->
updates.  After this commit we'll have:

* endurance-upgrade-aws-4.3
  I'm not really clear on what this does.  Seems to use the template
  from 39e69e2 (add long lived cluster management job and e2e test,
  2019-06-12, openshift#3887).  Seems to use 4.3-ci -> self updates?  I dunno.

* release-openshift-origin-installer-e2e-aws-upgrade-4.3
  Lets the release controller or ci-operator or some such choose the
  source and target version.

* release-openshift-origin-installer-e2e-aws-upgrade-fips-4.3
  4.3-ci penultimate -> 4.3-ci latest on AWS with FIPS enabled.

* release-openshift-origin-installer-e2e-azure-upgrade-4.3
  4.3-ci penultimate -> 4.3-ci latest on Azure

* release-openshift-origin-installer-e2e-gcp-upgrade-4.3
  4.3-ci penultimate -> 4.3-ci latest on GCP

* release-openshift-origin-installer-e2e-aws-upgrade-4.2-to-4.3
  4.2-stable -> 4.3-ci on AWS.

* release-openshift-origin-installer-e2e-aws-upgrade-4.2-nightly-to-4.3
  4.2-nightly -> 4.3-nightly on AWS.  New in this commit.

* release-openshift-origin-installer-e2e-aws-upgrade-rollback-4.2-to-4.3
  4.2-stable -> 4.3-ci on AWS with TEST_OPTIONS=abort-at=99.  For more
  on abort-at, see openshift/origin@a53efd5e27 (Support --options on
  upgrade tests to abort in progress, 2019-04-29,
  openshift/origin#22726).

* release-openshift-origin-installer-e2e-aws-upgrade-rollback-4.3
  4.3-ci penultimate -> 4.3-ci latest on AWS with TEST_OPTIONS=abort-at=random.

and similarly for 4.4.

I'm not entirely clear on how the release informer jobs ingest the
version being considered for promotion, maybe these new jobs will end
up just being vanilla periodics.  But that's probably fine, because
all we need is some sort of signal in CI to show that 4.2-nightly ->
4.3 (or whatever) is broken before we give that 4.2 nightly a stable
name like 4.2.13 (or whatever).  Even if these do run as 4.3 promotion
informers, breakage like [2] happened in the 4.2 nightly.  So you
could still have:

1. 4.2 PR lands and breaks 4.2 -> 4.3.
2. Associated 4.2 nightly promotion goes through all green.
3. Some subsequent 4.3 change lands, and the informing job fails
   because of the 4.2 change from step 1.

But again, as long as we have some kind of signal (like the one added
by this commit), the release admins should hear about it and know that
they need the breakage triaged before they give a nightly a stable
name and sign the release.

[1]: https://openshift-release.svc.ci.openshift.org/#4.5.0-0.nightly
[2]: https://bugzilla.redhat.com/show_bug.cgi?id=1786315#c2
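
The upgrade jobs listed in the commit message above can be written down as data, which makes the coverage gap it describes easy to query. This is a sketch only: `UpgradeJob`, `JOBS_4_3`, and `jobs_from_stream` are hypothetical names, the source and target strings are copied from the message, and the two jobs whose versions the message leaves unclear (endurance-upgrade-aws-4.3 and the plain aws-upgrade-4.3 job) are omitted rather than guessed.

```python
from typing import List, NamedTuple, Optional


class UpgradeJob(NamedTuple):
    name: str
    source: str                         # version the cluster is installed at
    target: str                         # version the cluster is updated to
    platform: str
    test_options: Optional[str] = None  # TEST_OPTIONS value, if any
    fips: bool = False


PREFIX = "release-openshift-origin-installer-e2e-"

JOBS_4_3 = [
    UpgradeJob(PREFIX + "aws-upgrade-fips-4.3", "4.3-ci penultimate", "4.3-ci latest", "aws", fips=True),
    UpgradeJob(PREFIX + "azure-upgrade-4.3", "4.3-ci penultimate", "4.3-ci latest", "azure"),
    UpgradeJob(PREFIX + "gcp-upgrade-4.3", "4.3-ci penultimate", "4.3-ci latest", "gcp"),
    UpgradeJob(PREFIX + "aws-upgrade-4.2-to-4.3", "4.2-stable", "4.3-ci", "aws"),
    # New in the referencing commit:
    UpgradeJob(PREFIX + "aws-upgrade-4.2-nightly-to-4.3", "4.2-nightly", "4.3-nightly", "aws"),
    UpgradeJob(PREFIX + "aws-upgrade-rollback-4.2-to-4.3", "4.2-stable", "4.3-ci", "aws", "abort-at=99"),
    UpgradeJob(PREFIX + "aws-upgrade-rollback-4.3", "4.3-ci penultimate", "4.3-ci latest", "aws", "abort-at=random"),
]


def jobs_from_stream(jobs: List[UpgradeJob], stream: str) -> List[str]:
    """Return the names of jobs whose source version comes from the given stream."""
    return [job.name for job in jobs if job.source.startswith(stream)]
```

Calling `jobs_from_stream(JOBS_4_3, "4.2-nightly")` returns only the job added by the commit, which matches the message's point that nothing previously tested updates starting from a nightly.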