Skip to content

Conversation

@wking
Copy link
Member

@wking wking commented Jul 2, 2020

AWS is currently at CI-capacity limits, so move over to GCP to get more elbow room. Generated by adjusting the config/ entry by hand and then running:

$ make update

I've make the names generic e2e and e2e-upgrade, because we don't care (from the CVO side) what platform these run on. Generic names will mean smaller diffs if we pivot the workflow and cluster_profile to a different provider in the future.

I've dropped e2e-aws-upi, because the CVO shouldn't care about that level of provisioning detail. We just need "a cluster". We've had the optional UPI job since ce9361b (#7125), where it was motivated by a desire to test the cluster_installer metric. Now that that metric is well-established, we can drop the option. And we can always regrow it if we need it again later.

@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 2, 2020
@LalatenduMohanty
Copy link
Member

LalatenduMohanty commented Jul 2, 2020

I've dropped e2e-aws-upi, because the CVO shouldn't care about that level of provisioning detail. We just need "a cluster". We've had the optional UPI job since ce9361b (#7125), where it was motivated by a desire to test the cluster_installer metric. Now that that metric is well-established, we can drop the option. And we can always regrow it if we need it again later.

Can we continue e2e-aws-upi on AWS and move other tests to GCP ? Because I am not sure if we should take Now that that metric is well-established, we can drop the option for granted.

@wking
Copy link
Member Author

wking commented Jul 2, 2020

Because I am not sure if we should take Now that that metric is well-established, we can drop the option for granted.

How often do you use a CVO UPI presubmit to check that metric? I expect anyone concerned about it can use a promotion informer. Or launch a job with cluster-bot. But searching in the CVO repo and popping open the merge-time report on run presubmits, this job seems to have only ever been used in openshift/cluster-version-operator#319 (merged Feb. 12) and openshift/cluster-version-operator#332 (merged Feb. 28). And again, if I'm wrong, it's easy to add it back.

@sdodson
Copy link
Member

sdodson commented Jul 6, 2020

I'm in favor of switching entirely to gcp. I don't have an opinion on whether or not we should keep the optional upi test, i see no jobs in prow history today but that doesn't mean it's without value. I don't think we should change the name to drop the cloud provider, those are uniform across almost all presubmits as far as I'm aware and provides immediate context to anyone looking at job names.

@wking
Copy link
Member Author

wking commented Jul 7, 2020

I'm in favor of switching entirely to gcp.

So for all releases back through 4.1? Can you explain why you want a single platform across all releases?

I don't think we should change the name to drop the cloud provider, those are uniform across almost all presubmits as far as I'm aware and provides immediate context to anyone looking at job names.

I disagree that we need this context, or that if we need it we should get it from the job name. But I'll reroll to punt for this PR and file a follow-up with generic names where we can kick that around.

@sdodson
Copy link
Member

sdodson commented Jul 7, 2020

So for all releases back through 4.1? Can you explain why you want a single platform across all releases?

Sorry, I just meant I'm in favor of the change you're making but I'd also be in favor of switching 4.5 and 4.4 too. The number of clusters we start on AWS is really high and CVO is quite isolated from platform so it seems like we're easy to move. Even over the weekend we had 50 AWS, during the week we're maxed out 12 hours a day.

@sdodson
Copy link
Member

sdodson commented Jul 7, 2020

I disagree that we need this context, or that if we need it we should get it from the job name. But I'll reroll to punt for this PR and file a follow-up with generic names where we can kick that around.

We don't need this context, but it's the norm for all jobs that provision clusters and we should follow the norm.

…er presubmits to GCP

AWS is currently at CI-capacity limits, so move over to GCP to get
more elbow room.  Generated by adjusting the config/ entry by hand and
then running:

  $ make update

I'd initially just shifted master, because the bulk of our CI is on
master-targeted pulls.  But Scott asked for 4.4 and later [1], so
that's what I'm doing now.

I'd like to move to generic names like "e2e" and "e2e-upgrade",
because we don't care (from the CVO side) what platform these run on.
Generic names will mean smaller diffs if we pivot the workflow and
cluster_profile to a different provider in the future.  But Scott is
not (yet ;) ready to break with tradition here [2], so punting on that
for now.

I've dropped "e2e-aws-upi", because the CVO shouldn't care about that
level of provisioning detail.  We just need "a cluster".  We've had
the optional UPI job since ce9361b (cluster-version-operator: add
aws-upi optional job, 2020-02-11, openshift#7125), where it was motivated [3]
by a desire to test the cluster_installer metric [4].  Now that that
metric is well-established, we can drop the option.  And we can always
regrow it if we need it again later.

[1]: openshift#10046 (comment)
[2]: openshift#10046 (comment)
[3]: openshift#7125 (comment)
[4]: openshift/cluster-version-operator#319
@wking wking force-pushed the aws-capacity-move-to-gcp branch from 1ea50c8 to 7ee21db Compare July 7, 2020 16:15
@wking
Copy link
Member Author

wking commented Jul 7, 2020

Ok, rerolled with 1ea50c8d91 -> 7ee21db to:

  • Make the same change for 4.4+, as requested here.
  • Punt on the generic job names, as requested here and here.

@openshift-ci-robot
Copy link
Contributor

openshift-ci-robot commented Jul 7, 2020

@wking: The following tests failed, say /retest to rerun all failed tests:

Test name Commit Details Rerun command
ci/rehearse/openshift/cluster-version-operator/master/e2e 1ea50c8d9182968912c17bdd60ded78c2707df24 link /test pj-rehearse
ci/rehearse/openshift/cluster-version-operator/master/e2e-upgrade 1ea50c8d9182968912c17bdd60ded78c2707df24 link /test pj-rehearse

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@LalatenduMohanty
Copy link
Member

/retest

@LalatenduMohanty
Copy link
Member

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Jul 8, 2020
@openshift-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: LalatenduMohanty, wking

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-robot openshift-merge-robot merged commit ac48d7a into openshift:master Jul 8, 2020
@openshift-ci-robot
Copy link
Contributor

@wking: Updated the following 15 configmaps:

  • ci-operator-4.4-configs configmap in namespace ci at cluster app.ci using the following files:
    • key openshift-cluster-version-operator-release-4.4.yaml using file ci-operator/config/openshift/cluster-version-operator/openshift-cluster-version-operator-release-4.4.yaml
  • job-config-4.4 configmap in namespace ci at cluster app.ci using the following files:
    • key openshift-cluster-version-operator-release-4.4-presubmits.yaml using file ci-operator/jobs/openshift/cluster-version-operator/openshift-cluster-version-operator-release-4.4-presubmits.yaml
  • job-config-4.7 configmap in namespace ci at cluster app.ci using the following files:
    • key openshift-cluster-version-operator-release-4.7-presubmits.yaml using file ci-operator/jobs/openshift/cluster-version-operator/openshift-cluster-version-operator-release-4.7-presubmits.yaml
  • ci-operator-4.6-configs configmap in namespace ci at cluster app.ci using the following files:
    • key openshift-cluster-version-operator-release-4.6.yaml using file ci-operator/config/openshift/cluster-version-operator/openshift-cluster-version-operator-release-4.6.yaml
  • job-config-master configmap in namespace ci at cluster app.ci using the following files:
    • key openshift-cluster-version-operator-master-presubmits.yaml using file ci-operator/jobs/openshift/cluster-version-operator/openshift-cluster-version-operator-master-presubmits.yaml
  • job-config-4.7 configmap in namespace ci at cluster api.ci using the following files:
    • key openshift-cluster-version-operator-release-4.7-presubmits.yaml using file ci-operator/jobs/openshift/cluster-version-operator/openshift-cluster-version-operator-release-4.7-presubmits.yaml
  • job-config-4.5 configmap in namespace ci at cluster app.ci using the following files:
    • key openshift-cluster-version-operator-release-4.5-presubmits.yaml using file ci-operator/jobs/openshift/cluster-version-operator/openshift-cluster-version-operator-release-4.5-presubmits.yaml
  • job-config-4.6 configmap in namespace ci at cluster app.ci using the following files:
    • key openshift-cluster-version-operator-release-4.6-presubmits.yaml using file ci-operator/jobs/openshift/cluster-version-operator/openshift-cluster-version-operator-release-4.6-presubmits.yaml
  • ci-operator-master-configs configmap in namespace ci at cluster app.ci using the following files:
    • key openshift-cluster-version-operator-master.yaml using file ci-operator/config/openshift/cluster-version-operator/openshift-cluster-version-operator-master.yaml
  • ci-operator-4.7-configs configmap in namespace ci at cluster app.ci using the following files:
    • key openshift-cluster-version-operator-release-4.7.yaml using file ci-operator/config/openshift/cluster-version-operator/openshift-cluster-version-operator-release-4.7.yaml
  • job-config-4.5 configmap in namespace ci at cluster api.ci using the following files:
    • key openshift-cluster-version-operator-release-4.5-presubmits.yaml using file ci-operator/jobs/openshift/cluster-version-operator/openshift-cluster-version-operator-release-4.5-presubmits.yaml
  • job-config-4.6 configmap in namespace ci at cluster api.ci using the following files:
    • key openshift-cluster-version-operator-release-4.6-presubmits.yaml using file ci-operator/jobs/openshift/cluster-version-operator/openshift-cluster-version-operator-release-4.6-presubmits.yaml
  • ci-operator-4.5-configs configmap in namespace ci at cluster app.ci using the following files:
    • key openshift-cluster-version-operator-release-4.5.yaml using file ci-operator/config/openshift/cluster-version-operator/openshift-cluster-version-operator-release-4.5.yaml
  • job-config-master configmap in namespace ci at cluster api.ci using the following files:
    • key openshift-cluster-version-operator-master-presubmits.yaml using file ci-operator/jobs/openshift/cluster-version-operator/openshift-cluster-version-operator-master-presubmits.yaml
  • job-config-4.4 configmap in namespace ci at cluster api.ci using the following files:
    • key openshift-cluster-version-operator-release-4.4-presubmits.yaml using file ci-operator/jobs/openshift/cluster-version-operator/openshift-cluster-version-operator-release-4.4-presubmits.yaml
Details

In response to this:

AWS is currently at CI-capacity limits, so move over to GCP to get more elbow room. Generated by adjusting the config/ entry by hand and then running:

$ make update

I've make the names generic e2e and e2e-upgrade, because we don't care (from the CVO side) what platform these run on. Generic names will mean smaller diffs if we pivot the workflow and cluster_profile to a different provider in the future.

I've dropped e2e-aws-upi, because the CVO shouldn't care about that level of provisioning detail. We just need "a cluster". We've had the optional UPI job since ce9361b (#7125), where it was motivated by a desire to test the cluster_installer metric. Now that that metric is well-established, we can drop the option. And we can always regrow it if we need it again later.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@wking wking deleted the aws-capacity-move-to-gcp branch July 9, 2020 00:08
wking added a commit to wking/openshift-release that referenced this pull request Jul 9, 2020
…p -> e2e for 4.4+

In 7ee21db (ci-operator/jobs/openshift/cluster-version-operator:
Move 4.4 and later presubmits to GCP, 2020-07-02, openshift#10046), I'd
mentioned generic names as a way for component teams to say "we don't
care which platform, other folks can pick whatever they want to
balance platform volume vs. capacity".  Scott pushed back based on
lack of precedent [1,2].  And we currently do rely on platforms in
job-names for monitored success rates.  But these are presubmits, and
presubmits are noisy (e.g. if you pull-request some buggy code, your
presubmits will fail, but the delivered product is not affected by
your in-flight bugs).  So in the presubmit case we are not constrained
by job-name-based health reporting.  And we run a lot of presubmit
volume, so even if this approach only works in the presubmit case, it
gives us a lot of rebalancing flexibility.

Benefits for component teams (the CVO maintainers in the case of this
commit) include not having to care about where their jobs get
scheduled.  And with the job names not changing during rebalances,
they don't have to remember to /skip or /refresh or whatever, because
Prow would understand that it's the same effective test regardless of
the underlying platform.

Benefits to rebalancing admins is that they don't have to ask
component teams to opt-in at rebalance time.  Teams can opt in with
this pattern, and rebalancing admins can just search for e2e steps
that do not include a platform slug in their 'as' config name.

Generated by manually changing ci-operator/config/... and then running:

  $ make update

[1]: openshift#10046 (comment)
[2]: openshift#10046 (comment)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants