Skip to content

Conversation

@wking
Copy link
Member

@wking wking commented Feb 25, 2020

So we can automatically propose stabilizing channel promotions.

The implementation uses git blame --first-parent ... to determine when releases landed in the feeder channel. Picking the versions out of the blame data by using a '- ' prefix is a bit of a hack, but it gets the job done for now.

@openshift-ci-robot openshift-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Feb 25, 2020
@wking wking force-pushed the auto-stabilize branch 2 times, most recently from 8c32599 to 0c41d57 Compare February 25, 2020 00:30
@wking
Copy link
Member Author

wking commented Feb 25, 2020

Example output (as of 0c41d57):

$ hack/graph-util.py stabilization-changes
INFO: considering promotions from candidate-4.2 to fast-4.2 after PT24H
INFO:   recommended: 4.1.18 (81 days, 1:33:57.131135)
DEBUG:     channels/fast-4: Promote 4.1.18 to fast-4.2
DEBUG:     It was promoted the feeder candidate-4.2 by 2c088bc147 (channels: Convert to Eric's proposed channel format, 2019-12-05).
INFO:   recommended: 4.1.20 (81 days, 0:53:55.131135)
DEBUG:     channels/fast-4: Promote 4.1.20 to fast-4.2
DEBUG:     It was promoted the feeder candidate-4.2 by b4db5a619e (channels: Catch up with production, 2019-12-05).
...

@wking wking changed the title channels: Add a 'feeder' property channels: Add 'feeder' and 'tombstones' properties Feb 25, 2020
@wking
Copy link
Member Author

wking commented Feb 25, 2020

Example output (as of dbbbbd7a8fb):

$ hack/graph-util.py stabilization-changes
INFO: considering promotions from candidate-4.2 to fast-4.2 after PT24H
INFO:   recommended: 4.1.34 (12 days, 13:34:53.517678)
DEBUG:     channels/fast-4.2: Promote 4.1.34 to fast-4.2
DEBUG:     It was promoted the feeder candidate-4.2 by 8e0a12f267 (candidate-4.1: add 4.1.34, 2020-02-12).
INFO:   recommended: 4.2.20 (5 days, 13:04:12.517678)
DEBUG:     channels/fast-4.2: Promote 4.2.20 to fast-4.2
DEBUG:     It was promoted the feeder candidate-4.2 by 3eb8b2e776 (candidate-4.2: add 4.2.20, 2020-02-19).
...

Copy link
Member

@LalatenduMohanty LalatenduMohanty left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The spelling of licensing needs to be fixed in The [contributing documentation](CONTRIBUTING.md) covers licencing and the usual Git flow

Edit the appropriate file in `channels/`.
For example, to add a release to stable-4.2 you would edit `channels/stable-4.2.yaml`.
Channel semantics are documented [here][channel-semantics].
For example, to add a release to candidate-4.2 you would edit [`channels/candidate-4.2.yaml`](channels/candidate-4.2.yaml).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/would/need to/

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

preexisting language; file a separate PR?

which declares the intention that nodes and edges will be considered for promotion into `fast-4.2` after cooking for 24 hours in `candidate-4.2`.
The `delay` value is an [ISO 8601][rfc-3339-p13] [duration][iso-8601-durations].

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

which declares the intention that nodes and edges will be considered for promotion into `fast-4.2` after cooking for 24 hours in `candidate-4.2`.
The `delay` value is an [ISO 8601][rfc-3339-p13] [duration][iso-8601-durations].

This is the expected delay, but it does not mean that promotion will happen at that moment.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By which automation is this delay used and how would one override it?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By which automation is this delay used...

By a periodic that runs stabilization-changes and creates PRs for any recommended promotions. Then eventually, when we trust our automatic edge-pulling, we give the robot creds to approve their own PRs so they will auto-merge.

... and how would one override it?

Manually create pull requests if you don't want to wait the configured delay?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If / when robots were auto-merging these PRs, how would release admins override the bot from promoting something

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The most important action is ART publishing the errata, which is when a release is declared supported, regardless of what we do in Cincinnati. So you could tombstone a release if you didn't want it auto-promoted. But if the release is only in candidate, you had better also tell ART to not ship the errata. And if the errata has shipped, tombstoning fast or stable promotions won't drop any supported-ness.

_Note:_ Once we have phased release rollouts, we will drop the fast/stable distinction from this repository and promote to a unified fast/stable channel with a start time and rollout duration.
Until then, we are using fast channels to feed stable channels with a delay, just like candidate channels feed fast channels.

In this repository, the intended promotion flow is reflected by a `feeder` property in the channel declaration.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we still need the feeder property if we reduce the chain to being candidate --> stable?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think its still useful, just in case the names would be less clear

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we still need the feeder property if we reduce the chain to being candidate --> stable?

We also have prerelease -> stable for 4.1. And we could theoretically also make a plural feeders and have multiple candidate-4.* feeding into candidate (#112). I like this being explicit, but don't really care if folks would rather bake the feeder directions into the recommending script (currently the stabilization-changes subcommand) instead of tracking it in the channel YAML.

For example, [`channels/candidate-4.2.yaml`](channels/candidate-4.2.yaml) has:

```yaml
tombstones:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I propose naming it frozen instead of tombstones.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@steveej frozen does not communicate its intent. These are the nodes which can not have any future edge. So tombstones makes more sense.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that tombstones doesn't communicate it well either, since these versions are still in use somewhere, which is the reason why we need to give them an additional property. Also, It would be nice to stick to a more technical or neutral term here.

How about immobilized?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like tombstones more. These builds are similar to ghosts we don't speak of since they are dead to us and we don't want them coming back :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tombstone has an existing technical usage, which is similar to the usage here.

@LalatenduMohanty
Copy link
Member

@eparis do we have comments on this PR. I am planning to merge this soon.

@LalatenduMohanty
Copy link
Member

LalatenduMohanty commented Feb 26, 2020

@eparis do we have comments on this PR. I am planning to merge this soon.

There is a change of plan as discussed with OTA team. We are planning to deprecate the python script so not planning to add new functionality to the script.
/hold

@steveej
Copy link
Contributor

steveej commented Feb 27, 2020

We are planning to deprecate the python script so not planning to add new functionality to the script.

I second this. I see the scope of this PR to discuss the data model changes and their intended usage, therefore please see my review comments.

@openshift-ci-robot openshift-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 29, 2020
@LalatenduMohanty
Copy link
Member

/hold

@openshift-ci-robot openshift-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 3, 2020
@wking
Copy link
Member Author

wking commented Mar 11, 2020

Rebased onto master with 9b9497b -> b3b6d48.

@openshift-ci-robot openshift-ci-robot added needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. and removed needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Mar 11, 2020
@wking
Copy link
Member Author

wking commented Mar 12, 2020

We are planning to deprecate the python script so not planning to add new functionality to the script.

We are planning to deprecate the push-to-quay logic, which currently lives in the Python script. This promotion-recommendation logic is distinct; it's about when to create pull requests for this repository. I think we want it, and I don't think we want to include it in Cincinnati. I don't care if folks want to rewrite it in Rust or not. I can also put it in a separate Python script, instead of having it live in a subcommand of the existing script, if that would make folks more comfortable. Thoughts?

@openshift-ci-robot openshift-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 12, 2020
@wking
Copy link
Member Author

wking commented Mar 12, 2020

Rebased again with ab9d6aa -> 15c6d24.

@openshift-ci-robot openshift-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 16, 2020
@openshift-ci-robot openshift-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 16, 2020
wking added 17 commits April 17, 2021 15:48
As described in 17232e7 (Blocking edges to 4.3.11, 2020-04-13, openshift#170).
We didn't want the Quay load of many clusters updating to the new
release (which usually happens after fast/stable promotion) while the
Quay folks were sorting out some capacity issues, so we decided to
tombstone these releases and wait for the next week's.
  count by (internal,channel) (
    id_version_ebs_account_internal:cluster_subscribed
    + on (_id) group_left (channel)
    topk by (_id) (1, cluster_version_available_updates)
    + on (_id) group_left (blah)
    group by (_id) (cluster_version{type="completed",version="4.7.2",from_version="4.7.1"})
  )

and similar for other releases show that we get the bulk of
fast-channel updates within the first 48h of adding a release to the
fast channels.  Slow down stable a bit so we get that
Insights/Telemetry feeback before we add a new release to stable.
I forget why, but for some reason we didn't like this one and we
recylced its errata for 4.4.23:

  $ curl -sH accept:application/json 'https://api.openshift.com/api/upgrades_info/v1/graph?channel=candidate-4.4' | jq -r '.nodes[] | select(.version == "4.4.22" or .version == "4.4.23") | .version + " " + .metadata.url'
  4.4.23 https://access.redhat.com/errata/RHBA-2020:3715
  4.4.22 https://access.redhat.com/errata/RHBA-2020:3715
I forget why, but for some reason we didn't like this one and we
recylced its errata for 4.4.26:

  $ curl -sH accept:application/json 'https://api.openshift.com/api/upgrades_info/v1/graph?channel=candidate-4.4' | jq -r '.nodes[] | select(.metadata.url == "https://access.redhat.com/errata/RHBA-2020:3764").version' | sort -V
  4.4.24
  4.4.25
  4.4.26
I forget why, but for some reason we didn't like this one and we
recylced its errata for 4.4.29:

  $ curl -sH accept:application/json 'https://api.openshift.com/api/upgrades_info/v1/graph?channel=candidate-4.4' | jq -r '.nodes[] | select(.metadata.url == "https://access.redhat.com/errata/RHBA-2020:4224").version' | sort -V
  4.4.28
  4.4.29
I forget why, but for some reason we didn't like this one and we
recylced its errata for 4.5.1:

  $ curl -sH accept:application/json 'https://api.openshift.com/api/upgrades_info/v1/graph?channel=candidate-4.5' | jq -r '.nodes[] | select(.metadata.url == "https://access.redhat.com/errata/RHBA-2020:2409").version' | sort -V
  4.5.0
  4.5.0-rc.1
  ...
  4.5.0-rc.7
  4.5.1
  4.5.1-rc.0

No need to tombstone the RCs, because they don't match the
.feeder.filter regexp.
We've been evolving our hotfix process, and this one has no errata and
was never promoted into fast.  No need to promote it now.

  $ curl -sH accept:application/json 'https://api.openshift.com/api/upgrades_info/v1/graph?channel=candidate-4.5' | jq -r '.nodes[] | select(.version == "4.5.0-0.hotfix-2020-11-28-021842")'
  {
    "version": "4.5.0-0.hotfix-2020-11-28-021842",
    "payload": "quay.io/openshift-release-dev/ocp-release@sha256:d49f94e6c6350b936f073ad91d9121f64012cebded6a9c059184f84997a9e621",
    "metadata": {
      "io.openshift.upgrades.graph.release.channels": "candidate-4.5,candidate-4.6",
      "io.openshift.upgrades.graph.release.manifestref": "sha256:d49f94e6c6350b936f073ad91d9121f64012cebded6a9c059184f84997a9e621"
    }
  }
I forget why, but for some reason we didn't like these and recycled
their errata:

  $ curl -sH accept:application/json 'https://api.openshift.com/api/upgrades_info/v1/graph?channel=candidate-4.5' | jq -r '.nodes as $n | [$n[].metadata.url] | unique[] | . as $url
| ([$n[] | select(.metadata.url == $url).version]) as $versions | select(($versions | length) > 1) | $url + " " + ($versions | join(" "))' | sort
   4.5.0-0.hotfix-2020-11-28-021842 4.5.0-0.hotfix-2020-08-24-185832
  https://access.redhat.com/errata/RHBA-2020:0581 4.4.0-rc.11 4.4.0-rc.13 4.4.0-rc.12 4.4.0-rc.2 4.4.0-rc.0 4.4.0-rc.1 4.4.0 4.4.0-rc.4 4.4.3 4.4.2 4.4.0-rc.6 4.4.0-rc.7 4.4.0-rc.8 4.4.0-rc.9 4.4.0-rc.10
  https://access.redhat.com/errata/RHBA-2020:2409 4.5.0-rc.4 4.5.0-rc.5 4.5.0-rc.2 4.5.0-rc.6 4.5.0-rc.7 4.5.1 4.5.1-rc.0 4.5.0 4.5.0-rc.1
  https://access.redhat.com/errata/RHBA-2020:3715 4.4.22 4.4.23
  https://access.redhat.com/errata/RHBA-2020:3719 4.5.10 4.5.11
  https://access.redhat.com/errata/RHBA-2020:3760 4.5.13 4.5.12
  https://access.redhat.com/errata/RHBA-2020:3764 4.4.25 4.4.26 4.4.24
  https://access.redhat.com/errata/RHBA-2020:4224 4.4.29 4.4.28
  https://access.redhat.com/errata/RHBA-2021:0033 4.5.25 4.5.27
  https://access.redhat.com/errata/RHBA-2021:0231 4.5.30 4.5.29
I forget why, but for some reason we didn't like these and recycled
their errata:

  $ curl -sH accept:application/json 'https://api.openshift.com/api/upgrades_info/v1/graph?channel=candidate-4.6' | jq -r '.nodes as $n | [$n[].metadata.url] | unique[] | . as $url | ([$n[] | select(.metadata.url == $url).version] | sort) as $versions | select(($versions | length) > 1) | $url + " " + ($versions | join(" "))' | sort | grep '4\.6\.[02]'
   4.5.0-0.hotfix-2020-08-24-185832 4.5.0-0.hotfix-2020-11-28-021842 4.6.0-fc.0 4.6.0-fc.1 4.6.0-fc.2 4.6.0-fc.3 4.6.0-fc.4 4.6.0-fc.5 4.6.0-fc.7 4.6.0-fc.8 4.6.5 4.6.6
  https://access.redhat.com/errata/RHBA-2020:4196 4.6.0 4.6.0-rc.0 4.6.0-rc.1 4.6.0-rc.2 4.6.0-rc.3 4.6.0-rc.4 4.6.1
  https://access.redhat.com/errata/RHBA-2020:4339 4.6.2 4.6.3
These were supposed to be tombstoned in candidate based on recycled
errata [1,2].  But they snuck into fast-4.6 with afa003f (Populate
4.5.z into candidate-4.6 and fast-4.6, 2020-11-05, openshift#526).  And then I
moved them from there into fast-4.5 with 1d9fe0f (channels: More
backfilling (e.g. 4.3.0 in candidate-4.4), 2020-11-12, openshift#549).  And
they went into stable-4.6 with 1fb8768 (Add 4.5 to 4.6 upgrades to
stable-4.6, 2020-12-14, openshift#585).  This commit completes their accidental
journey, since tombstoning them in fast would violate [3]:

  While the fast-4.5 channel contains releases as soon as their errata
  are published, releases are added to the stable-4.5 channel after a
  delay

[1]: openshift#438 (comment)
[2]: openshift#456 (comment)
[3]: https://docs.openshift.com/container-platform/4.5/updating/updating-cluster-between-minor.html#stable-4-5-channel
@wking
Copy link
Member Author

wking commented Apr 17, 2021

Rebased around #759 to cover the new 4.8 channels with 2dc94b4 -> 8265e89.

- 4.5.7
- 4.5.8
- 4.5.9
- 4.5.10

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're retroactively adding those to stable - is this intended?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, see discussion in 845ac82.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

intended - 845ac82

@LalatenduMohanty
Copy link
Member

/hold Just wanted to make sure I take a final look before we merge this.

@LalatenduMohanty LalatenduMohanty removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Apr 28, 2021
@LalatenduMohanty
Copy link
Member

/hold cancel

@openshift-ci-robot openshift-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Apr 28, 2021
Copy link
Member

@LalatenduMohanty LalatenduMohanty left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Apr 28, 2021
@openshift-ci-robot
Copy link

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: LalatenduMohanty, vrutkovs, wking

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:
  • OWNERS [LalatenduMohanty,vrutkovs,wking]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-robot openshift-merge-robot merged commit 2008f8e into openshift:master Apr 28, 2021
@wking wking deleted the auto-stabilize branch April 28, 2021 20:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants