fix: improve publish-crate.sh logging and rate limit handling#11178
fix: improve publish-crate.sh logging and rate limit handling#11178mircea-c wants to merge 4 commits intoanza-xyz:masterfrom
Conversation
- Use crate name instead of Cargo.toml path in Buildkite section headers - Drop set -x from the publish subshell to reduce log noise - Add a 60s inter-crate sleep (burst of 30, then 1/min per crates.io limits) - Detect HTTP 429 responses and back off exponentially on retries - Add pipefail so the cargo publish | tee pipeline propagates failures Fixes anza-xyz#11085 Fixes anza-xyz#11087
e3c58f2 to
f11f55a
Compare
Co-authored-by: Mykola Dzham <i@levsha.me>
|
|
||
| for Cargo_toml in $Cargo_tomls; do | ||
| echo "--- $Cargo_toml" | ||
| crate_name=$(grep -m 1 '^name = ' "$Cargo_toml" | cut -f 3 -d ' ' | tr -d \") |
There was a problem hiding this comment.
is it time to put cargo install toml-cli in the ci images? then this becomes...
toml get -r "$Cargo_toml" package.nameThere was a problem hiding this comment.
tho i think the worst offender is the path to the docker runner. assuming there's something stupid in there like echo "--- $0"
There was a problem hiding this comment.
fuckin' nailed it
Lines 52 to 54 in 92d55b2
There was a problem hiding this comment.
docker-run.sh emits echo "--- $0" and causing the duplicate collapsed sections. Should we start there by removing that?
There was a problem hiding this comment.
Right, I commented before refresh and didn't see your find. I don't want to make the change in this PR, but I can open a new one for that.
|
did you test the changes with a dry run? |
I have not, cuz it's a ton of work. I did sort of unit tests of the logic changes |
|
|
||
| # crates.io allows a burst of 30 new versions, then 1 per minute. | ||
| # Sleep between each crate to stay within the sustained rate limit. | ||
| sleep 60 |
There was a problem hiding this comment.
do we really need this? the manual sleep feels a bit too strict to me.
https://crates.io/docs/rate-limits
... omit ...
Concretely, the rate limits for crates.io are:
For brand new crate publishes, allow a burst of 5 new crates published at once by a single user account, with a rate limit of 1 crate every 10 minutes allowed after that burst
For new versions of existing crates, allow a burst of 30 new versions published at once by a single user account, with a rate limit of 1 crate per minute allowed after that burst... omit ...
we already have a retry mechanism and each crates take ~1m to build. this should naturally respect the rate limit. also we have ~133 crates to publish, with this change, this will likely add at least 2h to the total job time.
Problem
Two issues with the
publish-cratesstep on solana-secondary:Cargo.tomlpath rather than the crate name, making collapsed sections appear identical when truncated (ci: solana-secondary publish-crates step logs difficult to follow #11087)set -xfloods the log with trace output, making it impossible to follow while running (ci: solana-secondary publish-crates step logs difficult to follow #11087)Summary of Changes
Cargo.tomlpath in Buildkite section headersset -xfrom the publish subshell to reduce log noiseCaused by: ... status 429output and back off exponentially on retries instead of using a flat 3s delaypipefailso thecargo publish | teepipeline correctly propagates failuresFixes #11085
Fixes #11087