fix: improve publish-crate.sh logging and rate limit handling #11178

Open
mircea-c wants to merge 4 commits into anza-xyz:master from mircea-c:fix/publish-crate-rate-limits

@mircea-c mircea-c commented Mar 10, 2026

Problem

Three issues with the publish-crates step on solana-secondary:

  1. Buildkite section headers show the full Cargo.toml path rather than the crate name, making collapsed sections appear identical when truncated (ci: solana-secondary publish-crates step logs difficult to follow #11087)
  2. set -x floods the log with trace output, making it impossible to follow while running (ci: solana-secondary publish-crates step logs difficult to follow #11087)
  3. Retries use a flat 3s sleep with no awareness of crates.io rate limits. crates.io allows a burst of 30 new versions then 1 per minute; Agave publishes hundreds of crates per release so the burst is always exhausted (ci: solana-secondary publish-crates step does not respect rate limits #11085)

Summary of Changes

  • Use crate name instead of Cargo.toml path in Buildkite section headers
  • Drop set -x from the publish subshell to reduce log noise
  • Add a 60s inter-crate sleep to stay within the sustained rate limit
  • Detect HTTP 429 responses via cargo's Caused by: ... status 429 output and back off exponentially on retries instead of using a flat 3s delay
  • Add pipefail so the cargo publish | tee pipeline correctly propagates failures
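
A minimal sketch of how the last two bullets could fit together, for illustration only; it is not the code from this PR. The function names, the 60s base delay, and the 960s cap are assumptions. The only detail taken from the description above is that a rate-limited publish shows `status 429` in cargo's output.

```shell
#!/usr/bin/env bash
# Sketch only: names and constants are illustrative, not taken from publish-crate.sh.

# Succeeds if the captured command output (file in $1) mentions an HTTP 429 status.
is_rate_limited() {
  grep -q 'status 429' "$1"
}

# Exponential backoff: base * 2^(attempt - 1), capped at a maximum.
# Defaults (60s base, 960s cap) are assumptions for this sketch.
backoff_delay() {
  local attempt="$1"
  local base="${2:-60}"
  local cap="${3:-960}"
  local delay=$(( base * (1 << (attempt - 1)) ))
  if [ "$delay" -gt "$cap" ]; then
    delay=$cap
  fi
  echo "$delay"
}

# Runs the given command up to max_attempts times, teeing output to a log.
# Backs off exponentially when the log shows a 429, otherwise waits 3s.
publish_with_backoff() {
  local max_attempts="$1"
  local attempt=1
  local log
  shift
  # Without pipefail, the pipeline below would report tee's exit status
  # and hide a failing publish command.
  set -o pipefail
  log=$(mktemp)
  while [ "$attempt" -le "$max_attempts" ]; do
    if "$@" 2>&1 | tee "$log"; then
      rm -f "$log"
      return 0
    fi
    if [ "$attempt" -lt "$max_attempts" ]; then
      if is_rate_limited "$log"; then
        sleep "$(backoff_delay "$attempt")"
      else
        sleep 3
      fi
    fi
    attempt=$(( attempt + 1 ))
  done
  rm -f "$log"
  return 1
}
```

Under these assumptions a call would look like `publish_with_backoff 5 cargo publish --manifest-path "$Cargo_toml"`, with delays of 60s, 120s, 240s, and 480s between rate-limited attempts.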

Fixes #11085
Fixes #11087

@mircea-c mircea-c self-assigned this Mar 10, 2026
@mircea-c mircea-c marked this pull request as ready for review March 10, 2026 19:03
@mircea-c mircea-c requested review from a team and t-nelson March 10, 2026 19:03
- Use crate name instead of Cargo.toml path in Buildkite section headers
- Drop set -x from the publish subshell to reduce log noise
- Add a 60s inter-crate sleep (burst of 30, then 1/min per crates.io limits)
- Detect HTTP 429 responses and back off exponentially on retries
- Add pipefail so the cargo publish | tee pipeline propagates failures

Fixes anza-xyz#11085
Fixes anza-xyz#11087
@mircea-c mircea-c force-pushed the fix/publish-crate-rate-limits branch from e3c58f2 to f11f55a Compare March 10, 2026 19:10
@mircea-c mircea-c changed the title from "fix: respect crates.io rate limits in publish-crate.sh" to "fix: improve publish-crate.sh logging and rate limit handling" Mar 10, 2026
mircea-c and others added 2 commits March 10, 2026 15:58
@mircea-c mircea-c requested a review from levsha March 10, 2026 20:13

for Cargo_toml in $Cargo_tomls; do
echo "--- $Cargo_toml"
crate_name=$(grep -m 1 '^name = ' "$Cargo_toml" | cut -f 3 -d ' ' | tr -d \")
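
To see what the grep-based extraction above yields, it can be run against a throwaway manifest. This is a sketch only: the manifest contents, the wrapper function name, and the header text are made up for the demo.

```shell
#!/usr/bin/env bash
# Wrapper around the pipeline from the diff; the function name is illustrative.
crate_name_from_manifest() {
  grep -m 1 '^name = ' "$1" | cut -f 3 -d ' ' | tr -d '"'
}

# Hypothetical manifest, written to a temp file for the demo.
manifest=$(mktemp)
cat > "$manifest" <<'EOF'
[package]
name = "example-crate"
version = "0.1.0"

[lib]
name = "example_lib"
EOF

crate_name=$(crate_name_from_manifest "$manifest")
# Buildkite treats lines starting with "---" as collapsible section headers.
echo "--- publish $crate_name"
rm -f "$manifest"
```

The pipeline depends on the package name being the first `name = ` line in the file and on single spaces around `=`. A TOML-aware tool would not have that limitation.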

is it time to put cargo install toml-cli in the ci images? then this becomes...

toml get -r "$Cargo_toml" package.name

tho i think the worst offender is the path to the docker runner. assuming there's something stupid in there like echo "--- $0"

fuckin' nailed it

agave/ci/docker-run.sh

Lines 52 to 54 in 92d55b2

echo "--- $0 ... (with sccache being DISABLED due to many (${BUILDKITE_RETRY_COUNT}) retries)"
else
echo "--- $0 ... (with sccache enabled with prefix: $SCCACHE_KEY_PREFIX)"

Author

docker-run.sh emits echo "--- $0", which is what causes the duplicate collapsed sections. Should we start there by removing that?

Author

Right, I commented before refresh and didn't see your find. I don't want to make the change in this PR, but I can open a new one for that.

@mircea-c mircea-c requested a review from t-nelson March 10, 2026 22:01
@t-nelson

did you test the changes with a dry run?

@mircea-c
Author

> did you test the changes with a dry run?

I have not, cuz it's a ton of work. I did run some unit-test-style checks of the logic changes, though.


# crates.io allows a burst of 30 new versions, then 1 per minute.
# Sleep between each crate to stay within the sustained rate limit.
sleep 60
Member

do we really need this? the manual sleep feels a bit too strict to me.

https://crates.io/docs/rate-limits

... omit ...

Concretely, the rate limits for crates.io are:

For brand new crate publishes, allow a burst of 5 new crates published at once by a single user account, with a rate limit of 1 crate every 10 minutes allowed after that burst
For new versions of existing crates, allow a burst of 30 new versions published at once by a single user account, with a rate limit of 1 crate per minute allowed after that burst

... omit ...

we already have a retry mechanism and each crate takes ~1m to build, so this should naturally respect the rate limit. also, we have ~133 crates to publish, so this change will likely add at least 2h to the total job time.
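
A back-of-the-envelope check of both points, as a sketch. It assumes one 60s sleep per crate and models the quoted limit as a simple token bucket (30 slots, one refilled per minute). The real crates.io limiter may behave differently, and the function name is illustrative.

```shell
#!/usr/bin/env bash
# Extra wall-clock time from a fixed 60s sleep after each of ~133 crates.
crates=133
sleep_s=60
added=$(( crates * sleep_s ))
printf 'added sleep: %dh %dm\n' $(( added / 3600 )) $(( added % 3600 / 60 ))
# prints "added sleep: 2h 13m"

# Counts publishes that would hit the limit under a simplified token bucket:
# capacity of 30 publishes, refilled at one publish per 60s.
# $1 = number of crates, $2 = seconds between consecutive publishes.
throttled_publishes() {
  local crates="$1"
  local interval_s="$2"
  local cost=60
  local capacity=$(( 30 * cost ))
  local credit=$capacity
  local throttled=0
  local i=0
  while [ "$i" -lt "$crates" ]; do
    if [ "$credit" -ge "$cost" ]; then
      credit=$(( credit - cost ))
    else
      # Modelled as: wait until one slot is free, then use it immediately.
      throttled=$(( throttled + 1 ))
      credit=0
    fi
    credit=$(( credit + interval_s ))
    if [ "$credit" -gt "$capacity" ]; then
      credit=$capacity
    fi
    i=$(( i + 1 ))
  done
  echo "$throttled"
}

throttled_publishes 133 60   # roughly one publish per minute
throttled_publishes 133 0    # back-to-back publishes
```

In this model, publishes spaced about a minute apart are never throttled, while back-to-back publishes are throttled after the first 30.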

Development

Successfully merging this pull request may close these issues.

ci: solana-secondary publish-crates step logs difficult to follow
ci: solana-secondary publish-crates step does not respect rate limits

4 participants