ci: retry grype install on transient github 502s #701
The anchore install.sh queries github.com for release metadata and intermittently fails with 502 Bad Gateway during high-load periods, which causes the merge-gate E2E job to fail spuriously. Wrap the install in a quadratic-backoff retry loop of up to three attempts (waiting 5s, then 20s) so a brief CDN blip does not fail the whole CI step.
Summary
Wrap the tools/setup-tools grype install in a quadratic-backoff retry loop so transient github.com 502s during release-metadata lookups do not fail the merge-gate E2E job.
Motivation / Context
Recent merge-gate runs have failed spuriously with:
Example: https://github.com/NVIDIA/aicr/actions/runs/25031579265/job/73314217949
The first download (raw.githubusercontent.com install.sh) succeeds; the script's subsequent github.com release-metadata query is what 502s. With set -e the step dies immediately on the first failure.
This change retries the anchore install.sh up to 3 times with 5s/20s backoff, mirroring the pattern already used in kwok/scripts/run-all-recipes.sh and pkg/bundler/deployer/helm/templates/deploy.sh.tmpl.
Fixes: N/A
Related: N/A
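For readers who want to picture the retry described above, here is a minimal sketch. It is not the code from tools/setup-tools: the helper names backoff_delay and retry and the RETRY_BASE_SECONDS variable are hypothetical, and the delay formula (base × attempt²) is an assumption chosen because it reproduces the 5s/20s schedule mentioned in this PR.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper (assumption, not from tools/setup-tools):
# seconds to wait after failed attempt N, computed as base * N^2.
# With base=5 this yields 5s after attempt 1 and 20s after attempt 2.
backoff_delay() {
  local attempt=$1 base=${2:-5}
  echo $(( base * attempt * attempt ))
}

# Hypothetical helper: run a command up to max_attempts times.
# Calling the command inside `if` keeps `set -e` from aborting the
# script on the first failure, which is the behavior this PR avoids.
retry() {
  local max_attempts=$1
  shift
  local attempt=1 delay
  while true; do
    if "$@"; then
      return 0
    fi
    if (( attempt >= max_attempts )); then
      echo "ERROR: giving up after ${attempt} attempts: $*" >&2
      return 1
    fi
    delay=$(backoff_delay "$attempt" "${RETRY_BASE_SECONDS:-5}")
    echo "WARN: attempt ${attempt}/${max_attempts} failed; retrying in ${delay}s" >&2
    sleep "$delay"
    attempt=$(( attempt + 1 ))
  done
}
```

A caller would wrap the single install invocation, for example `retry 3 install_grype_once`, where install_grype_once is a placeholder for whatever function runs the anchore install.sh. Because the command runs before any sleep, the success path adds no delay, and three failed attempts add 5s + 20s of waiting in total.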
Type of Change
Component(s) Affected
- (cmd/aicr, pkg/cli)
- (cmd/aicrd, pkg/api, pkg/server)
- (pkg/recipe)
- (pkg/bundler, pkg/component/*)
- (pkg/collector, pkg/snapshotter)
- (pkg/validator)
- (pkg/errors, pkg/k8s)
- (docs/, examples/)
Implementation Notes
- Cleans up GRYPE_TMP before exiting on terminal failure.
- Uses the log_warning/log_error functions from tools/common.
Testing
- bash -n tools/setup-tools: syntax OK.
- actionlint/yamllint not applicable (shell-only change).
- make qualify was not run because this is a tooling-script-only change with no Go, YAML, docs-sidebar, or recipe impact.
Risk Assessment
Blast radius is limited to the Linux grype install path of tools/setup-tools. The macOS install (brew install grype) is unchanged. The happy path (no 502) is identical: no extra latency in the success case. Worst-case added latency on persistent 502s is ~25s (5s + 20s of backoff) before failing the same way it does today.
Checklist
make qualify and all checks pass