Skip to content

Fixing 60-sec wait for retrying nemotron-parse API calls' 408 timeouts#1277

Merged
jamesbraza merged 1 commit intomainfrom
retrying-timeout-better
Jan 27, 2026
Merged

Fixing 60-sec wait for retrying nemotron-parse API calls' 408 timeouts#1277
jamesbraza merged 1 commit intomainfrom
retrying-timeout-better

Conversation

@jamesbraza
Copy link
Copy Markdown
Collaborator

@jamesbraza jamesbraza commented Jan 27, 2026

#1276 incorrectly grouped the timeout errors, leading to an unnecessary 60-sec retry wait in the 408 timeout error case


Note

Adjusts _call_nvidia_api retry strategy to avoid unnecessary backoff on inference timeouts.

  • Separates 408 (LiteLLM Timeout with status 408) into its own @retry without exponential wait
  • Keeps exponential backoff only for rate-limit TimeoutError and preserves retries for NemotronBBoxError

This prevents the previous ~60s wait when retrying 408 errors while maintaining proper backoff for rate limits.

Written by Cursor Bugbot for commit 8218ef9. Configure here.

@jamesbraza jamesbraza self-assigned this Jan 27, 2026
Copilot AI review requested due to automatic review settings January 27, 2026 22:34
@jamesbraza jamesbraza added the bug Something isn't working label Jan 27, 2026
@dosubot dosubot bot added the size:S This PR changes 10-29 lines, ignoring generated files. label Jan 27, 2026
@jamesbraza jamesbraza merged commit 98100c3 into main Jan 27, 2026
10 of 12 checks passed
@jamesbraza jamesbraza deleted the retrying-timeout-better branch January 27, 2026 22:35
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adjusts retry behavior for nemotron-parse API calls so LiteLLM-surfaced 408 timeouts don’t incur the long (60s+) backoff intended for rate-limiter timeouts.

Changes:

  • Splits the retry logic for _call_nvidia_api into separate Tenacity retries: one for TimeoutError (with exponential backoff) and one for LiteLLM 408 timeouts (without the long backoff).
  • Introduces/uses _is_litellm_timeout_with_408 as the predicate for identifying 408 timeouts.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines 366 to +376
@retry(
retry=(
retry_if_exception_type(TimeoutError) # Hitting rate limits
| retry_if_exception(_is_litellm_timeout_with_408) # Inference timeout
),
retry=retry_if_exception_type(TimeoutError), # Hitting rate limits
stop=stop_after_attempt(3),
wait=wait_exponential(multiplier=2, min=GLOBAL_RATE_LIMITER_TIMEOUT),
before_sleep=before_sleep_log(logger, logging.WARNING),
)
@retry(
retry=retry_if_exception(_is_litellm_timeout_with_408), # Inference timeout
stop=stop_after_attempt(3),
before_sleep=before_sleep_log(logger, logging.WARNING),
)
Copy link

Copilot AI Jan 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Stacking multiple tenacity @retry decorators makes the retry semantics hard to reason about and can lead to more attempts and/or waits than intended (e.g., an exception that satisfies both retry predicates can be retried by the inner decorator and then again by the outer one, potentially re-introducing the long wait_exponential(..., min=GLOBAL_RATE_LIMITER_TIMEOUT) delay you’re trying to avoid for 408s). Consider collapsing this back to a single @retry with a combined retry= predicate and a wait= callable that chooses the backoff based on the caught exception (or otherwise ensure the TimeoutError retry explicitly excludes the 408 case).

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a bad idea, but YAGNI for now

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working size:S This PR changes 10-29 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants