Fixing 60-sec wait for retrying `nemotron-parse` API calls' 408 timeouts by jamesbraza · Pull Request #1277 · Future-House/paper-qa

jamesbraza · 2026-01-27T22:34:41Z

#1276 incorrectly grouped the timeout errors, leading to an unnecessary 60-sec retry wait in the 408 timeout error case

Note

Adjusts _call_nvidia_api retry strategy to avoid unnecessary backoff on inference timeouts.

Separates 408 (LiteLLM Timeout with status 408) into its own @retry without exponential wait
Keeps exponential backoff only for rate-limit TimeoutError and preserves retries for NemotronBBoxError

This prevents the previous ~60s wait when retrying 408 errors while maintaining proper backoff for rate limits.

^{Written by Cursor Bugbot for commit 8218ef9. Configure here.}

Copilot

Pull request overview

Adjusts retry behavior for nemotron-parse API calls so LiteLLM-surfaced 408 timeouts don’t incur the long (60s+) backoff intended for rate-limiter timeouts.

Changes:

Splits the retry logic for _call_nvidia_api into separate Tenacity retries: one for TimeoutError (with exponential backoff) and one for LiteLLM 408 timeouts (without the long backoff).
Introduces/uses _is_litellm_timeout_with_408 as the predicate for identifying 408 timeouts.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-01-27T22:38:26Z

packages/paper-qa-nemotron/src/paperqa_nemotron/api.py

 @retry(
-    retry=(
-        retry_if_exception_type(TimeoutError)  # Hitting rate limits
-        | retry_if_exception(_is_litellm_timeout_with_408)  # Inference timeout
-    ),
+    retry=retry_if_exception_type(TimeoutError),  # Hitting rate limits
    stop=stop_after_attempt(3),
    wait=wait_exponential(multiplier=2, min=GLOBAL_RATE_LIMITER_TIMEOUT),
    before_sleep=before_sleep_log(logger, logging.WARNING),
 )
+@retry(
+    retry=retry_if_exception(_is_litellm_timeout_with_408),  # Inference timeout
+    stop=stop_after_attempt(3),
+    before_sleep=before_sleep_log(logger, logging.WARNING),
+)


Stacking multiple tenacity @retry decorators makes the retry semantics hard to reason about and can lead to more attempts and/or waits than intended (e.g., an exception that satisfies both retry predicates can be retried by the inner decorator and then again by the outer one, potentially re-introducing the long wait_exponential(..., min=GLOBAL_RATE_LIMITER_TIMEOUT) delay you’re trying to avoid for 408s). Consider collapsing this back to a single @retry with a combined retry= predicate and a wait= callable that chooses the backoff based on the caught exception (or otherwise ensure the TimeoutError retry explicitly excludes the 408 case).

Not a bad idea, but YAGNI for now

packages/paper-qa-nemotron/src/paperqa_nemotron/api.py

Not waiting 60-sec to retry litellm.exceptions.Timeout 408

8218ef9

jamesbraza self-assigned this Jan 27, 2026

Copilot AI review requested due to automatic review settings January 27, 2026 22:34

jamesbraza added the bug Something isn't working label Jan 27, 2026

dosubot bot added the size:S This PR changes 10-29 lines, ignoring generated files. label Jan 27, 2026

Copilot started reviewing on behalf of jamesbraza January 27, 2026 22:35 View session

jamesbraza merged commit 98100c3 into main Jan 27, 2026
10 of 12 checks passed

jamesbraza deleted the retrying-timeout-better branch January 27, 2026 22:35

Copilot AI reviewed Jan 27, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fixing 60-sec wait for retrying `nemotron-parse` API calls' 408 timeouts#1277

Fixing 60-sec wait for retrying `nemotron-parse` API calls' 408 timeouts#1277
jamesbraza merged 1 commit intomainfrom
retrying-timeout-better

jamesbraza commented Jan 27, 2026 •

edited by cursor bot

Loading

Uh oh!

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Jan 27, 2026

Uh oh!

jamesbraza Jan 27, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

jamesbraza commented Jan 27, 2026 • edited by cursor bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Copilot AI Jan 27, 2026

Choose a reason for hiding this comment

Uh oh!

jamesbraza Jan 27, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

jamesbraza commented Jan 27, 2026 •

edited by cursor bot

Loading