Retrying `nemotron-parse` API calls receiving 408 timeouts by jamesbraza · Pull Request #1276 · Future-House/paper-qa

jamesbraza · 2026-01-27T21:42:44Z

When serving nemotron-parse with vllm==0.14.1 on Modal (on an Nvidia H200), behind a FastAPI proxy that makes vLLM match the Nvidia NIM API, we sometimes see 408 status codes come back.

litellm==1.81.1 casts these 408s to litellm.exceptions.Timeout: https://github.com/BerriAI/litellm/blob/v1.81.1-nightly/litellm/exceptions.py#L243

This PR makes us retry those 408 errors as well.
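To make the 408 check concrete, here is a minimal sketch of a predicate in the spirit of the PR's `_is_litellm_timeout_with_408`. It is not the PR's code. It uses a hypothetical `StandInLitellmTimeout` class in place of LiteLLM's timeout exception so it runs without LiteLLM installed, and it assumes the real exception exposes the HTTP status on a `status_code` attribute. The name `is_timeout_with_408` is also hypothetical.

```python
from http import HTTPStatus


class StandInLitellmTimeout(Exception):
    """Hypothetical stand-in for LiteLLM's timeout exception.

    Only a ``status_code`` attribute is mimicked, so this runs without litellm.
    """

    def __init__(self, message: str, status_code: int = HTTPStatus.REQUEST_TIMEOUT) -> None:
        super().__init__(message)
        self.status_code = int(status_code)


def is_timeout_with_408(exc: BaseException) -> bool:
    """Return True only for (stand-in) timeout exceptions carrying HTTP 408."""
    return (
        isinstance(exc, StandInLitellmTimeout)
        and exc.status_code == HTTPStatus.REQUEST_TIMEOUT
    )


print(is_timeout_with_408(StandInLitellmTimeout("Request Timeout")))  # True
print(is_timeout_with_408(StandInLitellmTimeout("Gateway Timeout", status_code=504)))  # False
print(is_timeout_with_408(ValueError("unrelated")))  # False
```

Checking both the exception type and the status code keeps other timeout-flavored failures (such as a 504 from the proxy) from being silently retried.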

Note

Expands retry logic for _call_nvidia_api to handle 408 Request Timeout surfaced by LiteLLM.

- Adds _is_litellm_timeout_with_408 and updates the Tenacity decorator to retry_if_exception(_is_litellm_timeout_with_408) alongside TimeoutError
- Imports http and retry_if_exception to support the new condition

Written by Cursor Bugbot for commit 954774a.

Copilot

Pull request overview

This PR adds retry logic to handle 408 (Request Timeout) status codes from the nemotron-parse API when served via vLLM on Modal. According to the PR description, litellm version 1.81.1 casts these 408 responses to litellm.exceptions.Timeout exceptions, and the new retry logic ensures these timeout errors are retried along with the existing rate limit timeouts.

Changes:

- Added a helper function to detect litellm timeout exceptions with 408 status codes
- Extended the retry decorator on _call_nvidia_api to retry on inference timeouts (408) in addition to rate limit timeouts


packages/paper-qa-nemotron/src/paperqa_nemotron/api.py

Retrying API calls receiving 408 timeouts

954774a

jamesbraza requested review from MicPie, mskarlin, sidnarayanan and whitead January 27, 2026 21:42

jamesbraza self-assigned this Jan 27, 2026

jamesbraza added the bug Something isn't working label Jan 27, 2026

Copilot AI review requested due to automatic review settings January 27, 2026 21:42

jamesbraza force-pushed the retrying-timeout branch from 17a23b3 to 954774a January 27, 2026 21:42

dosubot bot added the size:S This PR changes 10-29 lines, ignoring generated files. label Jan 27, 2026

Copilot started reviewing on behalf of jamesbraza January 27, 2026 21:43

dosubot bot added the enhancement New feature or request label Jan 27, 2026

Copilot AI reviewed Jan 27, 2026

Two review threads on packages/paper-qa-nemotron/src/paperqa_nemotron/api.py, both resolved.

sidnarayanan approved these changes Jan 27, 2026

dosubot bot added the lgtm This PR has been approved by a maintainer label Jan 27, 2026

jamesbraza merged commit 1a6ed3f into main Jan 27, 2026
12 of 14 checks passed

jamesbraza deleted the retrying-timeout branch January 27, 2026 21:52

jamesbraza mentioned this pull request Jan 27, 2026

Fixing 60-sec wait for retrying nemotron-parse API calls' 408 timeouts #1277

Merged

jamesbraza merged 1 commit into main from retrying-timeout

3 participants
