Non-destructive retrying on nemotron-parse API #1281

Merged
jamesbraza merged 3 commits into main from better-retrying
Jan 29, 2026

jamesbraza (Collaborator) commented Jan 29, 2026

Previously, the stacked retry policies in nemotron-parse could interfere destructively:

  1. First, retry API timeouts 3X
  2. Then, retry bounding box errors 3X

This PR tightens up retrying to just retry 3X cumulatively.
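
To illustrate why stacked budgets multiply, here is a minimal stdlib-only sketch. It is not the code from this PR: `retrying`, `BBoxError`, `count_calls`, `stacked`, and `shared` are hypothetical names, and the toy decorator only mimics the attempt-counting behavior of a retry library such as Tenacity. The failure pattern is contrived so that both layers keep firing.

```python
from __future__ import annotations

from collections.abc import Callable


class BBoxError(ValueError):
    """Hypothetical stand-in for a bounding box validation failure."""


def retrying(
    exc_types: type[Exception] | tuple[type[Exception], ...], attempts: int
) -> Callable[[Callable[[], None]], Callable[[], None]]:
    """Toy retry decorator: call up to `attempts` times while `exc_types` is raised."""

    def decorator(fn: Callable[[], None]) -> Callable[[], None]:
        def wrapper() -> None:
            for attempt in range(1, attempts + 1):
                try:
                    return fn()
                except exc_types:
                    if attempt == attempts:
                        raise  # Budget exhausted, surface the last error

        return wrapper

    return decorator


def count_calls(wrap: Callable[[Callable[[], None]], Callable[[], None]]) -> int:
    """Count how many times a never-succeeding call runs under the given wrapping."""
    n_calls = 0

    def flaky() -> None:
        nonlocal n_calls
        n_calls += 1
        # Every third call times out, the rest fail bbox validation
        if n_calls % 3 == 0:
            raise TimeoutError("rate limited")
        raise BBoxError("invalid bounding box")

    try:
        wrap(flaky)()
    except (TimeoutError, BBoxError):
        pass  # Expected: the call never succeeds
    return n_calls


def stacked(fn: Callable[[], None]) -> Callable[[], None]:
    # Two independent budgets: each outer timeout retry re-enters the inner bbox retry
    return retrying(TimeoutError, 3)(retrying(BBoxError, 3)(fn))


def shared(fn: Callable[[], None]) -> Callable[[], None]:
    # One budget covering both failure modes
    return retrying((TimeoutError, BBoxError), 3)(fn)


print(count_calls(stacked))  # 9
print(count_calls(shared))  # 3
```

In this contrived worst case the stacked form issues 9 requests, while the shared budget stops after 3.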


Note

Medium Risk
Touches external API retry/backoff behavior, which can change throughput and error handling under load; logic is small and covered by a new unit test but still affects request timing.

Overview
Tightens Nvidia nemotron-parse retry behavior so failures across bbox validation errors, rate-limit TimeoutErrors, and LiteLLM 408 timeouts share a single stop_after_attempt(3) budget instead of potentially retrying in multiple stacked phases.

Adds a custom tenacity wait strategy that applies exponential backoff only for the rate-limit TimeoutError path (not for inference-timeout litellm.Timeout), and updates the LiteLLM timeout type check accordingly; includes a focused unit test for the new wait/backoff logic.
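
A conditional wait strategy along these lines might look like the sketch below. It is an illustration under assumptions, not the helper added in this PR: the function name, the `base` and `cap` parameters, and the `InferenceTimeout` stand-in for `litellm.Timeout` (assumed here not to subclass the builtin `TimeoutError`) are all hypothetical. It relies only on the two attributes Tenacity exposes on its retry state, `outcome` and `attempt_number`, and is exercised with a stub so that it runs without Tenacity installed.

```python
from concurrent.futures import Future
from types import SimpleNamespace


class InferenceTimeout(Exception):
    """Hypothetical stand-in for litellm.Timeout (assumed not a TimeoutError subclass)."""


def wait_exponential_on_rate_limit(
    retry_state, base: float = 1.0, cap: float = 60.0
) -> float:
    """Return seconds to sleep: exponential for TimeoutError, zero otherwise."""
    outcome = retry_state.outcome
    exc = outcome.exception() if outcome is not None else None
    if isinstance(exc, TimeoutError):
        # Rate-limit path: 1x, 2x, 4x, ... of `base`, capped at `cap`
        return min(cap, base * 2 ** (retry_state.attempt_number - 1))
    # Any other retried failure (bbox error, inference timeout): retry immediately
    return 0.0


def fake_state(exc: BaseException, attempt_number: int) -> SimpleNamespace:
    """Build a minimal stub exposing the two attributes the wait function reads."""
    outcome: Future = Future()
    outcome.set_exception(exc)
    return SimpleNamespace(outcome=outcome, attempt_number=attempt_number)


print(wait_exponential_on_rate_limit(fake_state(TimeoutError(), 3)))  # 4.0
print(wait_exponential_on_rate_limit(fake_state(InferenceTimeout(), 3)))  # 0.0
```

A callable with this shape could be passed as the `wait=` argument of Tenacity's `retry` decorator alongside `stop=stop_after_attempt(3)`, so that backoff and the shared attempt budget are configured in one place.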

Written by Cursor Bugbot for commit 84efd89.

@jamesbraza jamesbraza self-assigned this Jan 29, 2026
Copilot AI review requested due to automatic review settings January 29, 2026 22:21
@jamesbraza jamesbraza added the bug Something isn't working label Jan 29, 2026
@dosubot dosubot bot added the size:M This PR changes 30-99 lines, ignoring generated files. label Jan 29, 2026
@dosubot dosubot bot added the enhancement New feature or request label Jan 29, 2026
Copilot AI (Contributor) left a comment

Pull request overview

This PR adjusts retry behavior for the nemotron-parse Nvidia API calls to avoid “stacked” retry loops that could previously result in more than 3 total retries across different failure modes.

Changes:

  • Combine multiple Tenacity retry decorators into a single retry policy capped at 3 total attempts.
  • Introduce a custom Tenacity wait function to apply exponential backoff only for the TimeoutError (rate-limit) failure mode.
  • Add a unit test validating the conditional backoff behavior.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| packages/paper-qa-nemotron/src/paperqa_nemotron/api.py | Consolidates retry logic and adds conditional exponential backoff helper. |
| packages/paper-qa-nemotron/tests/test_api.py | Adds test coverage for the new conditional wait/backoff logic. |

@dosubot dosubot bot added the lgtm This PR has been approved by a maintainer label Jan 29, 2026
@jamesbraza jamesbraza merged commit c153af1 into main Jan 29, 2026
5 of 7 checks passed
@jamesbraza jamesbraza deleted the better-retrying branch January 29, 2026 23:18
Labels: bug, enhancement, lgtm, size:M
