Added del to save memory during reading #1265

Merged
jamesbraza merged 2 commits into main from memory-savings-nemotron-parse
Jan 11, 2026

@jamesbraza jamesbraza commented Jan 6, 2026

Our readers are using too much RAM. This PR attempts to free images in the readers as early as possible, to reduce overall RAM usage.


Note

Reduces peak RAM during PDF parsing by freeing large intermediate image buffers as soon as they’re no longer needed.

  • In paperqa_nemotron/reader.py: delete rendered_page after conversion, delete image_for_api post-API call, and delete cropped regions (region_pix) after saving
  • In paperqa_pypdf/reader.py: delete pdfium bitmaps (pdfium_rendered_page, pix) after PIL conversion/saving for full-page screenshots, figure crops, and table crops
  • Changes are localized to memory cleanup with no API or behavioral surface changes
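
The cleanup order described above can be sketched as follows. This is a minimal illustration using only the standard library, not the code from this PR: `render_page` and `call_api` are hypothetical stand-ins for page rasterization and the parsing API, and plain byte buffers stand in for pdfium bitmaps, PIL images, and numpy arrays. Only the variable names `rendered_page`, `image_for_api`, and `region_pix` are taken from the summary.

```python
import io


def render_page(page_number: int, n_bytes: int = 1_000_000) -> bytearray:
    """Hypothetical stand-in for rasterizing one PDF page into a large buffer."""
    return bytearray([page_number % 256]) * n_bytes


def call_api(image: bytearray) -> list[tuple[int, int]]:
    """Hypothetical stand-in for a parsing API that returns (start, end) regions."""
    return [(0, 8), (8, 24)]


def parse_page(page_number: int) -> list[bytes]:
    rendered_page = render_page(page_number)
    page_image = bytes(rendered_page)  # converted copy that later steps rely on
    del rendered_page  # the raw render is not needed after conversion

    image_for_api = bytearray(page_image)  # separate copy prepared for the API
    regions = call_api(image_for_api)
    del image_for_api  # not needed once the API call has returned

    crops: list[bytes] = []
    for start, end in regions:
        region_pix = page_image[start:end]  # cropped region
        buffer = io.BytesIO()
        buffer.write(region_pix)  # "save" the crop
        del region_pix  # the saved bytes live on in the buffer
        crops.append(buffer.getvalue())
    return crops
```

Each `del` comes right after the last use of its name, so the large intermediates are not kept alive during the slower steps that follow.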

Written by Cursor Bugbot for commit db16c88. Configure here.

@jamesbraza jamesbraza self-assigned this Jan 6, 2026
Copilot AI review requested due to automatic review settings January 6, 2026 22:09
@jamesbraza jamesbraza added the bug label (Something isn't working) Jan 6, 2026
@dosubot dosubot bot added the size:XS label (This PR changes 0-9 lines, ignoring generated files.) Jan 6, 2026
dosubot bot commented Jan 6, 2026

Related Documentation

Checked 1 published document(s) in 1 knowledge base(s). No updates required.

@dosubot dosubot bot added the enhancement label (New feature or request) Jan 6, 2026

Copilot AI left a comment

Pull request overview

This PR adds explicit del statements to free image objects earlier in the PDF reading process to reduce RAM usage. The changes target image-related objects (pdfium bitmaps, PIL images, and numpy arrays) that are no longer needed after extraction or rendering.

Key Changes:

  • Added del statements to free pdfium bitmap objects immediately after they're converted or used
  • Added del for intermediate numpy array used for API calls once processing is complete
  • Added del for cropped PIL images after they're saved to BytesIO buffers
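
One way to check whether an early `del` lowers peak memory is to compare peaks with `tracemalloc`. The sketch below is an illustration under assumptions, not a benchmark of this PR: `peak_bytes` is a hypothetical helper, and `bytearray` buffers stand in for the bitmaps and arrays named above.

```python
import tracemalloc


def peak_bytes(free_early: bool, size: int = 5_000_000) -> int:
    """Return the peak traced memory while handling two large buffers in turn."""
    tracemalloc.start()
    page_bitmap = bytearray(size)  # stand-in for a full-page render
    header = bytes(page_bitmap[:100])  # small derived object kept for later
    if free_early:
        del page_bitmap  # release the render before the next large allocation
    crop = bytearray(size)  # stand-in for the next large intermediate
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return peak


if __name__ == "__main__":
    print("peak with early del:   ", peak_bytes(free_early=True))
    print("peak without early del:", peak_bytes(free_early=False))
```

Without the early `del`, both buffers are alive at the same time, so the peak is roughly twice the buffer size. With it, the peak stays close to a single buffer.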

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| packages/paper-qa-pypdf/src/paperqa_pypdf/reader.py | Frees pdfium bitmap memory for full-page renders, clustered image regions, and table regions immediately after conversion to PNG format |
| packages/paper-qa-nemotron/src/paperqa_nemotron/reader.py | Frees pdfium bitmap memory after PIL conversion, numpy array memory after API calls complete, and cropped image memory after saving to buffer |

After thorough review, all the deletions are safe and correctly placed. The variables are deleted only after their last usage, and any derived objects (like PIL images created from pdfium bitmaps) are retained as needed. The comments accurately describe the memory being freed. This is a clean optimization that should help reduce memory usage during PDF processing without introducing any bugs.
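
The placement matters because `del` removes a name, not the object itself: memory is released only when no other reference remains. The sketch below illustrates this under the assumption of CPython's reference counting. `Buffer` and `demo` are hypothetical names for illustration and do not appear in the readers.

```python
import weakref


class Buffer:
    """Hypothetical stand-in for an image object such as a rendered bitmap."""

    def __init__(self, payload: bytes) -> None:
        self.payload = payload


def demo() -> tuple[bool, bool]:
    bitmap = Buffer(b"x" * 1024)
    probe = weakref.ref(bitmap)  # lets us observe whether the object still exists
    holder = [bitmap]  # a second reference, e.g. a cache or a results list

    del bitmap  # removes the name only; holder still references the object
    alive_after_del = probe() is not None

    holder.clear()  # drop the last remaining reference
    freed_after_clear = probe() is None
    return alive_after_del, freed_after_clear
```

So a `del` frees memory right away only when the deleted name held the last reference. Derived objects that copy the data, rather than referencing the original, are unaffected.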


@dosubot dosubot bot added the lgtm label (This PR has been approved by a maintainer) Jan 11, 2026
@jamesbraza jamesbraza merged commit 68c66ce into main Jan 11, 2026
20 of 21 checks passed
@jamesbraza jamesbraza deleted the memory-savings-nemotron-parse branch January 11, 2026 20:44
Labels

  • bug: Something isn't working
  • enhancement: New feature or request
  • lgtm: This PR has been approved by a maintainer
  • size:XS: This PR changes 0-9 lines, ignoring generated files.

4 participants