Added del to save memory during reading #1265
Pull request overview
This PR adds explicit del statements to free image objects earlier in the PDF reading process to reduce RAM usage. The changes target image-related objects (pdfium bitmaps, PIL images, and numpy arrays) that are no longer needed after extraction or rendering.
Key Changes:
- Added `del` statements to free pdfium bitmap objects immediately after they're converted or used
- Added `del` for intermediate numpy array used for API calls once processing is complete
- Added `del` for cropped PIL images after they're saved to BytesIO buffers
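The pattern behind these changes can be sketched with standard-library stand-ins. This is an illustration only, not the PR's code: `FakeBitmap` and `encode_page` are hypothetical names, `zlib.compress` stands in for PNG encoding, and the immediate release shown here relies on CPython's reference counting. Note that `del` only removes a name binding; the memory is released only when no other reference to the object remains.

```python
import io
import weakref
import zlib


class FakeBitmap:
    """Hypothetical stand-in for a rendered page bitmap (not a pdfium class)."""

    def __init__(self, width: int, height: int) -> None:
        self.pixels = bytearray(width * height * 4)  # RGBA pixel buffer


def encode_page(width: int, height: int) -> tuple[bytes, bool]:
    """Render, encode into a buffer, then drop the bitmap before later work."""
    bitmap = FakeBitmap(width, height)
    probe = weakref.ref(bitmap)  # lets us observe when the bitmap is collected

    buffer = io.BytesIO()
    # Stand-in for saving a PNG: the buffer holds its own copy of the data.
    buffer.write(zlib.compress(bytes(bitmap.pixels)))

    del bitmap  # last reference gone, so CPython frees the bitmap here
    freed_early = probe() is None

    # Any later processing now runs without the bitmap held in memory.
    return buffer.getvalue(), freed_early


encoded, freed_early = encode_page(64, 64)
```

The derived object (the encoded bytes) outlives the source bitmap, which mirrors the review's point that derived objects are retained while the intermediates are dropped.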
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| packages/paper-qa-pypdf/src/paperqa_pypdf/reader.py | Frees pdfium bitmap memory for full-page renders, clustered image regions, and table regions immediately after conversion to PNG format |
| packages/paper-qa-nemotron/src/paperqa_nemotron/reader.py | Frees pdfium bitmap memory after PIL conversion, numpy array memory after API calls complete, and cropped image memory after saving to buffer |
After thorough review, all the deletions are safe and correctly placed. The variables are deleted only after their last usage, and any derived objects (like PIL images created from pdfium bitmaps) are retained as needed. The comments accurately describe the memory being freed. This is a clean optimization that should help reduce memory usage during PDF processing without introducing any bugs.
Our readers are taking up too much RAM. This PR attempts to remove images from the readers early to reduce overall RAM usage.
Note
Reduces peak RAM during PDF parsing by freeing large intermediate image buffers as soon as they’re no longer needed.
- `paperqa_nemotron/reader.py`: delete `rendered_page` after conversion, delete `image_for_api` post-API call, and delete cropped regions (`region_pix`) after saving
- `paperqa_pypdf/reader.py`: delete pdfium bitmaps (`pdfium_rendered_page`, `pix`) after PIL conversion/saving for full-page screenshots, figure crops, and table crops

Written by Cursor Bugbot for commit db16c88.
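One way to check the peak-RAM claim is to measure it with `tracemalloc`. The sketch below is an assumption-laden illustration rather than a benchmark of the readers: `peak_bytes` is a hypothetical helper, plain `bytearray` objects stand in for bitmaps and later-stage allocations, and `tracemalloc` only sees memory allocated through Python's allocators, so buffers owned by native libraries such as pdfium may not be counted.

```python
import tracemalloc


def peak_bytes(free_early: bool, size: int = 8_000_000) -> int:
    """Return the traced peak memory for one simulated page."""
    tracemalloc.start()
    try:
        bitmap = bytearray(size)  # stand-in for a rendered page bitmap
        cropped = bytes(bitmap[: size // 4])  # derived object we keep

        if free_early:
            del bitmap  # drop the large intermediate before the next stage

        workspace = bytearray(size)  # stand-in for a later allocation
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak


peak_with_del = peak_bytes(free_early=True)
peak_without_del = peak_bytes(free_early=False)
```

In this setup the run without `del` holds the bitmap and the later allocation at the same time, so its traced peak should be higher. How much the real readers gain depends on what else references the image objects.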