
Exposed reader_config setting to set DPI and simplify settings #1160

Merged
jamesbraza merged 2 commits into main from configuring-dpi
Oct 29, 2025

Conversation

@jamesbraza
Collaborator

This PR:

  • Lets users configure reader settings like DPI
  • Reduces the overall number of settings for v6

@jamesbraza jamesbraza self-assigned this Oct 29, 2025
Copilot AI review requested due to automatic review settings October 29, 2025 18:32
@jamesbraza jamesbraza added the enhancement New feature or request label Oct 29, 2025
@dosubot dosubot bot added the size:L This PR changes 100-499 lines, ignoring generated files. label Oct 29, 2025

dosubot bot commented Oct 29, 2025

Documentation Updates

Checked 1 published document(s). No updates required.


Contributor

Copilot AI left a comment


Pull Request Overview

This pull request refactors parsing configuration by introducing a reader_config dictionary field to centralize document reader parameters, while deprecating three existing fields (chunk_size, overlap, pdfs_use_block_parsing). The changes aim to improve API flexibility by allowing arbitrary keyword arguments to be passed to the document reader.

Key changes:

  • Added reader_config dictionary field to ParsingSettings for passing custom parameters to the document reader
  • Implemented deprecation warnings for chunk_size, overlap, and pdfs_use_block_parsing fields with automatic migration to reader_config
  • Updated the default values of chunk_chars (3000→5000) and overlap (100→250) in the read_doc function
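The deprecation path described in the key changes above (warn, then move the old value into reader_config) can be sketched with the standard library alone. This is a minimal, hypothetical sketch: ParsingSettingsSketch, _DEPRECATED_TO_READER_KEY, the target keys chunk_chars and use_block_parsing, and the 5000/250 defaults (taken from the read_doc values above) are assumptions for illustration, not paperqa's actual validator in src/paperqa/settings.py.

```python
import warnings
from dataclasses import dataclass, field
from typing import Any, Optional

# Hypothetical mapping from each deprecated field to a reader_config key.
# The key names on the right are assumptions made for this sketch.
_DEPRECATED_TO_READER_KEY = {
    "chunk_size": "chunk_chars",
    "overlap": "overlap",
    "pdfs_use_block_parsing": "use_block_parsing",
}


@dataclass
class ParsingSettingsSketch:
    """Stand-in for a settings class; not paperqa's real ParsingSettings."""

    # Assumed defaults, mirroring the read_doc defaults listed above.
    reader_config: dict[str, Any] = field(
        default_factory=lambda: {"chunk_chars": 5000, "overlap": 250}
    )
    # Deprecated fields default to None so we can tell whether a user set them.
    chunk_size: Optional[int] = None
    overlap: Optional[int] = None
    pdfs_use_block_parsing: Optional[bool] = None

    def __post_init__(self) -> None:
        for old_name, new_key in _DEPRECATED_TO_READER_KEY.items():
            value = getattr(self, old_name)
            if value is None:
                continue  # Not set by the user, so there is nothing to migrate.
            warnings.warn(
                f"{old_name} is deprecated, set reader_config[{new_key!r}] instead",
                DeprecationWarning,
                stacklevel=3,
            )
            # Build a new dict instead of mutating the caller's reader_config.
            self.reader_config = {**self.reader_config, new_key: value}
```

In this sketch an explicitly set deprecated field overrides the matching reader_config entry. The opposite precedence would silently discard the value for users who have not migrated yet, because reader_config always carries a default for that key.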

Reviewed Changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

| File | Description |
|---|---|
| src/paperqa/settings.py | Adds reader_config field and deprecation validator to migrate old fields |
| src/paperqa/docs.py | Removes explicit parameter passing in favor of reader_config unpacking |
| src/paperqa/readers.py | Updates default values for chunk_chars and overlap parameters |
| tests/test_paperqa.py | Adds tests for deprecation warnings and config propagation |
| README.md | Documents the new reader_config field |
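Replacing explicit parameters with unpacking of reader_config (the docs.py change above) is what lets an option like DPI reach the reader without a dedicated settings field. A minimal sketch, assuming a reader that accepts extra keyword arguments; read_doc_sketch, add_doc_sketch, and the "dpi" key are hypothetical names, not paperqa's API.

```python
from typing import Any


def read_doc_sketch(
    path: str,
    chunk_chars: int = 5000,
    overlap: int = 250,
    **parser_kwargs: Any,
) -> dict[str, Any]:
    """Hypothetical reader; echoes back the arguments it received."""
    # Extra keys such as "dpi" land in parser_kwargs. A real reader would
    # forward them to its PDF parser; this sketch just returns them.
    return {"path": path, "chunk_chars": chunk_chars, "overlap": overlap, **parser_kwargs}


def add_doc_sketch(path: str, reader_config: dict[str, Any]) -> dict[str, Any]:
    # Rather than passing chunk_chars=..., overlap=... one by one, unpack the
    # whole config, so new reader options need no change at this call site.
    return read_doc_sketch(path, **reader_config)
```

The trade-off is weaker validation: a misspelled key in reader_config is only caught if the reader (or its parser) rejects unknown keyword arguments.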
Comments suppressed due to low confidence (1)

src/paperqa/settings.py:973

  • The get_index_name method still reads the deprecated fields chunk_size and overlap directly to build the index name, so index-name generation will break when these fields are removed in version 6. Consider reading these values from reader_config (falling back to the deprecated fields) to keep index names consistent throughout the deprecation period.
            str(self.parsing.chunk_size),
            str(self.parsing.overlap),
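The suggested fallback (read from reader_config first, then from the deprecated fields) can be sketched as below. index_name_parts, index_name, the "pqa_index_" prefix, the SHA-256 hashing scheme, and the reader_config keys chunk_chars and overlap are all hypothetical; the point is only that both configuration styles produce the same index name.

```python
import hashlib
from typing import Any, Optional


def index_name_parts(
    reader_config: dict[str, Any],
    chunk_size: Optional[int] = None,
    overlap: Optional[int] = None,
) -> list[str]:
    """Prefer reader_config values, falling back to the deprecated fields."""
    chunk_chars = reader_config.get("chunk_chars", chunk_size)
    chunk_overlap = reader_config.get("overlap", overlap)
    return [str(chunk_chars), str(chunk_overlap)]


def index_name(parts: list[str]) -> str:
    """Hash the parts so equal settings always map to the same name."""
    digest = hashlib.sha256("|".join(parts).encode("utf-8")).hexdigest()
    return f"pqa_index_{digest[:16]}"
```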


Collaborator

@maykcaldas left a comment


lgtm

@dosubot dosubot bot added the lgtm This PR has been approved by a maintainer label Oct 29, 2025
src/paperqa/readers.py

    doc: Doc,
    parsed_text_only: bool = False,
    include_metadata: bool = False,
    chunk_chars: int = 3000,
@jamesbraza (Collaborator, Author)


These were lagging behind our defaults in Settings, so this re-syncs them.
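One way to keep function defaults from lagging the Settings defaults again is a test that compares the two with inspect.signature. A minimal sketch; SETTINGS_READER_DEFAULTS, reader_sketch, and drifted_defaults are hypothetical stand-ins, with values taken from the 3000→5000 and 100→250 changes described in this PR.

```python
import inspect
from typing import Any, Callable

# Hypothetical settings defaults, using the values this PR moved read_doc to.
SETTINGS_READER_DEFAULTS: dict[str, Any] = {"chunk_chars": 5000, "overlap": 250}


def reader_sketch(path: str, chunk_chars: int = 5000, overlap: int = 250) -> None:
    """Stand-in reader whose keyword defaults should mirror the settings."""


def drifted_defaults(
    func: Callable[..., Any], expected: dict[str, Any]
) -> dict[str, tuple[Any, Any]]:
    """Map each drifted parameter to (function default, settings default)."""
    params = inspect.signature(func).parameters
    drift: dict[str, tuple[Any, Any]] = {}
    for name, expected_value in expected.items():
        actual = params[name].default  # KeyError if the parameter was removed
        if actual != expected_value:
            drift[name] = (actual, expected_value)
    return drift
```

Asserting `drifted_defaults(reader_sketch, SETTINGS_READER_DEFAULTS) == {}` in a test suite would report exactly which defaults diverged, rather than only that they differ.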

@jamesbraza jamesbraza merged commit 167d484 into main Oct 29, 2025
9 checks passed
@jamesbraza jamesbraza deleted the configuring-dpi branch October 29, 2025 19:25

3 participants