Not having ID break index cache key across Python invocations #1262

Merged: jamesbraza merged 3 commits into main from fixing-parse-serde on Jan 5, 2026

@jamesbraza
Collaborator

#1125 differentiated cache keys across PDF parse functions, but inadvertently exposed us to this:

from paperqa_pypdf import parse_pdf_to_pages

print(str(parse_pdf_to_pages))  # '<function parse_pdf_to_pages at 0x1037180e0>'

Namely, the ID (a memory address) shown in the str representation changes from run to run, which breaks the cache key across Python invocations 🫠. This PR:

  • Uses a more stable fully qualified name (FQN) for this path
  • Also handles lambda and functools.partial
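
To make the failure mode concrete, here is a minimal sketch (not the code from this PR) contrasting a function's default str() with a name built from `__module__` and `__qualname__`. The helper name `fqn` is hypothetical, and `json.dumps` stands in for `parse_pdf_to_pages` so the snippet runs without `paperqa_pypdf` installed.

```python
import json
import re


def fqn(func) -> str:
    """Hypothetical helper: fully qualified name with no memory address in it."""
    return f"{func.__module__}.{func.__qualname__}"


# Default representation, shaped like '<function dumps at 0x...>'
default_repr = str(json.dumps)

# Name derived only from where the function is defined
stable_name = fqn(json.dumps)

# True when the default representation embeds a hexadecimal address
has_address = re.search(r"0x[0-9a-fA-F]+", default_repr) is not None

print(default_repr)
print(stable_name)
```

The address in `default_repr` depends on where the interpreter happened to allocate the function object, while `stable_name` depends only on the module and the function's name.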

Closes #1257

@jamesbraza jamesbraza self-assigned this Jan 3, 2026
Copilot AI review requested due to automatic review settings January 3, 2026 19:02
@jamesbraza jamesbraza added the "bug" label (Something isn't working) Jan 3, 2026
@jamesbraza jamesbraza changed the title from "Not have ID break index cache key across Python invocations" to "Not having ID break index cache key across Python invocations" Jan 3, 2026
@dosubot dosubot bot added the "size:M" label (This PR changes 30-99 lines, ignoring generated files.) Jan 3, 2026
@dosubot

dosubot bot commented Jan 3, 2026

Related Documentation

Checked 1 published document(s) in 0 knowledge base(s). No updates required.

Contributor

Copilot AI left a comment

Pull request overview

This PR fixes cache key instability caused by function memory addresses, which differ between Python invocations, appearing in string representations. The fix introduces a new utility function get_stable_str() that generates stable string representations for functions, handling special cases like lambdas and functools.partial objects.
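
The effect on a hashed cache key can be sketched as follows. This is an illustration under stated assumptions rather than the project's actual key derivation: `index_name_sketch`, the `chunk_size` field, the `index_` prefix, and the dotted name `paperqa_pypdf.parse_pdf_to_pages` are all hypothetical, and the two address-bearing strings simulate what str() might return in two separate interpreter runs.

```python
import hashlib


def index_name_sketch(parser_repr: str, chunk_size: int) -> str:
    """Hypothetical cache key: a digest of the settings that shape the index."""
    payload = f"parser={parser_repr}|chunk_size={chunk_size}"
    return "index_" + hashlib.sha256(payload.encode()).hexdigest()[:16]


# Simulated str(parse_pdf_to_pages) values from two separate interpreter runs
run_1 = index_name_sketch("<function parse_pdf_to_pages at 0x1037180e0>", 5000)
run_2 = index_name_sketch("<function parse_pdf_to_pages at 0x104a2c0e0>", 5000)

# The same two runs when the parser is identified by a dotted name instead
stable_1 = index_name_sketch("paperqa_pypdf.parse_pdf_to_pages", 5000)
stable_2 = index_name_sketch("paperqa_pypdf.parse_pdf_to_pages", 5000)

print(run_1 == run_2)        # keys differ, so the second run misses the cache
print(stable_1 == stable_2)  # keys match, so the second run reuses the index
```

With an address in the payload, every new process computes a different key and looks for an index that was never built under that name.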

Key changes:

  • Added get_stable_str() utility function to generate stable function identifiers using fully qualified names (FQN) for normal functions and code hashes for lambdas/partials
  • Updated ParsingSettings._custom_serializer() to use get_stable_str() for JSON-safe serialization
  • Modified Settings.get_index_name() to use get_stable_str() for stable cache key generation
  • Added test cases for lambda and partial function handling
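
One possible shape for such a helper is sketched below. It is not the implementation merged in this PR: the name `stable_str_sketch` is hypothetical, the choice of what to hash for lambdas is an assumption, and partials are rendered by recursing into the wrapped callable rather than by hashing. It also assumes that arguments bound to a partial have address-free reprs (strings, numbers, booleans).

```python
import functools
import hashlib
import json


def stable_str_sketch(func) -> str:
    """Hypothetical stand-in for get_stable_str(); the real function may differ."""
    if isinstance(func, functools.partial):
        # Describe the wrapped callable, then append the bound arguments.
        inner = stable_str_sketch(func.func)
        kwargs = sorted(func.keywords.items())
        return f"partial({inner}, args={func.args!r}, kwargs={kwargs!r})"
    name = f"{func.__module__}.{func.__qualname__}"
    if "<lambda>" in func.__qualname__:
        # Every lambda is named '<lambda>', so distinguish them by their code.
        # Nested code objects are skipped because their repr embeds an address.
        code = func.__code__
        consts = [c for c in code.co_consts if not hasattr(c, "co_code")]
        payload = code.co_code + repr((code.co_names, consts)).encode()
        # hashlib is used because the built-in hash() is salted per process.
        digest = hashlib.sha256(payload).hexdigest()[:12]
        return f"{name}:{digest}"
    return name


add_one = lambda x: x + 1  # noqa: E731
add_two = lambda x: x + 2  # noqa: E731
dump_sorted = functools.partial(json.dumps, sort_keys=True)

print(stable_str_sketch(json.dumps))   # json.dumps
print(stable_str_sketch(dump_sorted))  # partial(json.dumps, args=(), kwargs=[('sort_keys', True)])
print(stable_str_sketch(add_one))
```

Because bytecode can change between Python versions, a digest like this is only expected to stay the same across invocations of the same interpreter version.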

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 6 comments.

| File | Description |
| --- | --- |
| src/paperqa/utils.py | Added get_stable_str() function to generate stable string representations for functions, avoiding memory address issues in cache keys |
| src/paperqa/settings.py | Updated serialization and index name generation to use get_stable_str() instead of direct string conversion |
| tests/test_paperqa.py | Added test stubs and test cases to verify lambda and partial functions don't break serialization |

@dosubot dosubot bot added the "lgtm" label (This PR has been approved by a maintainer) Jan 5, 2026
@jamesbraza jamesbraza merged commit 6625d52 into main Jan 5, 2026
7 checks passed
@jamesbraza jamesbraza deleted the fixing-parse-serde branch January 5, 2026 18:29

Development

Successfully merging this pull request may close these issues.

pqa tries to index old files every time during 'ask' and if --agent.rebuild_index false, it can't find the index