Expanding llm_parse_json and removing extract_score assumptions#1082

Merged: jamesbraza merged 4 commits into main from llm-parse-edge-cases on Sep 11, 2025

Conversation

@jamesbraza
Collaborator

This PR makes our JSON extraction more robust, with fewer assumptions:

  1. llm_parse_json was commonly failing on missing commas in the JSON
  2. llm_parse_json had extra logic
  3. extract_score would inject two fake scores
    • Score of 1 if short context
    • Score of 5 if failed extraction
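
As an illustration of the first point, here is a minimal sketch of tolerating a missing comma between JSON members. It is not the code from this PR: `parse_json_lenient` and the regex heuristic are hypothetical, and the heuristic only handles a comma missing right before the next `"key":`.

```python
import json
import re
from typing import Any

# Hypothetical heuristic: a value-ending token, then whitespace, then the
# start of a new `"key":` pair, with no comma in between.
_MISSING_COMMA = re.compile(r'(["\d\]}]|true|false|null)(\s+)(?="[^"\n]*"\s*:)')


def parse_json_lenient(text: str) -> Any:
    """Parse JSON, retrying once after inserting commas that look missing."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        # Only attempt the repair after strict parsing has already failed,
        # so valid JSON is never rewritten.
        repaired = _MISSING_COMMA.sub(r"\1,\2", text)
        return json.loads(repaired)  # Still raises if the repair did not help
```

Running the repair only on the failure path keeps well-formed responses untouched, and unrecoverable input still surfaces as a `json.JSONDecodeError` instead of being silently replaced with a default.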

@jamesbraza jamesbraza self-assigned this Sep 10, 2025
@jamesbraza jamesbraza added the bug Something isn't working label Sep 10, 2025
Copilot AI review requested due to automatic review settings September 10, 2025 19:37
@dosubot dosubot bot added the size:M This PR changes 30-99 lines, ignoring generated files. label Sep 10, 2025
@dosubot

dosubot bot commented Sep 10, 2025

Related Documentation

Checked 1 published document(s). No updates required.

Contributor

Copilot AI left a comment

Pull Request Overview

This PR improves the robustness of JSON extraction and removes hardcoded fallback scores from the extract_score function. The changes focus on making JSON parsing more fault-tolerant and ensuring proper error handling when score extraction fails.

Key changes:

  • Enhanced llm_parse_json function with better comma handling and error reporting
  • Modified extract_score to raise exceptions instead of returning fallback scores
  • Added comprehensive test coverage for missing comma scenarios in JSON parsing
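
To make the second key change concrete, the sketch below shows a score extractor that raises instead of inventing a fallback score. It is an assumption-laden illustration: `extract_score_strict` and its two patterns are hypothetical and are not the repository's `extract_score`.

```python
import re

# Hypothetical patterns: "score: 8"-style mentions first, then "7/10"-style ones.
_PATTERNS = (
    re.compile(r"score\D{0,10}?(\d+)", re.IGNORECASE),
    re.compile(r"(\d+)\s*/\s*10"),
)


def extract_score_strict(text: str) -> int:
    """Return the first score found in text, or raise if there is none."""
    for pattern in _PATTERNS:
        match = pattern.search(text)
        if match:
            return int(match.group(1))
    # No fabricated default here: the caller decides whether to retry or give up.
    raise ValueError(f"Could not extract a score from {text!r}.")
```

Raising pushes the decision to the caller, so a failed extraction can no longer be mistaken for a genuine mid-range score.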

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| src/paperqa/core.py | Enhanced JSON parsing with missing comma detection and improved error handling |
| src/paperqa/utils.py | Removed fallback score assumptions, now raises ValueError on extraction failure |
| tests/test_paperqa.py | Updated tests to expect exceptions and added new test case for missing commas |

Comment thread src/paperqa/core.py Outdated

# Handling float, str values for relevance_score
- if "relevance_score" in data:
+ if "relevance_score" in data and not isinstance(data, int):

Copilot AI Sep 10, 2025

The condition not isinstance(data, int) is checking the wrong variable. It should check data['relevance_score'] instead of data itself, since data is a dictionary.

Suggested change
- if "relevance_score" in data and not isinstance(data, int):
+ if "relevance_score" in data and not isinstance(data["relevance_score"], int):
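
The difference between the two conditions can be checked in isolation. In this small sketch the helper names are hypothetical; only the two boolean expressions come from the diff above.

```python
def needs_coercion_buggy(data: dict) -> bool:
    # `data` is a dict, so isinstance(data, int) is always False and this
    # reduces to just checking that the key is present.
    return "relevance_score" in data and not isinstance(data, int)


def needs_coercion_fixed(data: dict) -> bool:
    # Inspect the value itself: only non-int scores (str, float) need handling.
    return "relevance_score" in data and not isinstance(data["relevance_score"], int)
```

With `{"relevance_score": 8}`, the buggy check still returns `True`, so an integer score would be sent through the float/str handling anyway, while the fixed check returns `False`.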

@cursor cursor bot left a comment

This PR is being reviewed by Cursor Bugbot

Comment thread src/paperqa/core.py Outdated
Collaborator

@whitead whitead left a comment

Nice work - unclear if test failures are real though.

Also - this code will now blow-up on parsing failures? This is a big change right?

Also - can you review the prompts and make sure that an empty response or "not relevant" is not specified as the correct behavior for irrelevant contexts? I seem to remember that some of this code assumed an empty response or something like it was preferred over JSON for irrelevant contexts.

@whitead whitead self-requested a review September 10, 2025 22:58
Collaborator

@whitead whitead left a comment

Nice work!

@dosubot dosubot bot added the lgtm This PR has been approved by a maintainer label Sep 10, 2025
@jamesbraza
Collaborator Author

> Nice work - unclear if test failures are real though.

They were real: Cursor Bugbot caught an issue, which I've since fixed. Tests now pass.

> Also - this code will now blow-up on parsing failures? This is a big change right?

Thanks to #1083, we can now retry getting the context when there's a JSON parse failure or a score extraction failure. If the LLM fails twice, whether on JSON parsing or on providing a valid score, we safely abandon the context.
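
The retry-then-abandon behavior can be sketched as follows. This is not the implementation from #1083: `context_or_none`, its parameters, and the choice of `None` to mean "abandoned" are all assumptions made for illustration.

```python
import json
from collections.abc import Callable
from typing import Optional


def context_or_none(
    call_llm: Callable[[], str],
    parse: Callable[[str], dict],
    max_attempts: int = 2,
) -> Optional[dict]:
    """Ask the LLM up to max_attempts times; return None if every parse fails."""
    for _ in range(max_attempts):
        try:
            return parse(call_llm())
        except ValueError:
            # json.JSONDecodeError subclasses ValueError, so this covers both
            # JSON parse failures and score extraction failures.
            continue
    return None  # Abandon the context rather than fabricating a score


# Usage: the first reply is malformed, the second parses.
replies = iter(["oops", '{"relevance_score": 7}'])
result = context_or_none(lambda: next(replies), json.loads)
```

Here `result` holds the parsed second reply; had both replies been malformed, it would be `None` and the context would simply be dropped.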

> Also - can you review the prompts and make sure that an empty response or "not relevant" is not specified as the correct behavior for irrelevant contexts? I seem to remember that some of this code assumed an empty response or something like it was preferred over JSON for irrelevant contexts.

  1. In summary_prompt we tell the LLM to reply with "Not applicable": https://github.com/Future-House/paper-qa/blob/v5.29.1/src/paperqa/prompts.py#L10
  2. extract_score still falls back to a score of 0 in that case; I left that intact: https://github.com/Future-House/paper-qa/blob/v5.29.1/src/paperqa/utils.py#L123-L129

I think this is what you're getting at, so we should be good here.
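
For readers without the linked source open, the interplay of the two points can be sketched like this. The marker strings and function names are hypothetical and are not copied from `utils.py`; the linked lines are the authority on the actual behavior.

```python
from collections.abc import Callable

# Assumed markers, based on the phrases mentioned in this thread.
IRRELEVANT_MARKERS = ("not applicable", "not relevant")


def score_with_zero_fallback(text: str, extract: Callable[[str], int]) -> int:
    """Score 0 when the LLM explicitly declares irrelevance, else extract."""
    lowered = text.lower()
    if any(marker in lowered for marker in IRRELEVANT_MARKERS):
        return 0  # The prompt asked for this reply, so it is not a failure
    return extract(text)  # May raise ValueError, which the caller handles
```

The distinction is that a 0 here reflects something the LLM actually said, unlike the removed scores of 1 and 5, which were injected without any signal from the response.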

@jamesbraza jamesbraza merged commit 3ef7c66 into main Sep 11, 2025
4 of 5 checks passed
@jamesbraza jamesbraza deleted the llm-parse-edge-cases branch September 11, 2025 19:45
@dosubot

dosubot bot commented Sep 11, 2025

Documentation Updates

Checked 1 published document(s). No updates required.
