Expanding `llm_parse_json` and removing `extract_score` assumptions by jamesbraza · Pull Request #1082 · Future-House/paper-qa

jamesbraza · 2025-09-10T19:37:41Z

This PR makes our JSON extraction more robust, with fewer assumptions:

- llm_parse_json was commonly failing on missing commas in the JSON
- llm_parse_json had extra logic
- extract_score would inject two fake scores
  - Score of 1 if short context
  - Score of 5 if failed extraction
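
To illustrate the missing-comma failure mode, here is a minimal sketch of one possible repair. It is not the code from this PR: `parse_json_lenient` and `MISSING_COMMA` are hypothetical names, and the regex only covers the case where a string value is followed by a newline and then the next key.

```python
import json
import re

# Hypothetical pattern (an assumption, not paper-qa's implementation): a closing
# quote, whitespace that includes a newline, then the opening quote of the next key.
MISSING_COMMA = re.compile(r'"(\s*\n\s*)(?=")')


def parse_json_lenient(text: str) -> dict:
    """Parse JSON, retrying once after inserting commas between string fields."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        # Only attempt the repair after strict parsing has failed.
        repaired = MISSING_COMMA.sub(r'",\1', text)
        return json.loads(repaired)


# The comma after the "summary" value is missing.
broken = '{\n  "summary": "Relevant passage."\n  "relevance_score": 7\n}'
print(parse_json_lenient(broken))
```

If the repaired text still fails to parse, the second `json.loads` raises `json.JSONDecodeError`, so the caller sees the failure instead of a guessed result.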

dosubot · 2025-09-10T19:38:03Z

Related Documentation

Checked 1 published document(s). No updates required.

Copilot

Pull Request Overview

This PR improves the robustness of JSON extraction and removes hardcoded fallback scores from the extract_score function. The changes focus on making JSON parsing more fault-tolerant and ensuring proper error handling when score extraction fails.

Key changes:

- Enhanced llm_parse_json function with better comma handling and error reporting
- Modified extract_score to raise exceptions instead of returning fallback scores
- Added comprehensive test coverage for missing comma scenarios in JSON parsing

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| src/paperqa/core.py | Enhanced JSON parsing with missing comma detection and improved error handling |
| src/paperqa/utils.py | Removed fallback score assumptions, now raises ValueError on extraction failure |
| tests/test_paperqa.py | Updated tests to expect exceptions and added new test case for missing commas |
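
As a rough sketch of the behavior change described for extract_score (raising instead of returning a made-up score), the following is an illustration under assumptions rather than the PR's code. The name `extract_score_sketch`, the "last integer wins" rule, and the error message are all hypothetical. The 0 score for a "not applicable" reply mirrors the failover mentioned later in this conversation.

```python
import re


def extract_score_sketch(text: str) -> int:
    """Return a relevance score found in LLM output, or raise ValueError."""
    # Assumption: an explicit "not applicable" reply still maps to a score of 0.
    if "not applicable" in text.lower():
        return 0
    # Assumption for this sketch: treat the last standalone integer as the score.
    matches = re.findall(r"\b\d+\b", text)
    if not matches:
        # No fallback score is invented here; the caller decides how to react.
        raise ValueError(f"Failed to extract a score from: {text!r}")
    return int(matches[-1])


print(extract_score_sketch("Relevance: 8"))
print(extract_score_sketch("Not applicable"))
```

Raising keeps a parsing failure distinguishable from a genuinely low or middling score, which a silent fallback value would hide.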

Copilot · 2025-09-10T19:38:10Z


    # Handling float, str values for relevance_score
-    if "relevance_score" in data:
+    if "relevance_score" in data and not isinstance(data, int):


The condition `not isinstance(data, int)` checks the wrong variable. It should check `data['relevance_score']` instead of `data` itself, since `data` is a dictionary.

Suggested change

-    if "relevance_score" in data and not isinstance(data, int):
+    if "relevance_score" in data and not isinstance(data["relevance_score"], int):
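
The effect of the review comment above can be shown with a small self-contained comparison. The two function names are hypothetical and exist only for this illustration; the conditions are the ones from the suggested change.

```python
def needs_coercion_buggy(data: dict) -> bool:
    # `data` is a dict, so `isinstance(data, int)` is always False and the
    # second half of this condition never filters anything out.
    return "relevance_score" in data and not isinstance(data, int)


def needs_coercion_fixed(data: dict) -> bool:
    # Inspect the value stored under the key, not the container.
    return "relevance_score" in data and not isinstance(data["relevance_score"], int)


already_int = {"summary": "Some summary", "relevance_score": 7}
as_string = {"summary": "Some summary", "relevance_score": "7"}

print(needs_coercion_buggy(already_int))  # True, even though the score is an int
print(needs_coercion_fixed(already_int))  # False
print(needs_coercion_fixed(as_string))    # True
```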

cursor

This PR is being reviewed by Cursor Bugbot

whitead

Nice work - unclear if test failures are real though.

Also - this code will now blow up on parsing failures? This is a big change, right?

Also - can you review the prompts and make sure that an empty response or "not relevant" is not specified as the correct behavior for irrelevant contexts? I seem to remember that some of this code assumed an empty response or something like it was preferred over JSON for irrelevant contexts.

whitead

Nice work!

jamesbraza · 2025-09-11T19:45:44Z

> Nice work - unclear if test failures are real though.

They were real, there was an issue that Cursorbot caught, and I've since fixed that. Now tests pass.

> Also - this code will now blow up on parsing failures? This is a big change, right?

Thanks to #1083, we can now retry getting the context if there's a JSON parse or score extraction failure. If the LLM fails to provide a valid score twice, we safely abandon the context.
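
The retry-then-abandon flow described here could look roughly like the sketch below. The actual mechanism lives in #1083 and is not shown in this conversation, so `score_with_retry`, its parameters, and the use of `None` to mean "abandon the context" are all assumptions made for illustration.

```python
from typing import Callable, Optional


def score_with_retry(
    get_response: Callable[[], str],
    parse: Callable[[str], int],
    max_attempts: int = 2,
) -> Optional[int]:
    """Return a parsed score, or None if every attempt fails to parse."""
    for _ in range(max_attempts):
        try:
            return parse(get_response())
        except ValueError:
            # json.JSONDecodeError subclasses ValueError, so JSON parse failures
            # and score extraction failures are both retried here.
            continue
    # Every attempt failed: signal that the context should be dropped.
    return None


# Simulated LLM: the first reply is unusable, the second contains a score.
replies = iter(["no score here", "7"])
print(score_with_retry(lambda: next(replies), int))
```

Returning `None` rather than a default number keeps the "abandoned" case explicit, in line with the PR's goal of not injecting fake scores.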

> Also - can you review the prompts and make sure that an empty response or "not relevant" is not specified as the correct behavior for irrelevant contexts? I seem to remember that some of this code assumed an empty response or something like it was preferred over JSON for irrelevant contexts.

- In summary_prompt we mention replying with "Not applicable": https://github.com/Future-House/paper-qa/blob/v5.29.1/src/paperqa/prompts.py#L10
- We still have the failover to a score of 0 in extract_score, which I left intact: https://github.com/Future-House/paper-qa/blob/v5.29.1/src/paperqa/utils.py#L123-L129

I think this is what you're getting at, so we should be good here.

dosubot · 2025-09-11T19:46:04Z

Documentation Updates

Checked 1 published document(s). No updates required.

jamesbraza requested review from SamCox822, maykcaldas, mskarlin, sidnarayanan and whitead September 10, 2025 19:37

jamesbraza self-assigned this Sep 10, 2025

jamesbraza added the bug Something isn't working label Sep 10, 2025

Copilot AI review requested due to automatic review settings September 10, 2025 19:37

dosubot bot added the size:M This PR changes 30-99 lines, ignoring generated files. label Sep 10, 2025

Copilot AI reviewed Sep 10, 2025

cursor bot reviewed Sep 10, 2025

Comment thread src/paperqa/core.py Outdated

whitead reviewed Sep 10, 2025

jamesbraza force-pushed the llm-parse-edge-cases branch from 8d35d46 to 943cadc Compare September 10, 2025 20:23

whitead self-requested a review September 10, 2025 22:58

whitead approved these changes Sep 10, 2025

dosubot bot added the lgtm This PR has been approved by a maintainer label Sep 10, 2025

jamesbraza added 4 commits September 11, 2025 12:18

- 57be0d1 Removed assumptions of 1 or 5 scores from extract_score
- 9540164 Handled missing comma after 'summary' field case
- 0217ce3 Cleaned up llm_parse_json a bit
- 6f18abe Putting a number in context returns so test_custom_llm can pass

jamesbraza force-pushed the llm-parse-edge-cases branch from 99d318a to 6f18abe Compare September 11, 2025 19:24

jamesbraza merged commit 3ef7c66 into main Sep 11, 2025
4 of 5 checks passed

jamesbraza deleted the llm-parse-edge-cases branch September 11, 2025 19:45
