Fixing flaky PaSa figure 1 read assertions#1312
Conversation
There was a problem hiding this comment.
Pull request overview
This PR addresses flaky PDF figure-question assertions by updating the tests to accept multiple valid phrasings/orderings in the model’s extracted answer for the PaSa Figure 1 question.
Changes:
- Refactors per-question assertions from substring-count tuples into a more general “answer checks” list supporting regex checks.
- Updates the “User Query blue boxes” question to use a regex intended to be order-independent for “paper queue” vs “selector”.
- Applies the same assertion refactor across pypdf, pymupdf, docling, and nemotron test suites.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| packages/paper-qa-pypdf/tests/test_paperqa_pypdf.py | Refactors answer assertions to allow regex-based checks for the figure Q&A test. |
| packages/paper-qa-pymupdf/tests/test_paperqa_pymupdf.py | Mirrors the assertion refactor and regex-based check for the same flaky Q&A scenario. |
| packages/paper-qa-nemotron/tests/test_paperqa_nemotron.py | Applies the same “answer checks” approach to the Nemotron-backed parser tests. |
| packages/paper-qa-docling/tests/test_paperqa_docling.py | Applies the same generalized assertions to the Docling-backed parser tests. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| ( | ||
| "How many User Query blue boxes are there, and what are they connected to?", | ||
| [(("two", "2"), 1), (("crawler", "selector"), 2)], | ||
| [r"two|2|(?=.*paper queue)(?=.*selector)"], |
There was a problem hiding this comment.
The regex two|2|(?=.*paper queue)(?=.*selector) is too permissive: because of the | alternation it will pass if the answer contains just "two" or "2" even if it never mentions both "paper queue" and "selector". If the intent is to require the count and both connections (in either order), split into separate checks (one for count, one requiring both substrings) or use a single pattern that requires all conditions via lookaheads (e.g., require (two|2) plus both substrings).
| [r"two|2|(?=.*paper queue)(?=.*selector)"], | |
| [r"(?=.*\b(two|2)\b)(?=.*paper queue)(?=.*selector)"], |
| ( | ||
| "How many User Query blue boxes are there, and what are they connected to?", | ||
| [(("two", "2"), 1), (("crawler", "selector"), 2)], | ||
| [r"two|2|(?=.*paper queue)(?=.*selector)"], |
There was a problem hiding this comment.
The regex two|2|(?=.*paper queue)(?=.*selector) is too permissive: it will succeed if the answer contains only "two" or "2", without requiring that both "paper queue" and "selector" are present. To keep the test meaningful while allowing either ordering, use two separate checks (count + both substrings) or a lookahead-only pattern that requires all conditions (including the count).
| [r"two|2|(?=.*paper queue)(?=.*selector)"], | |
| [ | |
| (("two", "2"), 1), | |
| (("paper queue", "selector"), 2), | |
| ], |
| ( | ||
| "How many User Query blue boxes are there, and what are they connected to?", | ||
| [(("two", "2"), 1), (("crawler", "selector"), 2)], | ||
| [r"two|2|(?=.*paper queue)(?=.*selector)"], |
There was a problem hiding this comment.
The regex two|2|(?=.*paper queue)(?=.*selector) is logically weaker than the old assertions: because of alternation it will pass on "two"/"2" alone, even if the answer doesn't mention both "paper queue" and "selector". Consider splitting into separate checks (count + presence of both substrings regardless of order) or changing the regex to require all three conditions via lookaheads (including (two|2)).
| [r"two|2|(?=.*paper queue)(?=.*selector)"], | |
| [r"(?=.*\b(?:two|2)\b)(?=.*paper queue)(?=.*selector)"], |
| ( | ||
| "How many User Query blue boxes are there, and what are they connected to?", | ||
| [(("two", "2"), 1), (("crawler", "selector"), 2)], | ||
| [r"two|2|(?=.*paper queue)(?=.*selector)"], |
There was a problem hiding this comment.
The pattern two|2|(?=.*paper queue)(?=.*selector) uses | alternation, so the assertion will pass if the answer merely includes "two"/"2" without mentioning both "paper queue" and "selector". If the goal is to allow either ordering of those two connections while still checking them, split into separate checks (count + both substrings) or make the regex require all conditions (e.g., via lookaheads that include (two|2)).
| [r"two|2|(?=.*paper queue)(?=.*selector)"], | |
| [r"two|2", (("paper queue", "selector"), 2)], |
Seen in this CI run:
This PR generalizes our assertions to allow for the answer saying "paper queue" then "selector" OR "selector" then "paper queue".