
Generalizing tests for smarter LLMs #1149

Merged
jamesbraza merged 9 commits into main from more-general-tests
Oct 21, 2025

@jamesbraza
Collaborator

This PR adjusts the unit tests so that they also apply to Claude 4.5 Sonnet.

@jamesbraza self-assigned this on Oct 20, 2025
@jamesbraza added the bug (Something isn't working) label on Oct 20, 2025
Copilot AI review requested due to automatic review settings on October 20, 2025 20:24
@dosubot bot added the size:M (This PR changes 30-99 lines, ignoring generated files.) and enhancement (New feature or request) labels on Oct 20, 2025
Contributor

Copilot AI left a comment

Pull Request Overview

This PR generalizes unit tests to accommodate Claude 4.5 Sonnet in addition to existing GPT models. The changes make test assertions flexible enough to handle different LLM response formats, and they improve test clarity through better naming and typo fixes.

Key Changes

  • Updated assertions to accept multiple valid response formats (e.g., "1.0mm", "1.0-mm", "1.0 mm")
  • Fixed typo "What is is" → "What is" in multiple test queries
  • Improved test data naming from generic "stub"/"special" to descriptive "positive"/"negative"
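The first change above can be sketched as follows. This is a minimal illustration, not the PR's code: the actual pattern in tests/test_paperqa.py is not shown in this conversation, so the regex, the constant name `THICKNESS_PATTERN`, and the helper `mentions_thickness` are all assumptions.

```python
import re

# Hypothetical pattern: "1.0", then an optional hyphen or space, then "mm".
# The real regex in tests/test_paperqa.py is not shown in this thread.
THICKNESS_PATTERN = re.compile(r"1\.0[- ]?mm")


def mentions_thickness(answer: str) -> bool:
    """Return True if the answer contains any accepted spelling of 1.0 mm."""
    return THICKNESS_PATTERN.search(answer) is not None
```

A test could then assert `mentions_thickness(answer)` rather than comparing against one exact string, so that an LLM answering "1.0-mm" or "1.0 mm" still passes.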

Reviewed Changes

Copilot reviewed 4 out of 6 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| tests/test_paperqa.py | Replaced exact string matching with regex pattern, improved test data naming from stub/special to positive/negative, clarified comments, added context assertion |
| tests/test_configs.py | Extended LLM name check to accept both "gpt-" and "claude-" prefixes |
| tests/test_agents.py | Fixed duplicate word typo in test queries, increased context substring check from 20 to 30 characters |
| tests/conftest.py | Fixed duplicate word typo in test query fixture |
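The tests/test_configs.py change (accepting both "gpt-" and "claude-" prefixes) might look roughly like the sketch below. The helper name `has_supported_prefix`, the constant `SUPPORTED_PREFIXES`, and the model names in the usage note are illustrative assumptions; the real check in the test file may be written differently.

```python
# Hypothetical helper; the real check lives in tests/test_configs.py and may differ.
SUPPORTED_PREFIXES = ("gpt-", "claude-")


def has_supported_prefix(llm_name: str) -> bool:
    """Return True if the model name starts with any accepted provider prefix."""
    # str.startswith accepts a tuple and returns True if any element matches.
    return llm_name.startswith(SUPPORTED_PREFIXES)
```

For example, `has_supported_prefix("gpt-4o")` and `has_supported_prefix("claude-sonnet-4-5")` both hold, while a name with any other prefix is rejected. Passing a tuple to `str.startswith` keeps the check to one call when further prefixes are added later.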


@dosubot

dosubot bot commented Oct 20, 2025

Related Documentation

Checked 1 published document(s). No updates required.


@dosubot bot added the lgtm (This PR has been approved by a maintainer) label on Oct 21, 2025
@jamesbraza merged commit 8d26e24 into main on Oct 21, 2025
7 checks passed
@jamesbraza deleted the more-general-tests branch on October 21, 2025 21:56

Labels

  • bug: Something isn't working
  • enhancement: New feature or request
  • lgtm: This PR has been approved by a maintainer
  • size:M: This PR changes 30-99 lines, ignoring generated files.

4 participants