Add RAG modes and strengthen strict mode #2180
base: main
Reviewer's Guide: Implements configurable RAG operation modes (strict, hybrid, augment) in the rag_framework script and strengthens the strict-mode prompt so that it relies only on retrieved document content, while preserving the existing augment behavior as the default.
Summary of Changes
Hello @csoriano2718, I'm Gemini Code Assist! Here's a summary to help you and other reviewers quickly get up to speed. This pull request significantly enhances the RAG (Retrieval-Augmented Generation) framework by introducing configurable operational modes. These modes let users precisely control the balance between relying solely on retrieved documents and drawing on the AI's general knowledge, addressing the previous limitation where users had to choose between document-only and general-knowledge responses. The update also hardens the "strict" mode against hallucinations so that responses stay grounded in the provided data.
Code Review
This pull request introduces configurable RAG modes (strict, hybrid, augment), which is a valuable enhancement for controlling the RAG system's behavior. The implementation correctly uses the RAG_MODE environment variable to select different system prompts, and the new strict mode prompt is well-crafted to prevent responses outside the provided context.
My main feedback is on code structure. The current implementation for selecting prompts has a fair amount of code duplication. I've provided a suggestion to refactor this using a data-driven approach, which should improve maintainability and make it easier to add more modes in the future.
Additionally, I recommend adding tests for the new RAG modes to verify their behavior and prevent regressions.
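The data-driven refactor suggested above could look roughly like the following sketch. The names `SYSTEM_PROMPTS` and `get_system_prompt` are hypothetical, the table covers the three modes under review at this point, and the prompt texts are abbreviated placeholders rather than the prompts actually in rag_framework.

```python
import os

# Hypothetical mode-to-prompt table; the real prompt texts in rag_framework differ.
SYSTEM_PROMPTS = {
    "strict": "Answer ONLY from the provided context. If the answer is not there, reply exactly: I don't know",
    "hybrid": "Prefer the provided context; use general knowledge only when the context is silent, and say so.",
    "augment": "Use the provided context when relevant and supplement it with general knowledge.",
}

DEFAULT_MODE = "augment"


def get_system_prompt(mode=None):
    """Return the system prompt for a RAG mode, reading RAG_MODE when mode is None."""
    if mode is None:
        mode = os.environ.get("RAG_MODE", DEFAULT_MODE)
    # One dictionary lookup replaces an if/elif branch per mode.
    return SYSTEM_PROMPTS[mode.strip().lower()]
```

With this shape, adding a mode means adding one dictionary entry. Note that an unknown mode raises `KeyError` in this sketch instead of silently falling back to a default.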
Force-pushed 747e690 to e89a120
I really like the idea! One comment: in the near future, this RAG pipeline could be added as an MCP server tool, letting the model decide whether it needs to use it. At that point we wouldn't need the strict vs. augment functionality, since we would be following an agentic workflow. But until then, this should work!
/gemini review
Code Review
This pull request introduces a valuable feature for controlling RAG behavior through strict and augment modes. The implementation in rag_framework is clear and effectively uses different system prompts for each mode. The addition of E2E tests is also a great step towards ensuring the reliability of this feature. My review focuses on improving the thoroughness and correctness of these new tests. I've identified a gap in the augment mode test and a flawed implementation in the environment variable propagation test. Addressing these points will make the test suite more robust and reliable.
@bmahabirbu ah, that's a great idea, doing RAG as an MCP server. I wonder, should we hold off on this PR to avoid exposing new features/APIs in Ramalama for which Ramalama plans to recommend MCP servers in the future? Or what's the approach to breaking past functionality in order to do better in the future? I didn't intend to implement something that would slow down Ramalama development.
Force-pushed bf26592 to 36e2a49
/gemini review
Code Review
This pull request effectively introduces RAG modes (strict and augment) to control document retrieval versus general AI knowledge, along with corresponding end-to-end tests. The implementation is straightforward, using environment variables to switch between system prompts.
My review focuses on enhancing the robustness and maintainability of the new code. I've identified a potential issue where an invalid RAG_MODE could silently fall back to the default, which might have security implications. Additionally, I've found a weakness in the test logic for strict mode that could allow incorrect behavior to pass, and I've suggested a way to improve the test suite's maintainability by reducing code duplication. Overall, these are solid changes that will be even better with a few adjustments.
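One way to address the silent-fallback concern is to treat an unset or empty variable as "use the default" but reject any other unrecognized value. The sketch below uses hypothetical names (`VALID_RAG_MODES`, `resolve_rag_mode`) and only the two modes the PR ended up with; it is not the code in rag_framework.

```python
import os
from typing import Mapping, Optional

VALID_RAG_MODES = frozenset({"strict", "augment"})
DEFAULT_RAG_MODE = "augment"


def resolve_rag_mode(environ: Optional[Mapping[str, str]] = None) -> str:
    """Return the validated RAG mode from RAG_MODE.

    Unset or empty means the default. Anything else must be a known mode, so a
    typo such as "strcit" raises instead of quietly enabling augment mode.
    """
    env = os.environ if environ is None else environ
    raw = env.get("RAG_MODE", "").strip().lower()
    if not raw:
        return DEFAULT_RAG_MODE
    if raw not in VALID_RAG_MODES:
        raise ValueError(
            f"Invalid RAG_MODE {raw!r}; expected one of {sorted(VALID_RAG_MODES)}"
        )
    return raw
```

Failing at startup matters most for strict mode: an operator who asked for document-only answers but mistyped the value would otherwise get the permissive mode without noticing.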
Force-pushed 36e2a49 to d593ad3
/gemini review
Code Review
This pull request effectively introduces RAG modes (strict and augment) to control how the RAG proxy responds to queries, which is a great feature for providing more control to the user. The implementation is clear, using an environment variable to switch between system prompts. The addition of end-to-end tests for both modes is excellent and ensures the new functionality works as expected. My review includes a few suggestions to enhance maintainability by using constants for validation and refactoring large string prompts, and to improve the precision of the test assertions for the new strict mode. Overall, this is a solid contribution.
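To make strict-mode assertions more precise, a test can require the whole answer to be the refusal rather than searching for a substring. This is a sketch with hypothetical helper names; it assumes the strict prompt asks for the exact reply "I don't know" (as the strict-mode commit message states) and tolerates only trivial formatting differences. Stripping a `<think>` block is an assumption aimed at reasoning models such as deepseek-r1.

```python
import re

REFUSAL = "i don't know"

# Assumption: reasoning models may emit a <think>...</think> block before the
# final answer; it is removed before comparing.
_THINK_RE = re.compile(r"<think>.*?</think>", re.DOTALL)


def normalize_answer(text: str) -> str:
    """Drop any reasoning block, surrounding whitespace/quotes and a trailing period; lowercase."""
    text = _THINK_RE.sub("", text)
    text = text.replace("\u2019", "'")  # curly apostrophe -> straight
    return text.strip().strip('"').rstrip(".").strip().lower()


def is_exact_refusal(text: str) -> bool:
    """True only when the whole answer is the refusal, not when it merely contains it."""
    return normalize_answer(text) == REFUSAL
```

A substring check such as `"don't know" in answer.lower()` would accept "I don't know much, but Paris is the capital of France.", which is exactly the leak strict mode is meant to prevent; `is_exact_refusal` rejects it.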
Force-pushed 4f5e269 to cb13d44
/gemini review
Code Review
This pull request introduces a RAG_MODE environment variable to switch between strict (document-only) and augment (documents + general knowledge) modes for the RAG proxy. The implementation is clean, using distinct system prompts for each mode, and is accompanied by a comprehensive set of new E2E tests that validate the behavior of both modes. The changes are well-executed and add valuable control over the RAG's output. My review includes a couple of suggestions in the new test file to enhance robustness by using more specific exception handling.
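As an illustration of narrower exception handling in the E2E tests, the helper below parses a proxy response and catches only the errors a malformed body can cause. It assumes the RAG proxy returns an OpenAI-style chat completion JSON body; `extract_answer` is a hypothetical name, not part of the PR.

```python
import json


def extract_answer(body: str) -> str:
    """Pull the assistant message out of an OpenAI-style chat completion body.

    Only the errors a malformed response can actually cause are caught, so
    unrelated bugs in the test itself (e.g. a NameError) still surface normally.
    """
    try:
        data = json.loads(body)
        return data["choices"][0]["message"]["content"]
    except json.JSONDecodeError as exc:
        raise AssertionError(f"proxy returned non-JSON body: {body[:200]!r}") from exc
    except (KeyError, IndexError, TypeError) as exc:
        raise AssertionError(f"unexpected response shape: {body[:200]!r}") from exc
```

Converting these into `AssertionError` with the offending body attached makes a failing test report what the proxy sent, instead of a bare `except Exception` hiding it.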
Force-pushed 2adf54f to 91c1777
ok, I think Gemini and Cursor have reached an agreement now :-)
If we are going to add this feature, it needs to be documented in a man page.
Might also want to have a setting in ramalama.conf |
The original RAG prompt used weak language that LLMs often ignored, leading to hallucinations when documents didn't contain the answer. This strengthens the prompt to strictly enforce document-only responses.

Changes:
- Replace weak prompt language with explicit CRITICAL RULES
- Require the exact response "I don't know" when the answer is not in the context
- Explicitly forbid use of general knowledge or training data

This ensures RAG behaves as a true document retrieval system rather than an AI that happens to have access to documents.

Tested with deepseek-r1:14b: it correctly refuses to answer general knowledge questions not in the indexed documents.

Assisted-by: Cursor with Claude Sonnet 4.5
Signed-off-by: Carlos Soriano <[email protected]>
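A strict prompt in the spirit of this commit might be assembled as follows. The rule wording and `build_strict_prompt` are illustrative assumptions, not the exact text in rag_framework; the point is that the refusal string is fixed and the retrieved chunks are clearly delimited from the rules.

```python
from typing import Sequence

# Hypothetical wording modelled on the commit message: explicit CRITICAL RULES,
# an exact refusal string, and a ban on general knowledge.
STRICT_RULES = """CRITICAL RULES:
1. Answer ONLY using the information in the CONTEXT section below.
2. If the CONTEXT does not contain the answer, respond with exactly: I don't know
3. Do NOT use general knowledge or training data, even if you are confident."""


def build_strict_prompt(chunks: Sequence[str]) -> str:
    """Combine the strict rules with numbered retrieved chunks into one system prompt."""
    context = "\n\n".join(f"[{i}] {chunk.strip()}" for i, chunk in enumerate(chunks, 1))
    return f"{STRICT_RULES}\n\nCONTEXT:\n{context or '(no documents retrieved)'}"
```

Giving the model an explicit placeholder when retrieval returns nothing leaves rule 2 something concrete to act on, instead of an empty section the model might fill from memory.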
Force-pushed 2963e8c to 3d971ce
Introduces RAG_MODE to control how the RAG proxy balances document retrieval with the model's general knowledge:

- strict: Only answer from retrieved documents. Refuses questions not covered by indexed documents. Use for compliance, legal, or privacy-sensitive scenarios.
- augment: (Default) Freely combines documents with general AI knowledge. Documents are used when relevant; general knowledge supplements them when helpful.

Implementation:
- Add RAG_MODE constants and validation in rag_framework
- Update system prompts based on mode selection
- Pass RAG_MODE from config to container via cli.py

Documentation:
- Add RAG_MODE section to ramalama-serve.1.md man page
- Add rag_mode option to ramalama.conf.5.md man page
- Add rag_mode example to sample ramalama.conf
- Add rag_mode field to config.py BaseConfig

Usage:

    ramalama serve --env RAG_MODE=strict --rag /path/to/db model

Or in ~/.config/ramalama/ramalama.conf:

    [ramalama]
    rag_mode = "strict"

Signed-off-by: Carlos Soriano <[email protected]>
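The "pass RAG_MODE from config to container" step can be sketched as building container `--env KEY=VALUE` arguments, a form both podman and docker accept. `rag_mode_env_args` is hypothetical, and the precedence (an explicit `--env RAG_MODE=...` on the command line wins over ramalama.conf) is an assumption the commit message does not spell out.

```python
from typing import List, Optional, Sequence


def rag_mode_env_args(conf_mode: Optional[str], user_env: Sequence[str] = ()) -> List[str]:
    """Build container --env arguments that carry RAG_MODE into the RAG container.

    Assumed precedence: an explicit RAG_MODE passed by the user via --env wins,
    then rag_mode from ramalama.conf; if neither is set, nothing is added and
    the container falls back to its own default (augment).
    """
    args: List[str] = []
    for item in user_env:
        args += ["--env", item]
    user_sets_mode = any(item.split("=", 1)[0] == "RAG_MODE" for item in user_env)
    if conf_mode and not user_sets_mode:
        args += ["--env", f"RAG_MODE={conf_mode}"]
    return args
```

Leaving RAG_MODE out entirely when nothing is configured keeps the default decision in one place, inside the container.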
Force-pushed 3d971ce to b0c0827
Lint is failing, and you should squash and sign your commits.
This PR adds RAG mode control via the RAG_MODE environment variable, giving users clear control over how the RAG proxy balances document retrieval with general AI knowledge.
RAG Modes
Two operational modes are provided:
strict: Document-only responses, refuses general knowledge queries
augment (default): Freely combines documents with general AI knowledge
Usage
Implementation
Defaults to augment if RAG_MODE is not set
Testing
E2E tests included for both modes with positive/negative test cases:
Tests are designed for models ≥7B parameters (e.g., deepseek-r1:14b, mistral:7b) which provide reliable retrieval and extraction.
Container Changes Required
This PR requires the RAG container to include the updated rag_framework script. The container image needs to be rebuilt with the changes from this branch.