Conversation

@alekszievr alekszievr commented Feb 27, 2025

Description

DCO Affirmation

I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin

Summary by CodeRabbit

  • New Features

    • Introduced a class-based retrieval mechanism to enhance answer generation with improved context extraction and completion.
    • Added a new evaluation metric for contextual relevancy and an option to enable context evaluation during the evaluation process.
  • Refactor

    • Transitioned from a function-based answer resolver to a more modular retriever approach to improve extensibility.
  • Tests

    • Updated tests to align with the new answer generation and evaluation process.

lxobr and others added 30 commits February 21, 2025 09:39

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (4)
evals/eval_framework/answer_generation/answer_generation_executor.py (1)

2-9: Add Type import for better type annotations.

The imports look good, but to annotate class references properly you should also add Type from the typing module; it is needed for the accurate type hints below.

-from typing import List, Dict, Callable, Awaitable
+from typing import List, Dict, Callable, Awaitable, Type
cognee/tests/unit/eval_framework/deepeval_adapter_test.py (3)

18-37: Consider adding context-specific assertions.

The test validates that EM and f1 metrics are returned but doesn't explicitly verify that the context is correctly utilized in the evaluation. Consider adding assertions that specifically test how different contexts affect the evaluation results.


39-91: Enhance test consistency with retrieval_context.

For consistency, consider adding the retrieval_context field to all test cases, particularly in test_unsupported_metric and test_missing_fields_in_answer. This would ensure comprehensive test coverage for the new field across all scenarios.


1-11: Consider adding a test for ContextualRelevancyMetric.

The AI summary mentions the introduction of a ContextualRelevancyMetric in the DeepEvalAdapter. Consider adding specific tests for this new metric to validate its behavior with different contexts.
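A rough sketch of what such a test could look like — the adapter's evaluation entry point (evaluate_answers) and the golden_answer key are assumptions here and may not match the actual interface:

```python
import pytest

# Import path taken from the files listed in this review; adjust if the module differs.
from cognee.eval_framework.evaluation.deep_eval_adapter import DeepEvalAdapter


@pytest.mark.asyncio
async def test_contextual_relevancy_metric():
    adapter = DeepEvalAdapter()
    answers = [
        {
            "question": "What is 2 + 2?",
            "answer": "4",
            "golden_answer": "4",             # key name assumed for this sketch
            "retrieval_context": "2 + 2 = 4",
        }
    ]
    # The method name and signature below are assumptions, not the verified API.
    results = await adapter.evaluate_answers(answers, ["contextual_relevancy"])
    assert results, "expected at least one evaluation result"
```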

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2e2beb3 and 3f10725.

📒 Files selected for processing (2)
  • cognee/tests/unit/eval_framework/deepeval_adapter_test.py (2 hunks)
  • evals/eval_framework/answer_generation/answer_generation_executor.py (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (29)
  • GitHub Check: Test on macos-15
  • GitHub Check: Test on macos-13
  • GitHub Check: Test on ubuntu-22.04
  • GitHub Check: run_eval_framework_test / test
  • GitHub Check: Test on macos-15
  • GitHub Check: run_notebook_test / test
  • GitHub Check: Test on macos-15
  • GitHub Check: run_networkx_metrics_test / test
  • GitHub Check: run_notebook_test / test
  • GitHub Check: run_notebook_test / test
  • GitHub Check: run_dynamic_steps_example_test / test
  • GitHub Check: run_simple_example_test / test
  • GitHub Check: run_notebook_test / test
  • GitHub Check: Test on macos-13
  • GitHub Check: run_multimedia_example_test / test
  • GitHub Check: Test on macos-13
  • GitHub Check: test
  • GitHub Check: Test on ubuntu-22.04
  • GitHub Check: Test on ubuntu-22.04
  • GitHub Check: Test on ubuntu-22.04
  • GitHub Check: test
  • GitHub Check: test
  • GitHub Check: test
  • GitHub Check: windows-latest
  • GitHub Check: test
  • GitHub Check: test
  • GitHub Check: test
  • GitHub Check: Build Cognee Backend Docker App Image
  • GitHub Check: docker-compose-test
🔇 Additional comments (6)
evals/eval_framework/answer_generation/answer_generation_executor.py (4)

12-19: Fix type annotations for retriever classes.

The current type annotation indicates instances of BaseRetriever, but you're storing class references rather than instances. This should be updated for clarity and correctness.

-retriever_options: Dict[str, BaseRetriever] = {
+retriever_options: Dict[str, Type[BaseRetriever]] = {

26-26: Revisit the type hint for retriever_cls.

The parameter is typed as BaseRetriever, but you're passing class constructors rather than instances as shown by the instantiation at line 28.

-    retriever_cls: BaseRetriever,
+    retriever_cls: Type[BaseRetriever],

34-35: Good implementation of the context-aware retrieval process.

The two-step retrieval process (getting context first, then obtaining completion with that context) is a clean approach that properly separates concerns and allows for better evaluation of context quality.
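A minimal sketch of that two-step flow, assuming get_context and get_completion are async methods on the retriever (the method names are the ones referenced in this review; their exact signatures are assumed):

```python
from typing import Any, Dict, Type


async def answer_with_context(retriever_cls: Type, question: str) -> Dict[str, Any]:
    # Sketch only: instantiate the configured retriever class, fetch the context
    # first, then generate the completion grounded in that context.
    retriever = retriever_cls()
    retrieval_context = await retriever.get_context(question)
    completion = await retriever.get_completion(question, context=retrieval_context)
    return {
        "question": question,
        "answer": completion,
        "retrieval_context": retrieval_context,  # kept so context quality can be evaluated later
    }
```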


42-42: Great addition of retrieval context to results.

Including the retrieval context in the output allows for better analysis of how different contexts affect answer quality, which is essential for context evaluation.

cognee/tests/unit/eval_framework/deepeval_adapter_test.py (2)

25-25: Good addition of retrieval context for testing.

This change properly integrates context evaluation capabilities into the testing framework, aligning with the PR objective. The provided context "2 + 2 = 4" is relevant to the question and answer being evaluated.


81-81: Good test coverage for None values.

Extending the None value testing to include the new retrieval_context field ensures proper handling of null contexts in the adapter.

@lxobr lxobr self-requested a review March 3, 2025 11:54

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3f10725 and d9b007a.

📒 Files selected for processing (2)
  • evals/eval_framework/eval_config.py (1 hunks)
  • evals/eval_framework/modal_run_eval.py (1 hunks)
✅ Files skipped from review due to trivial changes (1)
  • evals/eval_framework/modal_run_eval.py
⏰ Context from checks skipped due to timeout of 90000ms (23)
  • GitHub Check: run_multimedia_example_test / test
  • GitHub Check: run_notebook_test / test
  • GitHub Check: run_eval_framework_test / test
  • GitHub Check: Test on macos-15
  • GitHub Check: run_notebook_test / test
  • GitHub Check: test
  • GitHub Check: test
  • GitHub Check: Test on macos-13
  • GitHub Check: test
  • GitHub Check: Test on macos-13
  • GitHub Check: test
  • GitHub Check: run_notebook_test / test
  • GitHub Check: run_simple_example_test / test
  • GitHub Check: Test on macos-13
  • GitHub Check: Test on ubuntu-22.04
  • GitHub Check: test
  • GitHub Check: test
  • GitHub Check: run_notebook_test / test
  • GitHub Check: windows-latest
  • GitHub Check: run_dynamic_steps_example_test / test
  • GitHub Check: run_networkx_metrics_test / test
  • GitHub Check: run_simple_example_test
  • GitHub Check: docker-compose-test

lxobr
lxobr previously approved these changes Mar 3, 2025
@borisarzentar borisarzentar changed the title from "Add context evaluation to eval framework [COG-1366]" to "feat: Add context evaluation to eval framework [COG-1366]" on Mar 4, 2025

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (4)
cognee/eval_framework/evaluation/evaluation_executor.py (1)

6-10: Constructor signature updated to accept context evaluation parameter.

The __init__ method now accepts an evaluate_contexts parameter, with a default value of False.

Note that this default differs from the True default in the EvalConfig class. While this shouldn't cause issues since the parameter is explicitly passed from configuration, consider aligning these defaults to avoid potential confusion in future maintenance.

- def __init__(
-     self,
-     evaluator_engine: Union[str, EvaluatorAdapter, Any] = "DeepEval",
-     evaluate_contexts: bool = False,
- ) -> None:
+ def __init__(
+     self,
+     evaluator_engine: Union[str, EvaluatorAdapter, Any] = "DeepEval",
+     evaluate_contexts: bool = True,
+ ) -> None:
cognee/eval_framework/evaluation/deep_eval_adapter.py (1)

34-34: Prevent potential KeyError for answer["retrieval_context"].

Currently, retrieval_context=[answer["retrieval_context"]] could raise a KeyError if retrieval_context is missing from the answer dictionary. Consider using .get("retrieval_context", default_value) or adding error handling.

- retrieval_context=[answer["retrieval_context"]]
+ retrieval_context=[answer.get("retrieval_context", None)]
cognee/eval_framework/answer_generation/answer_generation_executor.py (2)

3-9: Explicitly annotate imports as classes if needed.

Here you import retriever classes, but the type annotation for retriever_options is Dict[str, BaseRetriever], which implies instances rather than classes. Adjust it to Dict[str, Type[BaseRetriever]], or instantiate the retrievers here if instances are what you intend.

- retriever_options: Dict[str, BaseRetriever] = {
+ from typing import Type
+ retriever_options: Dict[str, Type[BaseRetriever]] = {

26-28: Clarify the type of retriever_cls.

Since retriever_cls represents a constructor, consider updating its type annotation to Type[BaseRetriever]. Otherwise, the current signature may cause confusion, as it suggests an instance rather than a class.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3453ede and 5b9a64d.

📒 Files selected for processing (7)
  • cognee/eval_framework/answer_generation/answer_generation_executor.py (1 hunks)
  • cognee/eval_framework/answer_generation/run_question_answering_module.py (2 hunks)
  • cognee/eval_framework/eval_config.py (2 hunks)
  • cognee/eval_framework/evaluation/deep_eval_adapter.py (3 hunks)
  • cognee/eval_framework/evaluation/evaluation_executor.py (2 hunks)
  • cognee/eval_framework/evaluation/run_evaluation_module.py (1 hunks)
  • cognee/tests/unit/eval_framework/answer_generation_test.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • cognee/tests/unit/eval_framework/answer_generation_test.py
⏰ Context from checks skipped due to timeout of 90000ms (31)
  • GitHub Check: Test on macos-15
  • GitHub Check: run_notebook_test / test
  • GitHub Check: Test on macos-15
  • GitHub Check: run_multimedia_example_test / test
  • GitHub Check: run_eval_framework_test / test
  • GitHub Check: run_dynamic_steps_example_test / test
  • GitHub Check: run_notebook_test / test
  • GitHub Check: Test on macos-13
  • GitHub Check: Test on macos-13
  • GitHub Check: run_networkx_metrics_test / test
  • GitHub Check: Test on ubuntu-22.04
  • GitHub Check: test
  • GitHub Check: Test on ubuntu-22.04
  • GitHub Check: test
  • GitHub Check: test
  • GitHub Check: test
  • GitHub Check: run_simple_example_test / test
  • GitHub Check: run_notebook_test / test
  • GitHub Check: run_notebook_test / test
  • GitHub Check: Test on macos-15
  • GitHub Check: Test on macos-13
  • GitHub Check: test
  • GitHub Check: windows-latest
  • GitHub Check: test
  • GitHub Check: lint (ubuntu-latest, 3.10.x)
  • GitHub Check: Test on ubuntu-22.04
  • GitHub Check: test
  • GitHub Check: Test on ubuntu-22.04
  • GitHub Check: Build Cognee Backend Docker App Image
  • GitHub Check: docker-compose-test
  • GitHub Check: run_simple_example_test
🔇 Additional comments (11)
cognee/eval_framework/eval_config.py (2)

21-21: Added context evaluation configuration flag.

The new evaluating_contexts boolean flag is appropriately added to the EvalConfig class, defaulting to True. This enables context evaluation in the evaluation framework by default.


55-55: Updated to_dict method to include the new configuration parameter.

The to_dict method is properly updated to include the new evaluating_contexts parameter, maintaining consistency with the serialization of other configuration options. The comment provides helpful context about the parameter's purpose.
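For illustration only, the relevant part of the config might look roughly like this — a sketch, not the actual EvalConfig definition; the base class and every field other than evaluating_contexts are assumptions:

```python
from pydantic_settings import BaseSettings  # assumed base class


class EvalConfig(BaseSettings):
    qa_engine: str = "cognee_completion"   # illustrative field, name and default assumed
    evaluating_contexts: bool = True       # new flag: context evaluation enabled by default

    def to_dict(self) -> dict:
        return {
            "qa_engine": self.qa_engine,
            # Whether to also compute context metrics such as contextual relevancy.
            "evaluating_contexts": self.evaluating_contexts,
        }
```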

cognee/eval_framework/answer_generation/run_question_answering_module.py (2)

6-6: Import changed from function-based to class-based retriever approach.

The change from question_answering_engine_options to retriever_options reflects the architectural shift from function-based answer resolvers to class-based retrievers, which is consistent with modern design patterns.


51-52: Updated to use retriever class instead of answer resolver function.

The implementation now uses the retriever_cls parameter with the appropriate class from retriever_options based on the configured engine. This aligns with the architectural changes for using class-based retrievers.
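In rough outline, the wiring could look like this — only retriever_options and retriever_cls come from the changed code; the config key and the helper name are hypothetical:

```python
# Sketch: resolve the configured engine name to a retriever class and pass the
# class itself (not an instance) on to the answer-generation step.
retriever_cls = retriever_options[params["qa_engine"]]  # config key name assumed
answers = await generate_answers(questions, retriever_cls=retriever_cls)  # helper name hypothetical
```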

cognee/eval_framework/evaluation/run_evaluation_module.py (1)

45-48: Added context evaluation parameter to EvaluationExecutor.

The EvaluationExecutor instantiation now correctly passes the evaluating_contexts configuration parameter, enabling the evaluation of contexts when configured.
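A sketch of that wiring; the constructor keywords match those shown earlier in this review, while the surrounding params dict and its keys are assumed:

```python
# Sketch: forward the config flag into the evaluator.
evaluator = EvaluationExecutor(
    evaluator_engine=params.get("evaluation_engine", "DeepEval"),  # key name assumed
    evaluate_contexts=params.get("evaluating_contexts", True),
)
```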

cognee/eval_framework/evaluation/evaluation_executor.py (2)

21-21: Stored context evaluation setting as instance variable.

The constructor appropriately stores the evaluate_contexts parameter as an instance variable for later use.


24-25: Added conditional logic to include contextual relevancy metric.

The implementation now conditionally appends the "contextual_relevancy" metric to the list of evaluator metrics when context evaluation is enabled. This ensures that the evaluator will perform context evaluation only when configured to do so.
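In sketch form, the conditional amounts to something like this — the baseline metric names are the ones mentioned elsewhere in this review, and the list name is assumed:

```python
# Sketch: only request the context metric when context evaluation is enabled.
evaluator_metrics = ["EM", "f1"]
if self.evaluate_contexts:
    evaluator_metrics.append("contextual_relevancy")
```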

cognee/eval_framework/evaluation/deep_eval_adapter.py (2)

8-8: Ensure dependency compatibility for the newly introduced metric.

You are importing ContextualRelevancyMetric from deepeval.metrics. Verify that the library version supports this metric and is pinned in the project’s dependencies to avoid potential mismatched or missing attributes at runtime.

Would you like to run a script to confirm the availability of ContextualRelevancyMetric in your current environment?


17-17: New “contextual_relevancy” metric looks good.

Adding a new metric to g_eval_metrics is straightforward and maintains consistency with the other metrics. This cleanly extends the adapter for contextual evaluations.

cognee/eval_framework/answer_generation/answer_generation_executor.py (2)

12-19: Enhance reusability of retriever_options.

This dictionary neatly centralizes retriever classes. Ensure all retrievers share consistent interfaces and that their constructor parameters remain compatible if modifications are made in the future.


34-35: Capture retrieval context carefully.

Storing the retrieval context is beneficial for debugging and evaluation. However, ensure you handle any exceptions from get_context or get_completion gracefully, as network or data errors could cause the method to fail.

Would you like to add a try-except block to handle potential retriever errors and ensure graceful fallback?

Also applies to: 42-42
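One way to add that guard, sketched under the assumption that both methods are async and that an empty string is an acceptable fallback (the fallback behaviour is not something this PR defines):

```python
import logging

logger = logging.getLogger(__name__)


async def safe_retrieve(retriever, question: str):
    # Sketch: fail soft so a single bad retrieval does not abort the whole eval run.
    try:
        context = await retriever.get_context(question)
    except Exception as error:
        logger.warning("get_context failed for %r: %s", question, error)
        context = ""  # fallback value is an assumption
    try:
        answer = await retriever.get_completion(question, context=context)
    except Exception as error:
        logger.warning("get_completion failed for %r: %s", question, error)
        answer = ""
    return answer, context
```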

@alekszievr alekszievr requested review from hajdul88 and lxobr March 5, 2025 11:21

@hajdul88 hajdul88 left a comment


I think in general we shouldn't change how we generate answers, etc., so the pipeline shouldn't change from a structural point of view. I believe the context can simply be added to the JSON that we are passing through the modules.

@alekszievr alekszievr requested a review from hajdul88 March 5, 2025 14:25

@hajdul88 hajdul88 left a comment


Tested it with dreamify and it works! Also, we talked about the changes, so I think it's good now. The dashboard will need some changes in the next PR. Good job!

@alekszievr alekszievr merged commit 433264d into dev Mar 5, 2025
37 checks passed
@alekszievr alekszievr deleted the feat/cog-1366-add-context-evaluation branch March 5, 2025 15:40