test: answer generation [COG-1234] #569
Conversation
Walkthrough
This pull request introduces a new unit test for the answer generation functionality within the evaluation framework. The test file sets up a dummy data adapter, uses an asynchronous mock answer resolver, and invokes the executor's non-parallel question-answering flow to check the generated answers.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Test as Test File
    participant Adapter as DummyAdapter
    participant Executor as AnswerGeneratorExecutor
    participant Resolver as Async AnswerResolver
    Test->>Adapter: Load corpus and question-answer pairs
    Test->>Resolver: Setup AsyncMock for answer resolution
    Test->>Executor: Invoke question_answering_non_parallel(questions, resolver)
    Executor->>Resolver: Process and resolve question
    Resolver-->>Executor: Return "mock_answer"
    Executor-->>Test: Return generated answers list
```
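To make the flow above concrete, here is a minimal sketch of such a test. It is illustrative only: the import paths, the pytest-asyncio marker, and the exact constructor signatures are assumptions, while the class and method names follow the diagram and diffs in this review.

```python
from unittest.mock import AsyncMock

import pytest

# Illustrative import paths -- the real module locations may differ.
from evals.eval_framework.benchmark_adapters.dummy_adapter import DummyAdapter
from evals.eval_framework.answer_generation_executor import AnswerGeneratorExecutor


@pytest.mark.asyncio
async def test_answer_generation_non_parallel():
    # Load a tiny corpus and its question-answer pairs from the dummy adapter.
    corpus_list, qa_pairs = DummyAdapter().load_corpus(limit=1)

    # Mock the answer resolver so the test never calls a real LLM.
    qa_engine = AsyncMock(return_value="mock_answer")

    answers = await AnswerGeneratorExecutor().question_answering_non_parallel(
        questions=qa_pairs,
        answer_resolver=qa_engine,
    )

    # One generated answer is expected per question-answer pair.
    assert len(answers) == len(qa_pairs)
```

Mocking the resolver with AsyncMock keeps the test hermetic: no model call is made, and only the executor's orchestration is exercised.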
Actionable comments posted: 1
🔭 Outside diff range comments (1)
evals/eval_framework/benchmark_adapters/twowikimultihop_adapter.py (1)
24-26: 🛠️ Refactor suggestion: Add timeout to `requests.get` call.
The request to download the dataset should have a timeout to prevent hanging:
```diff
-        response = requests.get(self.dataset_info["URL"])
+        response = requests.get(self.dataset_info["URL"], timeout=30)
         response.raise_for_status()
         corpus_json = response.json()
```
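For context, a self-contained sketch of what the download step could look like with the timeout applied; the helper name `_download_corpus` and the exception handling are illustrative additions, not part of the suggested diff.

```python
import requests


def _download_corpus(dataset_url: str, timeout: int = 30) -> list[dict]:
    """Fetch the benchmark corpus, failing fast instead of hanging indefinitely."""
    try:
        # A bounded timeout prevents the test run from stalling on a slow mirror.
        response = requests.get(dataset_url, timeout=timeout)
        response.raise_for_status()
    except requests.RequestException as exc:
        raise RuntimeError(f"Failed to download corpus from {dataset_url}") from exc
    return response.json()
```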
🧹 Nitpick comments (4)
cognee/tests/unit/eval_framework/answer_generation_test.py (2)
16-19: Consider adding error handling for system preparation. While the system preparation steps look good, consider adding error handling and assertions to verify that each step completed successfully.
```diff
-    await cognee.prune.prune_data()
-    await cognee.prune.prune_system(metadata=True)
-    await cognee.add(corpus_list)
-    await cognee.cognify()
+    try:
+        await cognee.prune.prune_data()
+        await cognee.prune.prune_system(metadata=True)
+        await cognee.add(corpus_list)
+        await cognee.cognify()
+    except Exception as e:
+        pytest.fail(f"System preparation failed: {str(e)}")
```
21-35: Consider adding more edge cases to the assertions. The current assertions cover the basic functionality well. Consider adding more edge cases:
- Test with empty questions list
- Test with invalid answer resolver
- Verify answer format/structure

```diff
+    # Test empty questions list
+    empty_answers = await answer_generator.question_answering_non_parallel(
+        questions=[],
+        answer_resolver=qa_engine,
+    )
+    assert len(empty_answers) == 0, "Empty questions list should return empty answers"
+
     answers = await answer_generator.question_answering_non_parallel(
         questions=qa_pairs,
         answer_resolver=qa_engine,
     )
     assert len(answers) == len(qa_pairs)
+
+    # Verify answer structure
+    assert isinstance(answers[0], dict), "Answer should be a dictionary"
+    assert all(key in answers[0] for key in ["question", "answer", "golden_answer"]), \
+        "Answer dictionary missing required keys"
```
cognee/tests/unit/eval_framework/corpus_builder_test.py (1)
8-16: Consider adding more test cases for corpus loading. The test covers basic functionality but could be enhanced with more edge cases:
- Test with limit=0
- Test with negative limit
- Test with limit larger than available data

```diff
 @pytest.mark.parametrize("benchmark", benchmark_options)
-def test_corpus_builder_load_corpus(benchmark):
+@pytest.mark.parametrize("limit", [2, 0, -1, 1000])
+def test_corpus_builder_load_corpus(benchmark, limit):
-    limit = 2
     corpus_builder = CorpusBuilderExecutor(benchmark, "Default")
     raw_corpus, questions = corpus_builder.load_corpus(limit=limit)
     assert len(raw_corpus) > 0, f"Corpus builder loads empty corpus for {benchmark}"
-    assert len(questions) <= 2, (
+    expected_limit = max(0, limit) if limit != 1000 else float('inf')
+    assert len(questions) <= expected_limit, (
         f"Corpus builder loads {len(questions)} for {benchmark} when limit is {limit}"
     )
```
evals/eval_framework/benchmark_adapters/twowikimultihop_adapter.py (1)
31-33: Consider using numpy for random sampling. For better performance with large datasets, consider using numpy's random sampling:

```diff
+import numpy as np
+
 if limit is not None and 0 < limit < len(corpus_json):
-    random.seed(seed)
-    corpus_json = random.sample(corpus_json, limit)
+    rng = np.random.default_rng(seed)
+    indices = rng.choice(len(corpus_json), size=limit, replace=False)
+    corpus_json = [corpus_json[i] for i in indices]
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (10)
- .github/workflows/test_python_3_10.yml (1 hunks)
- .github/workflows/test_python_3_11.yml (1 hunks)
- .github/workflows/test_python_3_12.yml (1 hunks)
- cognee/tests/unit/eval_framework/answer_generation_test.py (1 hunks)
- cognee/tests/unit/eval_framework/benchmark_adapters_test.py (1 hunks)
- cognee/tests/unit/eval_framework/corpus_builder_test.py (1 hunks)
- evals/eval_framework/benchmark_adapters/dummy_adapter.py (1 hunks)
- evals/eval_framework/benchmark_adapters/hotpot_qa_adapter.py (2 hunks)
- evals/eval_framework/benchmark_adapters/musique_adapter.py (2 hunks)
- evals/eval_framework/benchmark_adapters/twowikimultihop_adapter.py (2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (30)
- GitHub Check: run_notebook_test / test
- GitHub Check: Test on macos-15
- GitHub Check: Test on macos-15
- GitHub Check: run_networkx_metrics_test / test
- GitHub Check: run_simple_example_test / test
- GitHub Check: Test on macos-13
- GitHub Check: Test on macos-13
- GitHub Check: run_notebook_test / test
- GitHub Check: run_notebook_test / test
- GitHub Check: run_notebook_test / test
- GitHub Check: Test on macos-15
- GitHub Check: run_eval_framework_test / test
- GitHub Check: Test on macos-13
- GitHub Check: test
- GitHub Check: Test on ubuntu-22.04
- GitHub Check: Test on ubuntu-22.04
- GitHub Check: test
- GitHub Check: windows-latest
- GitHub Check: test
- GitHub Check: test
- GitHub Check: run_multimedia_example_test / test
- GitHub Check: Test on ubuntu-22.04
- GitHub Check: run_dynamic_steps_example_test / test
- GitHub Check: test
- GitHub Check: test
- GitHub Check: test
- GitHub Check: lint (ubuntu-latest, 3.11.x)
- GitHub Check: lint (ubuntu-latest, 3.10.x)
- GitHub Check: docker-compose-test
- GitHub Check: Build Cognee Backend Docker App Image
🔇 Additional comments (12)
evals/eval_framework/benchmark_adapters/dummy_adapter.py (1)
9-9: LGTM! Type hint simplification looks good. The simplified return type `tuple[list[str], list[dict[str, str]]]` is more accurate and cleaner than the previous version with `Union[LiteralString, str]`, as the method only returns regular strings.
cognee/tests/unit/eval_framework/answer_generation_test.py (1)
10-14: LGTM! Test setup looks good. Good use of pytest's parametrization to test multiple QA engines. The limit of 1 is appropriate for unit testing as it keeps the test focused and fast.
evals/eval_framework/benchmark_adapters/twowikimultihop_adapter.py (1)
5-5: LGTM! Type hint simplification looks good. The simplified imports and return type `tuple[list[str], list[dict[str, Any]]]` are more accurate and cleaner than the previous version with `Union[LiteralString, str]`.
Also applies to: 17-17
evals/eval_framework/benchmark_adapters/hotpot_qa_adapter.py (1)
5-5: LGTM! Type hints simplified correctly. The simplified type hints better reflect the actual types used in the code. The `corpus_list` only contains strings, making `list[str]` more accurate than the previous union type.
Also applies to: 19-19
cognee/tests/unit/eval_framework/benchmark_adapters_test.py (1)
66-69: LGTM! Test assertion correctly focuses on QA pairs limit. The test now properly focuses on validating the limit for QA pairs, which is the primary concern. This change acknowledges that different adapters may handle corpus entries differently, making the previous corpus list length assertion too restrictive.
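For illustration, the relaxed check could look roughly like this; the function and variable names are assumptions, not the test's actual identifiers.

```python
def check_adapter_output(corpus_list: list[str], qa_pairs: list[dict], limit: int) -> None:
    # Only the number of QA pairs is bounded by the limit; the corpus list just
    # has to be non-empty, since adapters may expand corpus entries differently.
    assert len(qa_pairs) <= limit, (
        f"Adapter returned {len(qa_pairs)} QA pairs, expected at most {limit}"
    )
    assert len(corpus_list) > 0, "Adapter returned an empty corpus"
```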
evals/eval_framework/benchmark_adapters/musique_adapter.py (2)
4-4: LGTM! Type hints simplified correctly. The simplified type hints better reflect the actual types used in the code.
67-68: ❓ Verification inconclusive
Verify impact of paragraph handling change.
The change from concatenating paragraphs to appending them individually improves granularity but might affect downstream processing that expects concatenated paragraphs.
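As a rough sketch of the behavioural difference: the field name mirrors the adapter's `paragraph["paragraph_text"]` access, while the sample data and the "before" join are assumptions, not the real code.

```python
# Illustrative only: sample item with two supporting paragraphs.
item = {
    "paragraphs": [
        {"paragraph_text": "First supporting paragraph."},
        {"paragraph_text": "Second supporting paragraph."},
    ]
}

# Before (concatenated): downstream code sees one combined string per question.
corpus_concatenated = [" ".join(p["paragraph_text"] for p in item["paragraphs"])]

# After (appended individually): downstream code sees one entry per paragraph.
corpus_per_paragraph = [p["paragraph_text"] for p in item["paragraphs"]]

assert len(corpus_concatenated) == 1
assert len(corpus_per_paragraph) == 2
```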
Run this script to check for potential impacts:
🏁 Script executed:
```bash
#!/bin/bash
# Search for code that processes the corpus entries
rg -A 5 "corpus_list" --glob '!**/*test*.py'
```
Length of output: 5338
Review: Verify Downstream Impact of Modified Paragraph Handling in `musique_adapter.py`

The change now appends each individual `paragraph["paragraph_text"]` to `corpus_list` rather than concatenating multiple texts. This increased granularity may alter the behavior expected by downstream processing, especially since similar adapters (e.g., `twowikimultihop_adapter.py` and `hotpot_qa_adapter.py`) join sentence fragments into a single string.

Action items:
- Validate downstream consumers: Confirm that any functions or modules processing `corpus_list` in `musique_adapter.py` can handle multiple discrete entries instead of a combined text.
- Ensure consistency: Reassess whether the differing handling across adapters is intentional. If a uniform output format is desired across benchmarks, consider aligning the logic.
- Review integration tests: Verify that tests covering the `musique_adapter` structure and its downstream usage remain valid with this change.

.github/workflows/test_python_3_12.yml (1)
51-51: LGTM! Dependencies and environment variables properly configured. The changes correctly:
- Install evaluation framework dependencies via the `evals` extra
- Provide the necessary `LLM_API_KEY` for unit tests

Also applies to: 57-60
.github/workflows/test_python_3_10.yml (2)
50-50: Enhanced Dependency Installation Command.
The updated command now includes the additional `evals` extra alongside `docs`, ensuring that all required dependencies for evaluation features are installed. This update promotes consistency with the other workflow files.
57-59: Updated Environment Variables in Unit Test Step.
The "Run unit tests" step now explicitly setsENV: 'dev'and introducesLLM_API_KEYfrom GitHub secrets. This ensures that the unit tests have access to the necessary runtime configurations and credentials..github/workflows/test_python_3_11.yml (2)
51-51: Refined Dependency Installation Command.
The installation step now uses `poetry install --no-interaction -E docs -E evals`, which adds the `evals` extra to the dependency management. This adjustment aligns with other Python version workflows and maintains consistency.
59-61: Consistent Environment Configuration for Unit Tests.
The "Run unit tests" step has been updated to include the environment variablesENV: 'dev'andLLM_API_KEY(sourced from GitHub secrets). This change ensures that the unit tests operate with the same configuration across different Python versions.
Description
DCO Affirmation
I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin
Summary by CodeRabbit