feature: adds triplet embedding via memify #1832
Merged
55 commits
001e539 feat: adds triplet datapoint (hajdul88)
1e3c304 feat: adds get triplet batches adapter level method (hajdul88)
e6b6e82 feat: adds initial get_triplet_datapoints (hajdul88)
0b393a7 feat: adds first version of get_triplet_datapoints task (hajdul88)
a512e1d ruff format (hajdul88)
01921d6 Update get_triplet_datapoints.py (hajdul88)
df88fac Update get_triplet_datapoints.py (hajdul88)
de4aa73 Merge branch 'dev' into feature/cog-3326-2triplet-embedding-via-memify (hajdul88)
855eb1f feat: adds logging to get_triplet_datapoints (hajdul88)
acc370c ruff (hajdul88)
64f81a5 removes indexing from get_triplet datapoints (hajdul88)
207fe58 feat: introduces memify wrapping (hajdul88)
717cc69 fix: fixes batching and pipeline yielding logic (hajdul88)
2472174 ruff format (hajdul88)
72f4533 fix: fixing user issue with new memify pipeline (hajdul88)
b7cd326 uc and poetry fix for lancedb (hajdul88)
09ad8ea Revert "uc and poetry fix for lancedb" (hajdul88)
565ed40 fix: fixes logging (hajdul88)
36162f5 fix: fixes embedded text by adding separators (hajdul88)
8cc7530 feat: adds triplet completion to search types (hajdul88)
16e9e76 feat: adds triplet retriever and connects it to search type tools (hajdul88)
2a447e0 ruff fix (hajdul88)
232c761 fix: lancedb fix in order to be able to run CI (TO REVERT) (hajdul88)
77dd332 fix: fixes no triplet embedding error (thats why you should never cop… (hajdul88)
1ecbcff feat: adds memify triplet embedding example (hajdul88)
5e8b53f feat: adds get_triplet_datapoints unit test (hajdul88)
dcf43be feat: adds integration test for get_triplet_datapoints (hajdul88)
4d062df ruff format (hajdul88)
c00814b Update test_get_triplet_datapoints.py (hajdul88)
df7018e Update test_get_triplet_datapoints.py (hajdul88)
2335f78 feat: adds unit and integration test to triplet retriever (hajdul88)
ac57ff2 feat: adds triplet embedding example (hajdul88)
36cdd2e ruff format (hajdul88)
05db29b feat: extends session history test with new triplet completion retriever (hajdul88)
7d7133d ruff (hajdul88)
f32578f feat: extends multidb search test with triplet embedding test (hajdul88)
6b7c483 chore: deletes comments (hajdul88)
2380ebb chores: deletes some comments (hajdul88)
2a0972a chore: deletes some comments from conv history (hajdul88)
29575f9 chore: deletes comments from search db test (hajdul88)
ddebafe Merge branch 'dev' into feature/cog-3326-2triplet-embedding-via-memify (hajdul88)
c988fab chore: fixes coderabbit findings (unused imports) (hajdul88)
fb11dd0 Revert "fix: lancedb fix in order to be able to run CI (TO REVERT)" (hajdul88)
3cb8305 feat: separates session vs non session logic into private methods (hajdul88)
608bdc3 chore: removes if (not needed) (hajdul88)
463fc30 chore: moves continue a bit earlier in the loop (hajdul88)
41488e9 ruff ruff (hajdul88)
d96d536 chore: moving if earlier (hajdul88)
3293ab4 chore: remove unused import (hajdul88)
b287701 Revert "chore: remove unused import" (hajdul88)
7ab00b5 Merge branch 'dev' into feature/cog-3326-2triplet-embedding-via-memify (hajdul88)
5424ebc chore: breaks the get_triplet_datapoints logic into chain of private … (hajdul88)
6363bda ruff format (hajdul88)
61f7a2e Merge branch 'feature/cog-3326-2triplet-embedding-via-memify' of gith… (hajdul88)
804289b feat: adds vector+graph consistency check for triplet embedding (hajdul88)
feat: adds unit and integration test to triplet retriever
commit 2335f7886acbf7478a05b5bbbac4df4ad4c77e36
84 changes: 84 additions & 0 deletions
cognee/tests/integration/retrieval/test_triplet_retriever.py
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,84 @@ | ||
| import os | ||
| import pytest | ||
| import pathlib | ||
| import pytest_asyncio | ||
| import cognee | ||
| | ||
| from cognee.low_level import setup | ||
| from cognee.tasks.storage import add_data_points | ||
| from cognee.modules.retrieval.exceptions.exceptions import NoDataError | ||
| from cognee.modules.retrieval.triplet_retriever import TripletRetriever | ||
| from cognee.modules.engine.models import Triplet | ||
| | ||
| | ||
| @pytest_asyncio.fixture | ||
| async def setup_test_environment_with_triplets(): | ||
|     """Set up a clean test environment with triplets.""" | ||
|     base_dir = pathlib.Path(__file__).parent.parent.parent.parent | ||
|     system_directory_path = str(base_dir / ".cognee_system/test_triplet_retriever_context_simple") | ||
|     data_directory_path = str(base_dir / ".data_storage/test_triplet_retriever_context_simple") | ||
| | ||
|     cognee.config.system_root_directory(system_directory_path) | ||
|     cognee.config.data_root_directory(data_directory_path) | ||
| | ||
|     await cognee.prune.prune_data() | ||
|     await cognee.prune.prune_system(metadata=True) | ||
|     await setup() | ||
| | ||
|     triplet1 = Triplet( | ||
|         from_node_id="node1", | ||
|         to_node_id="node2", | ||
|         text="Alice knows Bob", | ||
|     ) | ||
|     triplet2 = Triplet( | ||
|         from_node_id="node2", | ||
|         to_node_id="node3", | ||
|         text="Bob works at Tech Corp", | ||
|     ) | ||
| | ||
|     triplets = [triplet1, triplet2] | ||
|     await add_data_points(triplets) | ||
| | ||
|     yield | ||
| | ||
|     try: | ||
|         await cognee.prune.prune_data() | ||
|         await cognee.prune.prune_system(metadata=True) | ||
|     except Exception: | ||
|         pass | ||
| | ||
| | ||
| @pytest_asyncio.fixture | ||
| async def setup_test_environment_empty(): | ||
|     """Set up a clean test environment without triplets.""" | ||
|     base_dir = pathlib.Path(__file__).parent.parent.parent.parent | ||
|     system_directory_path = str( | ||
|         base_dir / ".cognee_system/test_triplet_retriever_context_empty_collection" | ||
|     ) | ||
|     data_directory_path = str( | ||
|         base_dir / ".data_storage/test_triplet_retriever_context_empty_collection" | ||
|     ) | ||
| | ||
|     cognee.config.system_root_directory(system_directory_path) | ||
|     cognee.config.data_root_directory(data_directory_path) | ||
| | ||
|     await cognee.prune.prune_data() | ||
|     await cognee.prune.prune_system(metadata=True) | ||
| | ||
|     yield | ||
| | ||
|     try: | ||
|         await cognee.prune.prune_data() | ||
|         await cognee.prune.prune_system(metadata=True) | ||
|     except Exception: | ||
|         pass | ||
| | ||
| | ||
| @pytest.mark.asyncio | ||
| async def test_triplet_retriever_context_simple(setup_test_environment_with_triplets): | ||
|     """Integration test: verify TripletRetriever can retrieve triplet context.""" | ||
|     retriever = TripletRetriever(top_k=5) | ||
| | ||
|     context = await retriever.get_context("Alice") | ||
| | ||
|     assert "Alice knows Bob" in context, "Failed to get Alice triplet" | ||
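
Both fixtures above share one shape: point storage at a test-specific directory, wipe it, optionally seed data, `yield` to the test, then clean up while ignoring teardown errors. A rough, cognee-free sketch of that shape using only the standard library might look like the following. The names `isolated_store` and `seed`, the temp-directory location, and the `triplets.json` file are hypothetical and invented for illustration; none of them correspond to anything in cognee.

```python
import asyncio
import json
import pathlib
import shutil
import tempfile
from contextlib import asynccontextmanager


@asynccontextmanager
async def isolated_store(name, seed=()):
    """Yield a clean, optionally seeded directory and remove it afterwards."""
    # Hypothetical layout: one directory per test under the system temp dir.
    root = pathlib.Path(tempfile.gettempdir()) / name
    shutil.rmtree(root, ignore_errors=True)  # start from a clean slate
    root.mkdir(parents=True)
    (root / "triplets.json").write_text(json.dumps(list(seed)))
    try:
        yield root
    finally:
        # Best-effort cleanup, analogous to the try/except around the prune calls.
        shutil.rmtree(root, ignore_errors=True)


async def demo():
    seed = ["Alice knows Bob", "Bob works at Tech Corp"]
    async with isolated_store("sketch_triplet_store", seed) as root:
        stored = json.loads((root / "triplets.json").read_text())
    return stored, root


stored, root = asyncio.run(demo())
print(stored)
```

Giving each test its own directory name, as the fixtures do, keeps a failed cleanup in one test from leaking state into the next.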
83 changes: 83 additions & 0 deletions
cognee/tests/unit/modules/retrieval/triplet_retriever_test.py
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,83 @@ | ||
| import pytest | ||
| from unittest.mock import AsyncMock, patch, MagicMock | ||
| | ||
| from cognee.modules.retrieval.triplet_retriever import TripletRetriever | ||
| from cognee.modules.retrieval.exceptions.exceptions import NoDataError | ||
| from cognee.infrastructure.databases.vector.exceptions import CollectionNotFoundError | ||
| | ||
| | ||
| @pytest.fixture | ||
| def mock_vector_engine(): | ||
|     """Create a mock vector engine.""" | ||
|     engine = AsyncMock() | ||
|     engine.has_collection = AsyncMock(return_value=True) | ||
|     engine.search = AsyncMock() | ||
|     return engine | ||
| | ||
| | ||
| @pytest.mark.asyncio | ||
| async def test_get_context_success(mock_vector_engine): | ||
|     """Test successful retrieval of triplet context.""" | ||
|     mock_result1 = MagicMock() | ||
|     mock_result1.payload = {"text": "Alice knows Bob"} | ||
|     mock_result2 = MagicMock() | ||
|     mock_result2.payload = {"text": "Bob works at Tech Corp"} | ||
| | ||
|     mock_vector_engine.search.return_value = [mock_result1, mock_result2] | ||
| | ||
|     retriever = TripletRetriever(top_k=5) | ||
| | ||
|     with patch( | ||
|         "cognee.modules.retrieval.triplet_retriever.get_vector_engine", | ||
|         return_value=mock_vector_engine, | ||
|     ): | ||
|         context = await retriever.get_context("test query") | ||
| | ||
|     assert context == "Alice knows Bob\nBob works at Tech Corp" | ||
|     mock_vector_engine.search.assert_awaited_once_with("Triplet_text", "test query", limit=5) | ||
| | ||
| | ||
| @pytest.mark.asyncio | ||
| async def test_get_context_no_collection(mock_vector_engine): | ||
|     """Test that NoDataError is raised when Triplet_text collection doesn't exist.""" | ||
|     mock_vector_engine.has_collection.return_value = False | ||
| | ||
|     retriever = TripletRetriever() | ||
| | ||
|     with patch( | ||
|         "cognee.modules.retrieval.triplet_retriever.get_vector_engine", | ||
|         return_value=mock_vector_engine, | ||
|     ): | ||
|         with pytest.raises(NoDataError, match="create_triplet_embeddings"): | ||
|             await retriever.get_context("test query") | ||
| | ||
| | ||
| @pytest.mark.asyncio | ||
| async def test_get_context_empty_results(mock_vector_engine): | ||
|     """Test that empty string is returned when no triplets are found.""" | ||
|     mock_vector_engine.search.return_value = [] | ||
| | ||
|     retriever = TripletRetriever() | ||
| | ||
|     with patch( | ||
|         "cognee.modules.retrieval.triplet_retriever.get_vector_engine", | ||
|         return_value=mock_vector_engine, | ||
|     ): | ||
|         context = await retriever.get_context("test query") | ||
| | ||
|     assert context == "" | ||
| | ||
| | ||
| @pytest.mark.asyncio | ||
| async def test_get_context_collection_not_found_error(mock_vector_engine): | ||
|     """Test that CollectionNotFoundError is converted to NoDataError.""" | ||
|     mock_vector_engine.has_collection.side_effect = CollectionNotFoundError("Collection not found") | ||
| | ||
|     retriever = TripletRetriever() | ||
| | ||
|     with patch( | ||
|         "cognee.modules.retrieval.triplet_retriever.get_vector_engine", | ||
|         return_value=mock_vector_engine, | ||
|     ): | ||
|         with pytest.raises(NoDataError, match="No data found"): | ||
|             await retriever.get_context("test query") | ||
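
Taken together, the unit tests above pin down a small contract for `get_context`: it searches the `Triplet_text` collection with `limit=top_k`, joins the `text` payloads with newlines, returns an empty string when nothing matches, and raises `NoDataError` when the collection is missing or the engine raises `CollectionNotFoundError`. The block below is a minimal sketch of a retriever that would satisfy those assertions. It is not cognee's implementation. The class `SketchTripletRetriever`, the injected `engine` argument, the default `top_k`, the argument passed to `has_collection`, and the stand-in exception classes are all assumptions made so the block runs without cognee installed; the error messages are only shaped to fit the `match=` patterns in the tests.

```python
import asyncio
from unittest.mock import AsyncMock, MagicMock


class NoDataError(Exception):
    """Stand-in for cognee's NoDataError (assumption, not the real class)."""


class CollectionNotFoundError(Exception):
    """Stand-in for cognee's CollectionNotFoundError (assumption)."""


class SketchTripletRetriever:
    """Hypothetical retriever; the engine is injected instead of looked up."""

    def __init__(self, engine, top_k=5):
        self.engine = engine
        self.top_k = top_k

    async def get_context(self, query):
        try:
            # Assumed call shape: the tests do not show has_collection's arguments.
            if not await self.engine.has_collection("Triplet_text"):
                raise NoDataError(
                    "Triplet_text collection is missing; run create_triplet_embeddings first."
                )
            results = await self.engine.search("Triplet_text", query, limit=self.top_k)
        except CollectionNotFoundError as error:
            # Translate the storage-level error into the retrieval-level one.
            raise NoDataError("No data found in the system.") from error
        # An empty result list joins to an empty string.
        return "\n".join(result.payload["text"] for result in results)


def make_engine(texts):
    """Build a mock engine whose search returns one hit per text."""
    engine = AsyncMock()
    engine.has_collection = AsyncMock(return_value=True)
    hits = []
    for text in texts:
        hit = MagicMock()
        hit.payload = {"text": text}
        hits.append(hit)
    engine.search = AsyncMock(return_value=hits)
    return engine


async def demo():
    engine = make_engine(["Alice knows Bob", "Bob works at Tech Corp"])
    context = await SketchTripletRetriever(engine, top_k=5).get_context("test query")
    engine.search.assert_awaited_once_with("Triplet_text", "test query", limit=5)
    return context


print(asyncio.run(demo()))
```

Injecting the engine keeps the sketch testable without `patch`; the tests in the diff instead patch `get_vector_engine` in the retriever's module, which suggests the real class looks the engine up internally.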