Add type to DataPoint metadata #364
Conversation
Walkthrough

The pull request introduces several changes across multiple database adapter classes, enhancing the _metadata of DataPoint and its subclasses with type information.
Actionable comments posted: 3
🧹 Outside diff range and nitpick comments (17)
cognee/modules/data/processing/document_types/Document.py (1)
15-16: Implement or document the unimplemented read method.

The read method is currently a placeholder. Consider either implementing it or adding a docstring explaining why it's left unimplemented.

Would you like me to help implement this method or create an issue to track this task?
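If the method is meant to be a subclass contract, one option is to make that explicit; a minimal sketch (the signature is simplified and the subclass names are assumptions, not the real code):

```python
class Document(DataPoint):
    def read(self) -> str:
        """Return the document's textual content.

        Intentionally unimplemented here: concrete subclasses
        (e.g. a PDF or audio document type) must override this.
        """
        raise NotImplementedError(f"{type(self).__name__} must implement read()")
```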
cognee/shared/CodeGraphEntities.py (1)
2-2: Remove unnecessary empty lines.

There are several unnecessary empty lines that could be removed to improve code organization.
```diff
-
-
 from cognee.infrastructure.engine import DataPoint
-
-
 class Repository(DataPoint):
```

Also applies to: 5-5, 32-32
cognee/infrastructure/engine/models/DataPoint.py (1)
45-45: Consider adding null safety check

The simplified return statement assumes _metadata is always present. While this is set in the class definition, derived classes might override it.
```diff
-return data_point._metadata["index_fields"] or []
+return data_point._metadata.get("index_fields", []) if data_point._metadata else []
```

cognee/shared/SourceCodeGraph.py (1)
1-1: Architectural improvement: Runtime type information

Good architectural decision to move from compile-time Literal types to runtime metadata. This change:
- Maintains type information at runtime
- Aligns with the DataPoint base class pattern
- Provides consistent type identification across the system
Consider documenting this architectural decision in the project's ADR (Architecture Decision Records) to explain the rationale for future maintainers.
Also applies to: 15-15, 24-24, 36-36, 48-48, 60-60, 69-69, 79-79, 97-97
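To make the change concrete, here is a rough before/after sketch of one of these models (reconstructed from the review comments, not the actual diff):

```python
from typing import Literal
from cognee.infrastructure.engine import DataPoint

# Before: the type was carried by a compile-time Literal field.
class VariableBefore(DataPoint):
    name: str
    type: Literal["Variable"] = "Variable"

# After: the type lives in runtime metadata alongside the index fields.
class VariableAfter(DataPoint):
    name: str
    _metadata: dict = {
        "index_fields": ["name"],
        "type": "Variable",
    }
```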
cognee/infrastructure/databases/vector/milvus/MilvusAdapter.py (1)
7-12: Improve import organization

Consider grouping related imports together and adding a blank line between different groups:
```diff
 from __future__ import annotations
 import asyncio
 import logging
 from typing import List, Optional
 from uuid import UUID
-
-from cognee.infrastructure.engine import DataPoint
-
-from ..embeddings.EmbeddingEngine import EmbeddingEngine
-from ..models.ScoredResult import ScoredResult
-from ..vector_db_interface import VectorDBInterface
+
+from cognee.infrastructure.engine import DataPoint
+from ..embeddings.EmbeddingEngine import EmbeddingEngine
+from ..models.ScoredResult import ScoredResult
+from ..vector_db_interface import VectorDBInterface
```

cognee/infrastructure/databases/vector/weaviate_db/WeaviateAdapter.py (4)
8-11: Consider grouping related imports together

The imports could be better organized by grouping related imports:
- Standard library imports (typing, uuid, etc.)
- Third-party imports (embeddings, models)
- Local imports (exceptions, infrastructure)
```diff
 from ..embeddings.EmbeddingEngine import EmbeddingEngine
-from ..models.ScoredResult import ScoredResult
-from ..vector_db_interface import VectorDBInterface
+from ..vector_db_interface import VectorDBInterface
+from ..models.ScoredResult import ScoredResult
```
Line range hint 119-134: Enhance error handling in batch operations

The batch operation error handling could be improved by:
- Adding specific error types
- Providing more context in error messages
- Ensuring proper cleanup in case of partial failures
```diff
 try:
     if len(data_points) > 1:
         with collection.batch.dynamic() as batch:
             for data_point in data_points:
+                try:
                     batch.add_object(
                         uuid = data_point.uuid,
                         vector = data_point.vector,
                         properties = data_point.properties,
                         references = data_point.references,
                     )
+                except Exception as e:
+                    logger.error("Failed to add data point %s: %s", data_point.uuid, str(e))
+                    raise
     else:
         data_point: DataObject = data_points[0]
         if collection.data.exists(data_point.uuid):
```
Line range hint 187-193: Consider adding retry logic for search operations

The search operation could benefit from retry logic to handle temporary network issues or service unavailability (the decorator below assumes the tenacity library):
```diff
+from tenacity import retry, stop_after_attempt, wait_exponential
+
+@retry(stop=stop_after_attempt(3), wait=wait_exponential(multiplier=1, min=4, max=10))
 async def search(
     self,
     collection_name: str,
     query_text: Optional[str] = None,
     query_vector: Optional[List[float]] = None,
     limit: int = None,
     with_vector: bool = False
 ):
```
Line range hint 234-236: Add parameter validation in batch_search

The batch_search method should validate its input parameters similar to the single search method.
```diff
 async def batch_search(self, collection_name: str, query_texts: List[str], limit: int, with_vectors: bool = False):
+    if not query_texts:
+        raise InvalidValueError(message="query_texts cannot be empty")
     query_vectors = await self.embed_data(query_texts)
```
Line range hint 91-121: Consider simplifying generic type implementation

The generic type implementation could be simplified while maintaining type safety.
```diff
-IdType = TypeVar("IdType")
-PayloadSchema = TypeVar("PayloadSchema")
-vector_size = self.embedding_engine.get_vector_size()
-
-class LanceDataPoint(LanceModel, Generic[IdType, PayloadSchema]):
+class LanceDataPoint(LanceModel):
     id: str
-    vector: Vector(vector_size)
+    vector: Vector(self.embedding_engine.get_vector_size())
     payload: PayloadSchema
```
Line range hint 142-156: Add error handling for collection operations

The retrieve operation should include error handling for cases where the collection doesn't exist.
```diff
 async def retrieve(self, collection_name: str, data_point_ids: list[str]):
     connection = await self.get_connection()
+    if not await self.has_collection(collection_name):
+        raise InvalidValueError(f"Collection {collection_name} does not exist")
     collection = await connection.open_table(collection_name)
```
Line range hint 28-41: Consider using dataclasses for configuration

The configuration handling could be improved using dataclasses for better type safety and validation.
```diff
+from dataclasses import dataclass
+
+@dataclass
+class HnswConfig:
+    m: int = 16
+    ef_construct: int = 100
+
 def create_hnsw_config(hnsw_config: Dict):
     if hnsw_config is not None:
-        return models.HnswConfig()
+        return models.HnswConfig(**HnswConfig(**hnsw_config).__dict__)
     return None
```
Line range hint 95-112: Add connection pooling for better resource management

The client connection handling could be improved with connection pooling.
Consider implementing a connection pool to manage client connections more efficiently and prevent resource exhaustion during high concurrent loads.
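As a rough illustration only (not the adapter's actual API), one lightweight approach is to create the client lazily and share it across coroutines behind a lock:

```python
import asyncio

class SharedClient:
    """Minimal sketch: lazily create one client and reuse it across requests."""

    def __init__(self, factory):
        self._factory = factory  # e.g. a callable returning the Qdrant client
        self._client = None
        self._lock = asyncio.Lock()

    async def get(self):
        async with self._lock:  # avoid racing on first-time creation
            if self._client is None:
                self._client = self._factory()
            return self._client
```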
Line range hint 251-253: Improve batch search result filtering

The current filtering approach in batch_search might drop important results.
```diff
-return [filter(lambda result: result.score > 0.9, result_group) for result_group in results]
+return [list(filter(lambda result: result.score > 0.9, result_group)) for result_group in results]
```

The current implementation:
- Returns filter objects instead of lists
- Uses a hard-coded threshold that might not be suitable for all use cases
Doesn't provide a way to customize the filtering threshold (a configurable variant is sketched after this list)
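One way to address the last two points (an illustrative sketch, not the adapter's actual code) is to materialize the groups and lift the threshold into a parameter:

```python
from typing import List

def filter_result_groups(results: List[list], score_threshold: float = 0.9) -> List[list]:
    """Materialize each result group as a list, with a configurable score cutoff."""
    return [
        [result for result in result_group if result.score > score_threshold]
        for result_group in results
    ]
```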
cognee/infrastructure/databases/vector/pgvector/PGVectorAdapter.py (2)
Line range hint 82-94: Eliminate duplicate PGVectorDataPoint class definition

The PGVectorDataPoint class is defined identically in both create_collection and create_data_points methods. This duplication could lead to maintenance issues if the class definition needs to be updated.

Consider moving the class definition to module level or creating a factory method:
```diff
+def create_pgvector_data_point_class(collection_name: str, vector_size: int):
+    from pgvector.sqlalchemy import Vector
+
+    class PGVectorDataPoint(Base):
+        __tablename__ = collection_name
+        __table_args__ = {"extend_existing": True}
+        primary_key: Mapped[int] = mapped_column(
+            primary_key=True, autoincrement=True
+        )
+        id: Mapped[Any]  # Type will be set based on data_points
+        payload = Column(JSON)
+        vector = Column(Vector(vector_size))
+
+        def __init__(self, id, payload, vector):
+            self.id = id
+            self.payload = payload
+            self.vector = vector
+
+    return PGVectorDataPoint

 async def create_collection(self, collection_name: str, payload_schema=None):
     data_point_types = get_type_hints(DataPoint)
     vector_size = self.embedding_engine.get_vector_size()

     if not await self.has_collection(collection_name):
-        from pgvector.sqlalchemy import Vector
-
-        class PGVectorDataPoint(Base):
-            __tablename__ = collection_name
-            ...
+        PGVectorDataPoint = create_pgvector_data_point_class(collection_name, vector_size)
```

Also applies to: 124-136
17-17: Implement similarity score normalization using the imported utility

The normalize_distances utility is imported but not used, while there are TODO comments about normalizing similarity scores.

Consider implementing the normalization:
```diff
 # Extract distances and find min/max for normalization
+distances = [vector.similarity for vector in closest_items]
+normalized_distances = normalize_distances(distances)

 for vector in closest_items:
-    # TODO: Add normalization of similarity score
     vector_list.append(vector)

 # Create and return ScoredResult objects
 return [
     ScoredResult(
         id = UUID(str(row.id)),
         payload = row.payload,
-        score = row.similarity
+        score = normalized_distances[i]
     ) for i, row in enumerate(vector_list)
 ]
```

Also applies to: 214-214, 267-267
cognee/infrastructure/databases/hybrid/falkordb/FalkorDBAdapter.py (1)
Line range hint 359-360: Fix missing await in delete_nodes method

The delete_data_points call is not awaited, so the coroutine never actually runs, which can lead to silently skipped deletions and race conditions.

Apply this fix:
```diff
 async def delete_nodes(self, collection_name: str, data_point_ids: list[str]):
-    self.delete_data_points(data_point_ids)
+    await self.delete_data_points(data_point_ids)
```
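For illustration (toy names, not the adapter's real code): calling a coroutine function without await only creates a coroutine object; its body never runs, and Python emits a "coroutine was never awaited" warning:

```python
import asyncio

async def delete_data_points(ids: list[str]) -> None:
    print(f"deleting {ids}")  # stand-in for the real database call

async def main() -> None:
    delete_data_points(["a"])        # bug: nothing is printed, deletion silently skipped
    await delete_data_points(["b"])  # fix: the body actually runs

asyncio.run(main())
```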
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (19)
- cognee/infrastructure/databases/hybrid/falkordb/FalkorDBAdapter.py (1 hunks)
- cognee/infrastructure/databases/vector/lancedb/LanceDBAdapter.py (1 hunks)
- cognee/infrastructure/databases/vector/milvus/MilvusAdapter.py (2 hunks)
- cognee/infrastructure/databases/vector/pgvector/PGVectorAdapter.py (1 hunks)
- cognee/infrastructure/databases/vector/qdrant/QDrantAdapter.py (1 hunks)
- cognee/infrastructure/databases/vector/weaviate_db/WeaviateAdapter.py (1 hunks)
- cognee/infrastructure/engine/models/DataPoint.py (3 hunks)
- cognee/modules/chunking/models/DocumentChunk.py (2 hunks)
- cognee/modules/data/processing/document_types/Document.py (1 hunks)
- cognee/modules/engine/models/Entity.py (1 hunks)
- cognee/modules/engine/models/EntityType.py (1 hunks)
- cognee/modules/graph/models/EdgeType.py (1 hunks)
- cognee/modules/graph/utils/convert_node_to_data_point.py (1 hunks)
- cognee/shared/CodeGraphEntities.py (1 hunks)
- cognee/shared/SourceCodeGraph.py (2 hunks)
- cognee/tasks/storage/index_data_points.py (2 hunks)
- cognee/tasks/summarization/models.py (2 hunks)
- cognee/tests/unit/interfaces/graph/get_graph_from_huge_model_test.py (2 hunks)
- cognee/tests/unit/interfaces/graph/get_graph_from_model_test.py (1 hunks)
✅ Files skipped from review due to trivial changes (2)
- cognee/modules/engine/models/Entity.py
- cognee/modules/chunking/models/DocumentChunk.py
🔇 Additional comments (15)
cognee/modules/engine/models/EntityType.py (1)
8-12: LGTM! Clean metadata implementation.
The metadata structure properly includes both the type information and index fields, following the standardized pattern across the codebase.
cognee/tasks/summarization/models.py (1)
13-13: LGTM! Consistent type metadata implementation.
The addition of the "type" key to _metadata is consistent with the PR objective and follows the same pattern across both TextSummary and CodeSummary classes.
Let's verify the DataPoint base class implementation:
Also applies to: 24-24
✅ Verification successful
LGTM! The type metadata implementation is consistent with the base class.
The DataPoint base class already includes the "type" key in its _metadata dictionary with a value of "DataPoint", and the TextSummary and CodeSummary classes correctly override this value with their respective class names. The implementation maintains consistency across the inheritance hierarchy.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Check if DataPoint base class has consistent metadata implementation
ast-grep --pattern 'class DataPoint {
$$$
_metadata: dict = {
$$$
"type": $_
$$$
}
$$$
}'
Length of output: 111
Script:
#!/bin/bash
# Let's try a different approach to find the DataPoint class implementation
ast-grep --pattern 'class DataPoint'
# Also search for any metadata-related code in DataPoint
rg -A 5 "_metadata.*=.*{"
Length of output: 160
Script:
#!/bin/bash
# Let's try a simpler approach to find the DataPoint class and its metadata
rg "_metadata" --type py
# Also search for DataPoint class definition
rg "class DataPoint" --type py
Length of output: 15720
Script:
#!/bin/bash
# Let's examine the DataPoint class implementation
cat cognee/infrastructure/engine/models/DataPoint.py
Length of output: 1524
cognee/shared/CodeGraphEntities.py (1)
9-12: LGTM! Consistent metadata implementation across entities.
The implementation of type metadata is consistent across all classes. The inclusion of index_fields where appropriate (Repository, CodeFile, CodePart) and its omission in CodeRelationship (which doesn't need indexed fields) shows good design consideration.
Let's verify the consistency of index_fields usage across the codebase:
Also applies to: 24-25, 34-35, 42-44
✅ Verification successful
Metadata implementation is indeed consistent across the codebase
The verification confirms that:
- All entities that need indexed fields (Repository, CodeFile, CodePart) correctly include "index_fields" in their metadata
- CodeRelationship correctly omits "index_fields" as it doesn't require indexed fields
- This pattern is consistently followed across the entire codebase, with other entities like TextSummary, CodeSummary, and various database adapters also properly implementing their metadata with appropriate index fields
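Concretely, the convention described above looks roughly like this (a sketch assembled from the field names quoted in this review; CodeRelationship's exact fields are an assumption):

```python
from cognee.infrastructure.engine import DataPoint

class CodePart(DataPoint):
    source_code: str
    _metadata: dict = {
        "index_fields": ["source_code"],  # searchable text content gets indexed
        "type": "CodePart",
    }

class CodeRelationship(DataPoint):
    relationship_type: str  # hypothetical field name
    _metadata: dict = {
        "type": "CodeRelationship",  # nothing to search on, so index_fields is omitted
    }
```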
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Check for any inconsistencies in index_fields usage
ast-grep --pattern 'class $_ {
$$$
_metadata: dict = {
"index_fields": $_,
$$$
}
$$$
}'
Length of output: 105
Script:
#!/bin/bash
# Let's try a different approach to find metadata definitions in Python files
rg -U "_metadata.*=.*{.*index_fields" --type py -A 3
Length of output: 181
Script:
#!/bin/bash
# Search for _metadata dictionary definitions in Python files
rg "_metadata: dict = \{" --type py -A 4
Length of output: 5143
cognee/infrastructure/engine/models/DataPoint.py (1)
18-19: LGTM: Consistent metadata type addition
The addition of "type": "DataPoint" to the _metadata dictionary aligns with the PR objectives and establishes a consistent pattern for type identification across derived classes.
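Judging from the snippets quoted elsewhere in this review, the base class pattern is roughly the following sketch (exact fields may differ from the real DataPoint):

```python
from typing import Optional
from uuid import UUID, uuid4
from pydantic import BaseModel, Field

class DataPoint(BaseModel):
    id: UUID = Field(default_factory=uuid4)
    # Subclasses override this with their own class name and index fields.
    _metadata: Optional[dict] = {
        "index_fields": [],
        "type": "DataPoint",
    }
```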
cognee/tests/unit/interfaces/graph/get_graph_from_model_test.py (1)
12-14: LGTM: Consistent type metadata across test classes
The addition of type metadata to all test classes follows the established pattern and matches class names appropriately. This ensures test data accurately reflects the production behavior.
Also applies to: 20-22, 26-28, 33-35
cognee/shared/SourceCodeGraph.py (1)
99-104: LGTM: Circular reference handling maintained
The model_rebuild() calls are correctly maintained for classes with circular references, ensuring proper model initialization.
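To illustrate why these calls matter: in Pydantic, a model whose annotations contain forward references to classes defined later must be rebuilt once those classes exist. A minimal standalone sketch (toy models, not the real SourceCodeGraph classes):

```python
from typing import Optional
from pydantic import BaseModel

class Class(BaseModel):
    name: str
    instances: list["ClassInstance"] = []  # forward reference, defined below

class ClassInstance(BaseModel):
    name: str
    from_class: Optional[Class] = None

# Resolve the "ClassInstance" forward reference now that both models exist.
Class.model_rebuild()

c = Class(name="Foo", instances=[ClassInstance(name="foo_1")])
```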
cognee/tests/unit/interfaces/graph/get_graph_from_huge_model_test.py (3)
14-16: LGTM: Repository metadata type addition
The type metadata accurately reflects the class name.
23-25: LGTM: CodeFile metadata type addition
The type metadata accurately reflects the class name.
30-32: LGTM: CodePart metadata type addition
The type metadata accurately reflects the class name.
cognee/tasks/storage/index_data_points.py (2)
84-86: LGTM: Car metadata type addition
The type metadata accurately reflects the class name.
93-95: LGTM: Person metadata type addition
The type metadata accurately reflects the class name.
cognee/infrastructure/databases/vector/milvus/MilvusAdapter.py (1)
21-22: LGTM: IndexSchema metadata type addition
The type metadata accurately reflects the class name while preserving the existing index_fields configuration.
cognee/infrastructure/databases/vector/lancedb/LanceDBAdapter.py (1)
24-27: LGTM: Type metadata addition
The addition of the "type" field to metadata is consistent with other adapters and improves type identification.
cognee/infrastructure/databases/vector/pgvector/PGVectorAdapter.py (1)
3-20: LGTM! Import organization and metadata changes look good.
The changes improve code organization and implement the PR objective of adding type metadata.
Also applies to: 26-27
cognee/infrastructure/databases/hybrid/falkordb/FalkorDBAdapter.py (1)
5-16: LGTM! Import organization and metadata changes look good.
The changes improve code organization and implement the PR objective of adding type metadata consistently with other adapters.
Also applies to: 22-23
```python
_metadata: dict = {
    "type": "Document"
}
```
💡 Codebase verification
Add index_fields to Document metadata for consistency
Looking at the codebase, I can confirm that most DataPoint subclasses, especially those dealing with text or searchable content, include "index_fields" in their metadata. Specifically:
- EntityType has index_fields: ["name"]
- DocumentChunk has index_fields: ["text"]
- Entity has index_fields: ["name"]
- CodeFile/Repository/CodePart have index_fields: ["source_code"]
Since Document is also a searchable entity that contains text content, it should follow this pattern for consistency.
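Following that pattern, the change might look like the sketch below; which field(s) to index is an assumption and would depend on what Document content is actually searched:

```python
class Document(DataPoint):
    name: str
    raw_data_location: str  # hypothetical field set

    _metadata: dict = {
        "index_fields": ["name"],  # hypothetical choice of indexed field
        "type": "Document",
    }
```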
🔗 Analysis chain
Consider adding index_fields to metadata.
Other DataPoint subclasses (e.g., EntityType) include "index_fields" in their metadata. Consider whether Document should also specify its indexed fields for consistency.
Let's check if other DataPoint subclasses consistently use index_fields:
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Search for _metadata declarations with index_fields
ast-grep --pattern '$$ = {
"index_fields": $$$,
$$$
}'
Length of output: 60
Script:
#!/bin/bash
# Let's search for all _metadata declarations in DataPoint subclasses
rg "_metadata.*=.*{" --type py -A 3
Length of output: 131
Script:
#!/bin/bash
# Let's search for _metadata declarations with proper escaping
rg "_metadata.*=.*\{" --type py -A 3
Length of output: 10977
Force-pushed from 86b6146 to 6a99fc3 (Compare)
Actionable comments posted: 1
🧹 Outside diff range and nitpick comments (1)
cognee/tests/unit/interfaces/graph/get_graph_from_model_test.py (1)
36-39: Consider indexing the EntityType relationship.

While indexing the name field is appropriate, consider also including is_type in the index_fields list to optimize queries that filter entities by their type.

```diff
 _metadata = {
-    "index_fields": ["name"],
+    "index_fields": ["name", "is_type"],
     "type": "Entity"
 }
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
- cognee/modules/data/processing/document_types/Document.py (1 hunks)
- cognee/shared/SourceCodeGraph.py (2 hunks)
- cognee/tasks/storage/index_data_points.py (2 hunks)
- cognee/tests/unit/interfaces/graph/get_graph_from_huge_model_test.py (2 hunks)
- cognee/tests/unit/interfaces/graph/get_graph_from_model_test.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (4)
- cognee/modules/data/processing/document_types/Document.py
- cognee/tasks/storage/index_data_points.py
- cognee/tests/unit/interfaces/graph/get_graph_from_huge_model_test.py
- cognee/shared/SourceCodeGraph.py
🔇 Additional comments (3)
cognee/tests/unit/interfaces/graph/get_graph_from_model_test.py (3)
4-4: LGTM!
The import reordering is acceptable and doesn't affect functionality.
21-24: LGTM!
The metadata structure is well-defined with appropriate indexing on the text field.
28-31: LGTM!
The metadata structure is well-defined with appropriate indexing on the name field.
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (1)
cognee/infrastructure/databases/vector/pgvector/PGVectorAdapter.py (1)
Line range hint 191-193: Consider implementing similarity score normalization and reducing code duplication.
- The TODO comments indicate missing score normalization. The normalize_distances utility is already imported but not used.
- There's significant code duplication between get_distance_from_collection_elements and search methods.

Consider extracting the common functionality into a private method:
```python
async def _search_vectors(
    self,
    collection_name: str,
    query_vector: List[float],
    limit: Optional[int] = None
) -> List[ScoredResult]:
    PGVectorDataPoint = await self.get_table(collection_name)

    async with self.get_async_session() as session:
        query = select(
            PGVectorDataPoint,
            PGVectorDataPoint.c.vector.cosine_distance(query_vector).label("similarity"),
        ).order_by("similarity")

        if limit:
            query = query.limit(limit)

        results = await session.execute(query)
        vector_list = [row for row in results]

    # Normalize similarity scores
    scores = [v.similarity for v in vector_list]
    normalized_scores = normalize_distances(scores)

    return [
        ScoredResult(
            id=UUID(str(row.id)),
            payload=row.payload,
            score=norm_score
        )
        for row, norm_score in zip(vector_list, normalized_scores)
    ]
```

Then update both methods to use this common implementation:
```python
async def get_distance_from_collection_elements(
    self,
    collection_name: str,
    query_text: str = None,
    query_vector: List[float] = None,
    with_vector: bool = False
) -> List[ScoredResult]:
    if query_text is None and query_vector is None:
        raise ValueError("One of query_text or query_vector must be provided!")

    if query_text and not query_vector:
        query_vector = (await self.embedding_engine.embed_text([query_text]))[0]

    return await self._search_vectors(collection_name, query_vector)

async def search(
    self,
    collection_name: str,
    query_text: Optional[str] = None,
    query_vector: Optional[List[float]] = None,
    limit: int = 5,
    with_vector: bool = False,
) -> List[ScoredResult]:
    if query_text is None and query_vector is None:
        raise InvalidValueError(message="One of query_text or query_vector must be provided!")

    if query_text and not query_vector:
        query_vector = (await self.embedding_engine.embed_text([query_text]))[0]

    return await self._search_vectors(collection_name, query_vector, limit)
```

Also applies to: 249-251
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
- cognee/infrastructure/databases/vector/pgvector/PGVectorAdapter.py (1 hunks)
🔇 Additional comments (2)
cognee/infrastructure/databases/vector/pgvector/PGVectorAdapter.py (2)
3-20: LGTM! Import organization looks good.
The imports are well-organized with system imports followed by relative imports, and all imports are being used in the code.
26-27: LGTM! Type metadata addition is consistent.
The addition of the "type" field to IndexSchema metadata aligns with the PR objective and maintains consistency across different adapters.
Force-pushed from 9682b5f to c097aee (Compare)
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (4)
cognee/shared/SourceCodeGraph.py (4)
73-76: Consider using an enum for called_by field

While the metadata changes look good, consider replacing Literal["main"] with an enum to maintain consistency with the new type system approach:

```python
from enum import Enum

class CallerType(Enum):
    MAIN = "main"

# Then update the type hint:
called_by: Union[Function, CallerType]
```
84-87: Consider type alias for complex Union type

While the implementation is correct, consider creating a type alias for the complex Union type in members for better readability:

```python
from typing import TypeAlias

ExpressionMember: TypeAlias = Union[Variable, Function, Operator, "Expression"]

# Then use:
members: List[ExpressionMember]
```
103-106: Consider type alias for nodes Union type

Similar to the previous suggestion, consider creating a type alias for the nodes Union type to improve code readability:

```python
from typing import TypeAlias

GraphNode: TypeAlias = Union[
    Class,
    ClassInstance,
    Function,
    FunctionCall,
    Variable,
    Operator,
    Expression,
]

# Then use:
nodes: List[GraphNode]
```
Line range hint 1-112: Well-structured type system migration

The migration from Literal types to metadata maintains a clear hierarchy while improving consistency in type handling. The approach aligns well with object-oriented principles and maintains clear separation of concerns.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (19)
- cognee/infrastructure/databases/hybrid/falkordb/FalkorDBAdapter.py (1 hunks)
- cognee/infrastructure/databases/vector/lancedb/LanceDBAdapter.py (1 hunks)
- cognee/infrastructure/databases/vector/milvus/MilvusAdapter.py (2 hunks)
- cognee/infrastructure/databases/vector/pgvector/PGVectorAdapter.py (3 hunks)
- cognee/infrastructure/databases/vector/qdrant/QDrantAdapter.py (1 hunks)
- cognee/infrastructure/databases/vector/weaviate_db/WeaviateAdapter.py (1 hunks)
- cognee/infrastructure/engine/models/DataPoint.py (3 hunks)
- cognee/modules/chunking/models/DocumentChunk.py (2 hunks)
- cognee/modules/data/processing/document_types/Document.py (1 hunks)
- cognee/modules/engine/models/Entity.py (1 hunks)
- cognee/modules/engine/models/EntityType.py (1 hunks)
- cognee/modules/graph/models/EdgeType.py (1 hunks)
- cognee/modules/graph/utils/convert_node_to_data_point.py (1 hunks)
- cognee/shared/CodeGraphEntities.py (1 hunks)
- cognee/shared/SourceCodeGraph.py (2 hunks)
- cognee/tasks/storage/index_data_points.py (2 hunks)
- cognee/tasks/summarization/models.py (2 hunks)
- cognee/tests/unit/interfaces/graph/get_graph_from_huge_model_test.py (2 hunks)
- cognee/tests/unit/interfaces/graph/get_graph_from_model_test.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (18)
- cognee/modules/chunking/models/DocumentChunk.py
- cognee/modules/graph/utils/convert_node_to_data_point.py
- cognee/modules/engine/models/Entity.py
- cognee/modules/graph/models/EdgeType.py
- cognee/modules/engine/models/EntityType.py
- cognee/infrastructure/engine/models/DataPoint.py
- cognee/infrastructure/databases/hybrid/falkordb/FalkorDBAdapter.py
- cognee/tests/unit/interfaces/graph/get_graph_from_huge_model_test.py
- cognee/tests/unit/interfaces/graph/get_graph_from_model_test.py
- cognee/modules/data/processing/document_types/Document.py
- cognee/tasks/summarization/models.py
- cognee/infrastructure/databases/vector/lancedb/LanceDBAdapter.py
- cognee/infrastructure/databases/vector/milvus/MilvusAdapter.py
- cognee/shared/CodeGraphEntities.py
- cognee/infrastructure/databases/vector/pgvector/PGVectorAdapter.py
- cognee/tasks/storage/index_data_points.py
- cognee/infrastructure/databases/vector/qdrant/QDrantAdapter.py
- cognee/infrastructure/databases/vector/weaviate_db/WeaviateAdapter.py
🔇 Additional comments (3)
cognee/shared/SourceCodeGraph.py (3)
24-27: LGTM! Consistent metadata structure
The metadata structure is consistently implemented across these classes with appropriate index fields for searchable entities.
Also applies to: 38-40, 51-53, 64-66
14-17: Verify type attribute usage in dependent code
The migration from a type Literal to metadata dictionary is a breaking change that could affect code that directly accesses the type attribute.
✅ Verification successful
Migration to metadata dictionary is safe to proceed
The type attribute usage in the codebase is safe for this migration as:
- The .type accesses in extract_code_parts.py and get_local_dependencies.py are for AST node types, not for our DataPoint types.
- The .type usage in graph utils is for node type generation and naming, which will continue to work as the type information is preserved in the metadata dictionary.
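Schematically, consumers that previously compared the model attribute now read the metadata dictionary (toy illustration, not the real code):

```python
class Node:
    """Toy stand-in for a DataPoint subclass after the migration."""
    _metadata: dict = {"index_fields": ["name"], "type": "Function"}

node = Node()

# Before the migration this check would have read a Literal-typed
# attribute, e.g. `node.type == "Function"`. After it, the same
# information comes from the runtime metadata dictionary:
if node._metadata["type"] == "Function":
    print("node represents a Function")
```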
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Search for any direct access to .type attribute that might break
rg -l '\.type\s*[=!]=\s*["\']Variable["\']' || \
rg -l '\.type\s*[=!]=\s*["\']Class["\']' || \
rg -l '\.type\s*[=!]=\s*["\']Function["\']'
Length of output: 116
Script:
#!/bin/bash
# Search for direct access to type attribute with proper quote handling
rg -l "\.type\s*[=!]=\s*['\"]Variable['\"]" || \
rg -l "\.type\s*[=!]=\s*['\"]Class['\"]" || \
rg -l "\.type\s*[=!]=\s*['\"]Function['\"]"
# Also search for type attribute access in method calls
rg -l "\.type\s+in\s+|\.type\s+==\s+|\.type\s+!=\s+"
# Search for any other potential type attribute usage
ast-grep --pattern 'node.type'
Length of output: 684
Script:
#!/bin/bash
# Check the specific usage in the identified files
rg -A 5 "\.type" cognee/tasks/repo_processor/extract_code_parts.py cognee/tasks/repo_processor/get_local_dependencies.py
# Check if these files are accessing metadata instead
rg -A 5 "_metadata" cognee/tasks/repo_processor/extract_code_parts.py cognee/tasks/repo_processor/get_local_dependencies.py
# Check the node type usage in graph utils
cat cognee/modules/graph/utils/retrieve_existing_edges.py
cat cognee/modules/graph/utils/expand_with_nodes_and_edges.py
Length of output: 7051
108-112: Verify model_rebuild coverage
Some classes (Variable, Operator) don't have model_rebuild calls. Please verify if this is intentional.
✅ Verification successful
Let me gather more information about the model_rebuild implementations and their usage patterns.
Let me search for the inheritance hierarchy and model definitions to understand if Variable and Operator are meant to have model_rebuild.
Based on the gathered information, I can now provide the final response:
model_rebuild() calls are correctly implemented
The absence of model_rebuild() calls for Variable and Operator classes is correct because:
- Both Variable and Operator inherit from DataPoint, which is a base class for data entities
- The model_rebuild() is only called after create_model() operations for dynamic model modifications
- The current calls in SourceCodeGraph.py are specifically for classes that need dynamic model rebuilding
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Search for model_rebuild implementation to understand its purpose
rg -A 5 'def model_rebuild'
# Search for other usage patterns of model_rebuild
rg -l '\.model_rebuild\(\)'
Length of output: 362
Script:
#!/bin/bash
# Search for class definitions and their model_rebuild methods
ast-grep --pattern 'class $_ {
$$$
def model_rebuild($$$) {
$$$
}
$$$
}'
# Search for Variable and Operator class definitions
ast-grep --pattern 'class Variable'
ast-grep --pattern 'class Operator'
# Get more context around model_rebuild usage
rg -B 2 -A 2 '\.model_rebuild\(\)'
Length of output: 2452
Script:
#!/bin/bash
# Search for BaseModel inheritance and model definitions
rg -B 2 -A 2 'class.*\(.*BaseModel.*\)'
# Search specifically for Variable and Operator class definitions with more context
rg -B 2 -A 5 'class (Variable|Operator)'
# Look for model_rebuild usage in model creation/initialization
rg -B 2 -A 5 'create_model'
Length of output: 17769
borisarzentar left a comment
Nice job with this implementation!
* feat: Add error handling in case user is already part of database and permission already given to group Added error handling in case permission is already given to group and user is already part of group Feature COG-656
* feat: Add user verification for accessing data Verify user has access to data before returning it Feature COG-656
* feat: Add compute search to cognee Add compute search to cognee which makes searches human readable Feature COG-656
* feat: Add simple instruction for system prompt Add simple instruction for system prompt Feature COG-656
* pass pydantic model to cognify
* feat: Add unauth access error to getting data Raise unauth access error when trying to read data without access Feature COG-656
* refactor: Rename query compute to query completion Rename searching type from compute to completion Refactor COG-656
* chore: Update typo in code Update typo in string in code Chore COG-656
* Add mcp to cognee
* Add simple README
* Update cognee-mcp/mcpcognee/__main__.py Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
* Create dockerhub.yml
* Update get_cognify_router.py
* fix: Resolve reflection issue when running cognee a second time after pruning data When running cognee a second time after pruning data some metadata doesn't get pruned. This makes cognee believe some tables exist that have been deleted Fix
* fix: Add metadata reflection fix to sqlite as well Added fix when reflecting metadata to sqlite as well Fix
* update
* Revert "fix: Add metadata reflection fix to sqlite as well" This reverts commit 394a0b2.
* COG-810 Implement a top-down dependency graph builder tool (#268)
* feat: parse repo to call graph
* Update/repo_processor/top_down_repo_parse.py task
* fix: minor improvements
* feat: file parsing jedi script optimisation
---------
* Add type to DataPoint metadata (#364)
* Add type to DataPoint metadata
* Add missing index_fields
* Use DataPoint UUID type in pgvector create_data_points
* Make _metadata mandatory everywhere
* Fixes
* Fixes to our demo
* feat: Add search by dataset for cognee Added ability to search by datasets for cognee users Feature COG-912
* feat: outsources chunking parameters to extract chunk from documents … (#289)
* feat: outsources chunking parameters to extract chunk from documents task
* fix: Remove backend lock from UI Removed lock that prevented using multiple datasets in cognify Fix COG-912
* COG 870 Remove duplicate edges from the code graph (#293)
* feat: turn summarize_code into generator
* feat: extract run_code_graph_pipeline, update the pipeline
* feat: minimal code graph example
* refactor: update argument
* refactor: move run_code_graph_pipeline to cognify/code_graph_pipeline
* refactor: indentation and whitespace nits
* refactor: add deprecated use comments and warnings
---------
Co-authored-by: Vasilije <[email protected]>
Co-authored-by: Igor Ilic <[email protected]>
Co-authored-by: Boris <[email protected]>
* test: Added test for getting of documents for search Added test to verify getting documents related to datasets intended for search Test COG-912
* Structured code summarization (#375)
* feat: turn summarize_code into generator
* feat: extract run_code_graph_pipeline, update the pipeline
* feat: minimal code graph example
* refactor: update argument
* refactor: move run_code_graph_pipeline to cognify/code_graph_pipeline
* refactor: indentation and whitespace nits
* refactor: add deprecated use comments and warnings
* Structured code summarization
* add missing prompt file
* Remove summarization_model argument from summarize_code and fix typehinting
* minor refactors
---------
Co-authored-by: lxobr <[email protected]>
Co-authored-by: Vasilije <[email protected]>
Co-authored-by: Igor Ilic <[email protected]>
Co-authored-by: Boris <[email protected]>
* fix: Resolve issue with cognify router graph model default value Resolve issue with default value for graph model in cognify endpoint Fix
* chore: Resolve typo in getting documents code Resolve typo in code chore COG-912
* Update .github/workflows/dockerhub.yml Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
* Update .github/workflows/dockerhub.yml
* Update .github/workflows/dockerhub.yml
* Update .github/workflows/dockerhub.yml
* Update get_cognify_router.py
* fix: Resolve syntax issue with cognify router Resolve syntax issue with cognify router Fix
* feat: Add ruff pre-commit hook for linting and formatting Added formatting and linting on pre-commit hook Feature COG-650
* chore: Update ruff lint options in pyproject file Update ruff lint options in pyproject file Chore
* test: Add ruff linter github action Added linting check with ruff in github actions Test COG-650
* feat: deletes executor limit from get_repo_file_dependencies
* feat: implements mock feature in LiteLLM engine
* refactor: Remove changes to cognify router Remove changes to cognify router Refactor COG-650
* fix: fixing boolean env for github actions
* test: Add test for ruff format for cognee code Test if code is formatted for cognee Test COG-650
* refactor: Rename ruff gh actions Rename ruff gh actions to be more understandable Refactor COG-650
* chore: Remove checking of ruff lint and format on push Remove checking of ruff lint and format on push Chore COG-650
* feat: Add deletion of local files when deleting data Delete local files when deleting data from cognee Feature COG-475
* fix: changes back the max workers to 12
* feat: Adds mock summary for codegraph pipeline
* refacotr: Add current development status Save current development status Refactor
* Fix langfuse
* Fix langfuse
* Fix langfuse
* Add evaluation notebook
* Rename eval notebook
* chore: Add temporary state of development Add temp development state to branch Chore
* fix: Add poetry.lock file, make langfuse mandatory Added langfuse as mandatory dependency, added poetry.lock file Fix
* Fix: fixes langfuse config settings
* feat: Add deletion of local files made by cognee through data endpoint Delete local files made by cognee when deleting data from database through endpoint Feature COG-475
* test: Revert changes on test_pgvector Revert changes on test_pgvector which were made to test deletion of local files Test COG-475
* chore: deletes the old test for the codegraph pipeline
* test: Add test to verify deletion of local files Added test that checks local files created by cognee will be deleted and those not created by cognee won't Test COG-475
* chore: deletes unused old version of the codegraph
* chore: deletes unused imports from code_graph_pipeline
* Ingest non-code files
* Fixing review findings
* Ingest non-code files (#395)
* Ingest non-code files
* Fixing review findings
* test: Update test regarding message Update assertion message, add veryfing of file existence
* Handle retryerrors in code summary (#396)
* Handle retryerrors in code summary
* Log instead of print
* fix: updates the acreate_structured_output
* chore: Add logging to sentry when file which should exist can't be found Log to sentry that a file which should exist can't be found Chore COG-475
* Fix diagram
* fix: refactor mcp
* Add Smithery CLI installation instructions and badge
* Move readme
* Update README.md
* Update README.md
* Cog 813 source code chunks (#383)
* fix: pass the list of all CodeFiles to enrichment task
* feat: introduce SourceCodeChunk, update metadata
* feat: get_source_code_chunks code graph pipeline task
* feat: integrate get_source_code_chunks task, comment out summarize_code
* Fix code summarization (#387)
* feat: update data models
* feat: naive parse long strings in source code
* fix: get_non_py_files instead of get_non_code_files
* fix: limit recursion, add comment
* handle embedding empty input error (#398)
* feat: robustly handle CodeFile source code
* refactor: sort imports
* todo: add support for other embedding models
* feat: add custom logger
* feat: add robustness to get_source_code_chunks Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
* feat: improve embedding exceptions
* refactor: format indents, rename module
---------
Co-authored-by: alekszievr <[email protected]>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
* Fix diagram
* Fix instructions
* adding and fixing files
* Update README.md
* ruff format
* Fix linter issues
* Implement PR review
* Comment out profiling
* fix: add allowed extensions
* fix: adhere UnstructuredDocument.read() to Document
* feat: time code graph run and add mock support
* Fix ollama, work on visualization
* fix: Fixes faulty logging format and sets up error logging in dynamic steps example
* Overcome ContextWindowExceededError by checking token count while chunking (#413)
* fix: Fixes duplicated edges in cognify by limiting the recursion depth in add datapoints
* Adjust AudioDocument and handle None token limit
* Handle azure models as well
* Add clean logging to code graph example
* Remove setting envvars from arg
* fix: fixes create_cognee_style_network_with_logo unit test
* fix: removes accidental remained print
* Get embedding engine instead of passing it. Get it from vector engine instead of direct getter.
* Fix visualization
* Get embedding engine instead of passing it in code chunking.
* Fix poetry issues
* chore: Update version of poetry install action
* chore: Update action to trigger on pull request for any branch
* chore: Remove if in github action to allow triggering on push
* chore: Remove if condition to allow gh actions to trigger on push to PR
* chore: Update poetry version in github actions
* chore: Set fixed ubuntu version to 22.04
* chore: Update py lint to use ubuntu 22.04
* chore: update ubuntu version to 22.04
* feat: implements the first version of graph based completion in search
* chore: Update python 3.9 gh action to use 3.12 instead
* chore: Update formatting of utils.py
* Fix poetry issues
* Adjust integration tests
* fix: Fixes ruff formatting
* Handle circular import
* fix: Resolve profiler issue with partial and recursive logger imports Resolve issue for profiler with partial and recursive logger imports
* fix: Remove logger from __init__.py file
* test: Test profiling on HEAD branch
* test: Return profiler to base branch
* Set max_tokens in config
* Adjust SWE-bench script to code graph pipeline call
* Adjust SWE-bench script to code graph pipeline call
* fix: Add fix for accessing dictionary elements that don't exits Using get for the text key instead of direct access to handle situation if the text key doesn't exist
* feat: Add ability to change graph database configuration through cognee
* feat: adds pydantic types to graph layer models
* feat: adds basic retriever for swe bench
* Match Ruff version in config to the one in github actions
* feat: implements code retreiver
* Fix: fixes unit test for codepart search
* Format with Ruff 0.9.0
* Fix: deleting incorrect repo path
* fix: resolve issue with langfuse dependency installation when integrating cognee in different packages
* version: Increase version to 0.1.21
---------
Co-authored-by: Igor Ilic <[email protected]>
Co-authored-by: Vasilije <[email protected]>
Co-authored-by: Igor Ilic <[email protected]>
Co-authored-by: Rita Aleksziev <[email protected]>
Co-authored-by: vasilije <[email protected]>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Co-authored-by: lxobr <[email protected]>
Co-authored-by: alekszievr <[email protected]>
Co-authored-by: hajdul88 <[email protected]>
Co-authored-by: Henry Mao <[email protected]>
* Revert "fix: Add metadata reflection fix to sqlite as well" This reverts commit 394a0b2. * COG-810 Implement a top-down dependency graph builder tool (#268) * feat: parse repo to call graph * Update/repo_processor/top_down_repo_parse.py task * fix: minor improvements * feat: file parsing jedi script optimisation --------- * Add type to DataPoint metadata (#364) * Add missing index_fields * Use DataPoint UUID type in pgvector create_data_points * Make _metadata mandatory everywhere * feat: Add search by dataset for cognee Added ability to search by datasets for cognee users Feature COG-912 * feat: outsources chunking parameters to extract chunk from documents … (#289) * feat: outsources chunking parameters to extract chunk from documents task * fix: Remove backend lock from UI Removed lock that prevented using multiple datasets in cognify Fix COG-912 * COG 870 Remove duplicate edges from the code graph (#293) * feat: turn summarize_code into generator * feat: extract run_code_graph_pipeline, update the pipeline * feat: minimal code graph example * refactor: update argument * refactor: move run_code_graph_pipeline to cognify/code_graph_pipeline * refactor: indentation and whitespace nits * refactor: add deprecated use comments and warnings --------- Co-authored-by: Vasilije <[email protected]> Co-authored-by: Igor Ilic <[email protected]> Co-authored-by: Boris <[email protected]> * test: Added test for getting of documents for search Added test to verify getting documents related to datasets intended for search Test COG-912 * Structured code summarization (#375) * feat: turn summarize_code into generator * feat: extract run_code_graph_pipeline, update the pipeline * feat: minimal code graph example * refactor: update argument * refactor: move run_code_graph_pipeline to cognify/code_graph_pipeline * refactor: indentation and whitespace nits * refactor: add deprecated use comments and warnings * Structured code summarization * add missing prompt file * Remove summarization_model argument from summarize_code and fix typehinting * minor refactors --------- Co-authored-by: lxobr <[email protected]> Co-authored-by: Vasilije <[email protected]> Co-authored-by: Igor Ilic <[email protected]> Co-authored-by: Boris <[email protected]> * fix: Resolve issue with cognify router graph model default value Resolve issue with default value for graph model in cognify endpoint Fix * chore: Resolve typo in getting documents code Resolve typo in code chore COG-912 * Update .github/workflows/dockerhub.yml Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> * Update .github/workflows/dockerhub.yml * Update .github/workflows/dockerhub.yml * Update .github/workflows/dockerhub.yml * Update get_cognify_router.py * fix: Resolve syntax issue with cognify router Resolve syntax issue with cognify router Fix * feat: Add ruff pre-commit hook for linting and formatting Added formatting and linting on pre-commit hook Feature COG-650 * chore: Update ruff lint options in pyproject file Update ruff lint options in pyproject file Chore * test: Add ruff linter github action Added linting check with ruff in github actions Test COG-650 * feat: deletes executor limit from get_repo_file_dependencies * feat: implements mock feature in LiteLLM engine * refactor: Remove changes to cognify router Remove changes to cognify router Refactor COG-650 * fix: fixing boolean env for github actions * test: Add test for ruff format for cognee code Test if code is formatted for cognee Test COG-650 
* refactor: Rename ruff gh actions Rename ruff gh actions to be more understandable Refactor COG-650 * chore: Remove checking of ruff lint and format on push Remove checking of ruff lint and format on push Chore COG-650 * feat: Add deletion of local files when deleting data Delete local files when deleting data from cognee Feature COG-475 * fix: changes back the max workers to 12 * feat: Adds mock summary for codegraph pipeline * refacotr: Add current development status Save current development status Refactor * Fix langfuse * Fix langfuse * Fix langfuse * Add evaluation notebook * Rename eval notebook * chore: Add temporary state of development Add temp development state to branch Chore * fix: Add poetry.lock file, make langfuse mandatory Added langfuse as mandatory dependency, added poetry.lock file Fix * Fix: fixes langfuse config settings * feat: Add deletion of local files made by cognee through data endpoint Delete local files made by cognee when deleting data from database through endpoint Feature COG-475 * test: Revert changes on test_pgvector Revert changes on test_pgvector which were made to test deletion of local files Test COG-475 * chore: deletes the old test for the codegraph pipeline * test: Add test to verify deletion of local files Added test that checks local files created by cognee will be deleted and those not created by cognee won't Test COG-475 * chore: deletes unused old version of the codegraph * chore: deletes unused imports from code_graph_pipeline * Ingest non-code files * Fixing review findings * Ingest non-code files (#395) * Ingest non-code files * Fixing review findings * test: Update test regarding message Update assertion message, add veryfing of file existence * Handle retryerrors in code summary (#396) * Handle retryerrors in code summary * Log instead of print * fix: updates the acreate_structured_output * chore: Add logging to sentry when file which should exist can't be found Log to sentry that a file which should exist can't be found Chore COG-475 * Fix diagram * fix: refactor mcp * Add Smithery CLI installation instructions and badge * Move readme * Update README.md * Update README.md * Cog 813 source code chunks (#383) * fix: pass the list of all CodeFiles to enrichment task * feat: introduce SourceCodeChunk, update metadata * feat: get_source_code_chunks code graph pipeline task * feat: integrate get_source_code_chunks task, comment out summarize_code * Fix code summarization (#387) * feat: update data models * feat: naive parse long strings in source code * fix: get_non_py_files instead of get_non_code_files * fix: limit recursion, add comment * handle embedding empty input error (#398) * feat: robustly handle CodeFile source code * refactor: sort imports * todo: add support for other embedding models * feat: add custom logger * feat: add robustness to get_source_code_chunks Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> * feat: improve embedding exceptions * refactor: format indents, rename module --------- Co-authored-by: alekszievr <[email protected]> Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> * Fix diagram * Fix diagram * Fix instructions * Fix instructions * adding and fixing files * Update README.md * ruff format * Fix linter issues * Fix linter issues * Fix linter issues * Fix linter issues * Fix linter issues * Fix linter issues * Fix linter issues * Fix linter issues * Fix linter issues * Fix linter issues * Implement PR 
review * Comment out profiling * Comment out profiling * Comment out profiling * fix: add allowed extensions * fix: adhere UnstructuredDocument.read() to Document * feat: time code graph run and add mock support * Fix ollama, work on visualization * fix: Fixes faulty logging format and sets up error logging in dynamic steps example * Overcome ContextWindowExceededError by checking token count while chunking (#413) * fix: Fixes duplicated edges in cognify by limiting the recursion depth in add datapoints * Adjust AudioDocument and handle None token limit * Handle azure models as well * Fix visualization * Fix visualization * Fix visualization * Add clean logging to code graph example * Remove setting envvars from arg * fix: fixes create_cognee_style_network_with_logo unit test * fix: removes accidental remained print * Fix visualization * Fix visualization * Fix visualization * Get embedding engine instead of passing it. Get it from vector engine instead of direct getter. * Fix visualization * Fix visualization * Fix poetry issues * Get embedding engine instead of passing it in code chunking. * Fix poetry issues * Fix poetry issues * Fix poetry issues * Fix poetry issues * Fix poetry issues * Fix poetry issues * Fix poetry issues * Fix poetry issues * chore: Update version of poetry install action * chore: Update action to trigger on pull request for any branch * chore: Remove if in github action to allow triggering on push * chore: Remove if condition to allow gh actions to trigger on push to PR * chore: Update poetry version in github actions * chore: Set fixed ubuntu version to 22.04 * chore: Update py lint to use ubuntu 22.04 * chore: update ubuntu version to 22.04 * feat: implements the first version of graph based completion in search * chore: Update python 3.9 gh action to use 3.12 instead * chore: Update formatting of utils.py * Fix poetry issues * Fix poetry issues * Fix poetry issues * Fix poetry issues * Fix poetry issues * Fix poetry issues * Fix poetry issues * Adjust integration tests * fix: Fixes ruff formatting * Handle circular import * fix: Resolve profiler issue with partial and recursive logger imports Resolve issue for profiler with partial and recursive logger imports * fix: Remove logger from __init__.py file * test: Test profiling on HEAD branch * test: Return profiler to base branch * Set max_tokens in config * Adjust SWE-bench script to code graph pipeline call * Adjust SWE-bench script to code graph pipeline call * fix: Add fix for accessing dictionary elements that don't exits Using get for the text key instead of direct access to handle situation if the text key doesn't exist * feat: Add ability to change graph database configuration through cognee * feat: adds pydantic types to graph layer models * test: Test ubuntu 24.04 * test: change all actions to ubuntu-latest * feat: adds basic retriever for swe bench * Match Ruff version in config to the one in github actions * feat: implements code retreiver * Fix: fixes unit test for codepart search * Format with Ruff 0.9.0 * Fix: deleting incorrect repo path * docs: Add LlamaIndex Cognee integration notebook Added LlamaIndex Cognee integration notebook * test: Add github action for testing llama index cognee integration notebook * fix: resolve issue with langfuse dependency installation when integrating cognee in different packages * version: Increase version to 0.1.21 * fix: update dependencies of the mcp server * Update README.md * Fix: Fixes logging setup * feat: deletes on the fly embeddings as uses edge 
collections * fix: Change nbformat on llama index integration notebook * fix: Resolve api key issue with llama index integration notebook * fix: Attempt to resolve issue with Ubuntu 24.04 segmentation fault * version: Increase version to 0.1.22 --------- Co-authored-by: vasilije <[email protected]> Co-authored-by: Igor Ilic <[email protected]> Co-authored-by: Igor Ilic <[email protected]> Co-authored-by: lxobr <[email protected]> Co-authored-by: alekszievr <[email protected]> Co-authored-by: hajdul88 <[email protected]> Co-authored-by: Vasilije <[email protected]> Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> Co-authored-by: Rita Aleksziev <[email protected]> Co-authored-by: Henry Mao <[email protected]>
Summary by CodeRabbit
Release Notes
New Features
- Added _metadata attributes to multiple classes, enhancing data structure with type information.
- Introduced Car and Person with metadata for improved data modeling.
- Updated search methods across various adapters to include a default limit parameter.

Bug Fixes
Documentation
Refactor