Merged
Changes from 1 commit
33 commits
458eeac  Count the number of tokens in documents (alekszievr, Jan 28, 2025)
51eadef  Merge branch 'COG-970-refactor-tokenizing' into feat/cog-1071-input-t… (alekszievr, Jan 28, 2025)
ba608a4  Merge branch 'COG-970-refactor-tokenizing' into feat/cog-1071-input-t… (alekszievr, Jan 28, 2025)
f6663ab  save token count to relational db (alekszievr, Jan 28, 2025)
9182be8  Merge branch 'COG-970-refactor-tokenizing' into feat/cog-1132-add-num… (alekszievr, Jan 28, 2025)
72dfec4  Add metrics to metric table (alekszievr, Jan 28, 2025)
9bd5917  Merge branch 'dev' into feat/cog-1071-input-token-counting (dexters1, Jan 29, 2025)
227d94e  Merge branch 'feat/cog-1071-input-token-counting' into feat/cog-1132-… (alekszievr, Jan 29, 2025)
22b6459  Store list as json instead of array in relational db table (alekszievr, Jan 29, 2025)
9764441  Merge branch 'dev' into feat/cog-1132-add-num-tokens-to-metric-table (alekszievr, Jan 29, 2025)
100e7d7  Sum in sql instead of python (alekszievr, Jan 29, 2025)
c182d47  Unify naming (alekszievr, Jan 29, 2025)
44fa2cd  Return data_points in descriptive metric calculation task (alekszievr, Jan 29, 2025)
06030ff  Graph metrics getter template in graph db interface and adapters (alekszievr, Jan 29, 2025)
67d9908  Calculate descriptive metrics in networkx adapter (alekszievr, Jan 29, 2025)
252ac7f  neo4j metrics (alekszievr, Jan 29, 2025)
48a51a3  Merge branch 'dev' into feat/cog-1082-metrics-in-graphdb-interface (alekszievr, Jan 30, 2025)
9a94db8  remove _table from table name (alekszievr, Jan 30, 2025)
57fb338  Merge branch 'dev' into feat/cog-1082-metrics-in-graphdb-interface (alekszievr, Jan 31, 2025)
e8dcef1  Merge branch 'dev' into feat/cog-1082-metrics-in-graphdb-interface (alekszievr, Feb 1, 2025)
b0f6ba7  Merge branch 'dev' into feat/cog-1082-metrics-in-graphdb-interface (alekszievr, Feb 3, 2025)
05138fa  Use modules for adding to db instead of infrastructure (alekszievr, Feb 3, 2025)
f064f52  Merge branch 'feat/cog-1082-metrics-in-graphdb-interface' into feat/c… (alekszievr, Feb 3, 2025)
c9ee1bc  Merge branch 'feat/cog-1082-metrics-in-networkx-adapter' into feat/co… (alekszievr, Feb 3, 2025)
af8e798  Merge branch 'dev' into feat/cog-1082-metrics-in-networkx-adapter (alekszievr, Feb 3, 2025)
406057f  Merge branch 'feat/cog-1082-metrics-in-networkx-adapter' into feat/co… (alekszievr, Feb 3, 2025)
d93b5f5  minor fixes (alekszievr, Feb 3, 2025)
c13fdec  minor cleanup (alekszievr, Feb 3, 2025)
f2ad1d4  Merge branch 'dev' into feat/cog-1082-metrics-in-neo4j-adapter (alekszievr, Feb 3, 2025)
3e67828  Remove graph metric calculation from the default cognify pipeline (alekszievr, Feb 4, 2025)
58e5275  Merge branch 'dev' into feat/cog-1082-metrics-in-neo4j-adapter (alekszievr, Feb 4, 2025)
dc06b50  Merge branch 'dev' into feat/cog-1082-metrics-in-neo4j-adapter (alekszievr, Feb 5, 2025)
91b42ab  Merge branch 'dev' into feat/cog-1082-metrics-in-neo4j-adapter (alekszievr, Feb 7, 2025)
Merge branch 'COG-970-refactor-tokenizing' into feat/cog-1071-input-token-counting
alekszievr committed Jan 28, 2025
commit 51eadefeab72a41be93e3e38c6c8194e459c0ce9
3 changes: 2 additions & 1 deletion cognee/modules/data/models/Data.py
@@ -1,6 +1,6 @@
 from datetime import datetime, timezone
 from uuid import uuid4
-from sqlalchemy import UUID, Column, DateTime, String, JSON
+from sqlalchemy import UUID, Column, DateTime, String, JSON, Integer
 from sqlalchemy.orm import relationship
 
 from cognee.infrastructure.databases.relational import Base
@@ -20,6 +20,7 @@ class Data(Base):
     owner_id = Column(UUID, index=True)
     content_hash = Column(String)
     external_metadata = Column(JSON)
+    token_count = Column(Integer)
     created_at = Column(DateTime(timezone=True), default=lambda: datetime.now(timezone.utc))
     updated_at = Column(DateTime(timezone=True), onupdate=lambda: datetime.now(timezone.utc))
 
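Because the new `token_count` column is a plain `Integer`, per-owner or per-dataset totals can be computed by the database itself rather than by loading rows into Python (compare commit 100e7d7, "Sum in sql instead of python", in the list above). The following is a minimal sketch using the standard-library `sqlite3` module. It assumes a hypothetical, trimmed-down table named `data` with only the columns needed here; the real model is declared through SQLAlchemy, and its table name and remaining columns may differ.

```python
import sqlite3

# Hypothetical stand-in for the table behind the Data model,
# keeping only the columns this example needs.
SCHEMA = """
CREATE TABLE data (
    id TEXT PRIMARY KEY,
    owner_id TEXT,
    token_count INTEGER
)
"""


def total_tokens(conn: sqlite3.Connection, owner_id: str) -> int:
    # SUM runs inside the database. COALESCE turns the NULL that SUM
    # returns when no rows match into 0.
    row = conn.execute(
        "SELECT COALESCE(SUM(token_count), 0) FROM data WHERE owner_id = ?",
        (owner_id,),
    ).fetchone()
    return row[0]


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
conn.executemany(
    "INSERT INTO data (id, owner_id, token_count) VALUES (?, ?, ?)",
    [("d1", "u1", 120), ("d2", "u1", 80), ("d3", "u2", 50)],
)
print(total_tokens(conn, "u1"))  # 200
```

Only a single integer comes back from the query, no matter how many rows match the filter.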
4 changes: 1 addition & 3 deletions cognee/tasks/documents/extract_chunks_from_documents.py
@@ -17,9 +17,7 @@ async def extract_chunks_from_documents(
     """
     for document in documents:
         document_token_count = 0
-        for document_chunk in document.read(
-            chunk_size=chunk_size, chunker=chunker, max_tokens=max_tokens
-        ):
+        for document_chunk in document.read(chunk_size=chunk_size, chunker=chunker):
             document_token_count += document_chunk.token_count
             yield document_chunk
         document.token_count = document_token_count
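`extract_chunks_from_documents` is an async generator that adds up chunk token counts as it yields them. It writes the total onto the document only after that document's last chunk has been consumed. The sketch below reproduces this pattern in a self-contained form. `Chunk`, `Document`, and `extract_chunks` are hypothetical stand-ins, since cognee's real document and chunk classes are not shown in this diff. The stand-in `read()` simply replays pre-built chunks and ignores `chunk_size` and `chunker`.

```python
import asyncio
from dataclasses import dataclass
from typing import AsyncIterator, Iterator, List, Optional


@dataclass
class Chunk:
    # Hypothetical stand-in for a document chunk; only token_count matters here.
    text: str
    token_count: int


@dataclass
class Document:
    # Hypothetical stand-in: read() replays pre-built chunks instead of
    # running a real chunker, so chunk_size and chunker are ignored.
    chunks: List[Chunk]
    token_count: Optional[int] = None

    def read(self, chunk_size: int, chunker: str) -> Iterator[Chunk]:
        yield from self.chunks


async def extract_chunks(
    documents: List[Document], chunk_size: int, chunker: str
) -> AsyncIterator[Chunk]:
    for document in documents:
        document_token_count = 0
        for document_chunk in document.read(chunk_size=chunk_size, chunker=chunker):
            document_token_count += document_chunk.token_count
            yield document_chunk
        # Reached only after every chunk of this document has been yielded.
        document.token_count = document_token_count


async def main():
    doc = Document([Chunk("first", 3), Chunk("second", 5)])
    stream = extract_chunks([doc], chunk_size=512, chunker="hypothetical")
    await stream.__anext__()          # consume only the first chunk
    partial = doc.token_count         # still None at this point
    rest = [chunk async for chunk in stream]
    return doc, partial, rest


doc, partial, rest = asyncio.run(main())
print(partial, doc.token_count, len(rest))  # None 8 1
```

One consequence of this design is that a consumer that stops iterating early never sees `token_count` set on the current document.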
1 change: 1 addition & 0 deletions cognee/tasks/ingestion/ingest_data.py
@@ -107,6 +107,7 @@ async def data_storing(data: Any, dataset_name: str, user: User):
 owner_id=user.id,
 content_hash=file_metadata["content_hash"],
 external_metadata=get_external_metadata_dict(data_item),
+token_count=-1,
 )
 
 # Check if data is already in dataset
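At ingestion time, `data_storing` creates the row with `token_count=-1`, before any chunking has taken place. One plausible reading, which this diff does not state explicitly, is that -1 means "not counted yet" and is replaced once the chunks have been read (compare commit f6663ab, "save token count to relational db"). The sketch below assumes that meaning. It uses `sqlite3` with a hypothetical, trimmed `data` table, and the helpers `ingest`, `record_count`, `counted_total`, and `pending_ids` are all hypothetical. It shows why a sentinel like this has to be excluded when aggregating.

```python
import sqlite3

NOT_COUNTED = -1  # assumed meaning of the -1 written by data_storing


def ingest(conn: sqlite3.Connection, data_id: str) -> None:
    # Mirrors the ingestion step: the row exists before its tokens are counted.
    conn.execute(
        "INSERT INTO data (id, token_count) VALUES (?, ?)", (data_id, NOT_COUNTED)
    )


def record_count(conn: sqlite3.Connection, data_id: str, token_count: int) -> None:
    # Hypothetical follow-up step that replaces the placeholder with the real total.
    conn.execute("UPDATE data SET token_count = ? WHERE id = ?", (token_count, data_id))


def counted_total(conn: sqlite3.Connection) -> int:
    # Skip placeholder rows so each pending document does not subtract 1.
    return conn.execute(
        "SELECT COALESCE(SUM(token_count), 0) FROM data WHERE token_count >= 0"
    ).fetchone()[0]


def pending_ids(conn: sqlite3.Connection) -> list:
    rows = conn.execute(
        "SELECT id FROM data WHERE token_count = ? ORDER BY id", (NOT_COUNTED,)
    )
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (id TEXT PRIMARY KEY, token_count INTEGER)")
ingest(conn, "a")
ingest(conn, "b")
record_count(conn, "a", 42)
print(counted_total(conn))  # 42
print(pending_ids(conn))    # ['b']
```

A NULL placeholder would be an alternative to -1, because SQL `SUM` skips NULLs automatically. Either way, the placeholder value has to stay consistent between the writer and every query that reads the column.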
You are viewing a condensed version of this merge commit.