Fix gemini gh action #748
Conversation
Please make sure all the checkboxes are checked:
Walkthrough

This update introduces substantial enhancements and refactoring across the codebase. Major changes include the introduction of robust rate limiting and retry decorators for both LLM and embedding API calls, with new configuration options and singleton limiter classes. The pipeline task execution system is refactored, replacing the old …

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant User
    participant API
    participant LLMAdapter
    participant RateLimiter
    participant EmbeddingEngine
    participant EmbeddingRateLimiter
    User->>API: Submit LLM or Embedding request
    API->>LLMAdapter: Call LLM method (decorated)
    LLMAdapter->>RateLimiter: Check/wait rate limit
    RateLimiter-->>LLMAdapter: Allow or delay
    LLMAdapter->>LLMAdapter: Retry on rate limit error (decorator)
    LLMAdapter-->>API: Return LLM result
    API->>EmbeddingEngine: Call embed_text (decorated)
    EmbeddingEngine->>EmbeddingRateLimiter: Check/wait rate limit
    EmbeddingRateLimiter-->>EmbeddingEngine: Allow or delay
    EmbeddingEngine->>EmbeddingEngine: Retry on rate limit error (decorator)
    EmbeddingEngine-->>API: Return embeddings
```

```mermaid
sequenceDiagram
    participant User
    participant Pipeline
    participant Task
    participant Telemetry
    User->>Pipeline: Start pipeline with tasks
    Pipeline->>Task: Execute (async, sync, generator, etc.)
    Task->>Telemetry: Send start event
    Task->>Task: Run task logic (yield results)
    Task->>Telemetry: Send complete/error event
    Task-->>Pipeline: Yield results (batched if needed)
    Pipeline->>Task: Execute next task with results
    Pipeline-->>User: Stream results
```
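The retry-on-rate-limit step in the first diagram can be sketched as a decorator. This is a minimal illustration, not the PR's actual implementation: the decorator name, the `RuntimeError` stand-in for a provider rate-limit error, and the backoff parameters are all assumptions.

```python
import asyncio
import functools
import random

def sleep_and_retry_async(max_retries=3, base_delay=0.01):
    """Retry an async callable with exponential backoff on rate-limit errors."""
    def decorator(func):
        @functools.wraps(func)
        async def wrapper(*args, **kwargs):
            for attempt in range(max_retries + 1):
                try:
                    return await func(*args, **kwargs)
                except RuntimeError as error:  # stand-in for a provider's rate-limit error
                    if "rate limit" not in str(error).lower() or attempt == max_retries:
                        raise
                    # Exponential backoff with jitter before the next attempt
                    await asyncio.sleep(base_delay * (2 ** attempt) * (1 + random.random()))
        return wrapper
    return decorator

calls = {"count": 0}

@sleep_and_retry_async(max_retries=5)
async def flaky_llm_call():
    # Fails twice with a rate-limit error, then succeeds.
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("Rate limit exceeded")
    return "ok"

result = asyncio.run(flaky_llm_call())
```

Non-rate-limit errors (and the final exhausted attempt) are re-raised instead of being swallowed, which matches the "allow or delay, then retry" flow in the diagram.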
Caution
Inline review comments failed to post. This is likely due to GitHub's limits when posting large numbers of comments. If you are seeing this consistently it is likely a permissions issue. Please check "Moderation" -> "Code review limits" under your organization settings.
Actionable comments posted: 5
🔭 Outside diff range comments (1)
cognee/tests/unit/entity_extraction/regex_entity_extraction_test.py (1)
227-227: ⚠️ Potential issue: Potential inconsistency with disabled money extraction test
This line asserts that "MONEY" entity type is present in the extracted entities, but the specific test for money extraction is currently disabled because the regex is failing.
Either:
- Fix the money regex issue
- Temporarily remove this assertion
- Verify if this test is somehow still passing with the current implementation
You might want to run this test in isolation to check if it's actually passing or if it's being skipped:
```shell
pytest cognee/tests/unit/entity_extraction/regex_entity_extraction_test.py::test_extract_multiple_entity_types -v
```
🧹 Nitpick comments (50)
cognee/tasks/graph/extract_graph_from_data.py (1)
2-2: Remove unused import

The `Optional` type is imported but not used anywhere in this file.

```diff
-from typing import Type, List, Optional
+from typing import Type, List
```

🪛 Ruff (0.8.2): 2-2: `typing.Optional` imported but unused; remove unused import `typing.Optional` (F401)
cognee/base_config.py (1)
17-18: Good addition of configurable default user credentialsAdding environment variable-based configuration for default user credentials makes the system more flexible and easier to configure across different environments.
Consider updating the `to_dict()` method to include these new fields if they should be part of the serialized configuration (or explicitly document why they're excluded if that's intentional).

.env.template (1)

4-6: Environment variables for default user configuration look good

The template provides empty placeholders for the new default user configuration variables, which is consistent with other sections of the template.
You might consider adding a brief comment explaining the purpose of these variables and whether they're required or optional.
cognee/modules/data/extraction/knowledge_graph/extract_content_graph.py (1)
1-1: Remove unused import.

The `Optional` type is imported but not used anywhere in the file.

```diff
-from typing import Type, Optional
+from typing import Type
```

🪛 Ruff (0.8.2): 1-1: `typing.Optional` imported but unused; remove unused import `typing.Optional` (F401)
cognee/eval_framework/corpus_builder/run_corpus_builder.py (1)
3-3: Remove unused import.

The `Optional` type is imported but not used anywhere in the file.

```diff
-from typing import List, Optional
+from typing import List
```

🪛 Ruff (0.8.2): 3-3: `typing.Optional` imported but unused; remove unused import `typing.Optional` (F401)
cognee/infrastructure/databases/vector/pgvector/PGVectorAdapter.py (1)
130-137: Consider batching queries to reduce overhead.
You perform an individual query for each data point ID. For large input lists, this results in multiple synchronous calls and can degrade performance. An alternative is to retrieve all matching records at once, then update or insert them as needed.cognee/tests/test_cognee_server_start.py (1)
14-28: Good server setup approach with improvement opportunities.

The server setup in a separate process is a good approach for integration testing. However, consider two improvements:

- For testing environments, binding to `127.0.0.1` instead of `0.0.0.0` would be more secure.
- The `preexec_fn=os.setsid` properly sets up a process group for clean termination.

```diff
-        "--host",
-        "0.0.0.0",
+        "--host",
+        "127.0.0.1",
```

cognee/tests/unit/infrastructure/mock_embedding_engine.py (1)
37-55: Well-implemented embedding method with configurable behavior.

The implementation correctly applies the rate limiting and retry decorators while supporting configurable delays and failures for resilience testing. One minor improvement would be to add slight variability to the embeddings to better simulate real-world conditions.

Consider adding some randomness to the generated embeddings:

```diff
-        # Return mock embeddings of the correct dimension
-        return [[0.1] * self.dimensions for _ in text]
+        # Return mock embeddings with slight randomness for more realistic testing
+        import random
+        return [
+            [0.1 + (random.random() - 0.5) * 0.02 for _ in range(self.dimensions)]
+            for _ in text
+        ]
```

cognee/infrastructure/llm/prompts/generate_graph_prompt_guided.txt (1)
1-77: Comprehensive and well-structured knowledge graph extraction prompt.

The prompt provides detailed guidelines with excellent organization into sections (Node Guidelines, Property & Data Guidelines, etc.). The instructions are specific and include clear examples for proper formatting of node IDs, properties, dates, and relationships.

Consider adding a complete example of the expected output format at the end of the prompt to provide a clearer template for the LLM to follow. For example:

**Expected Output Format Example**:

```json
{
  "nodes": [
    {
      "id": "Marie Curie",
      "label": "Person",
      "properties": {
        "birth_date": "1867-11-07",
        "birth_place": "Warsaw",
        "field": "Physics, Chemistry"
      }
    },
    {
      "id": "Radioactivity",
      "label": "Concept",
      "properties": {"discovery_date": "1896"}
    }
  ],
  "edges": [
    {"source": "Marie Curie", "target": "Radioactivity", "label": "researched"},
    {"source": "Radioactivity", "target": "Marie Curie", "label": "discovered_by"}
  ]
}
```

cognee/infrastructure/llm/prompts/generate_graph_prompt_simple.txt (1)

1-28: Concise and clear knowledge graph extraction prompt.

This simplified prompt maintains the essential guidelines while being more concise. The rules are clear and emphasize the most critical aspects of knowledge graph construction. The consistency with the more detailed guided prompt is good for maintaining standardized output formats.

Consider these two enhancements:

1. Add more detail on edge directionality to ensure proper semantic relationships:

```diff
+ - Edges must have logical direction: from source to target
+ - Example: "Marie Curie" -[born_in]-> "Warsaw" (not "Warsaw" -[birth_place_of]-> "Marie Curie")
```

2. Add a concrete example of the complete expected output format to provide clearer guidance:

````diff
+**Example Output**:
+```json
+{
+  "nodes": [
+    {"id": "Albert Einstein", "label": "Person", "properties": {"birth_date": "1879-03-14"}},
+    {"id": "Theory of Relativity", "label": "Concept", "properties": {}}
+  ],
+  "edges": [
+    {"source": "Albert Einstein", "target": "Theory of Relativity", "label": "developed"}
+  ]
+}
+```
````
4-5: Remove unused import

The `time` module is imported but not used in this file.

```diff
 import asyncio
-import time
 from unittest.mock import patch
```

🪛 Ruff (0.8.2): 4-4: `asyncio` imported but unused (F401); 5-5: `time` imported but unused (F401)
cognee/infrastructure/databases/vector/embeddings/OllamaEmbeddingEngine.py (1)
61-80: Good refactoring of retry logic using decorators

The addition of the `@embedding_sleep_and_retry_async()` decorator and simplification of the `_get_embedding` method is a good refactoring that centralizes retry logic and improves code maintainability.

Consider combining the nested `with` statements for better readability:

```diff
-        async with aiohttp.ClientSession() as session:
-            async with session.post(
-                self.endpoint, json=payload, headers=headers, timeout=60.0
-            ) as response:
-                data = await response.json()
-                return data["embedding"]
+        async with aiohttp.ClientSession() as session, session.post(
+            self.endpoint, json=payload, headers=headers, timeout=60.0
+        ) as response:
+            data = await response.json()
+            return data["embedding"]
```

🪛 Ruff (0.8.2): 75-78: Use a single `with` statement with multiple contexts instead of nested `with` statements (SIM117)
evals/README.md (1)
1-93: Comprehensive evaluation documentation

This README provides clear instructions for reproducing the evaluation process and interpreting results. The documentation is well-structured with setup instructions, execution steps, and limitations of the approach.
Minor style improvements:
- Line 71: Consider changing "In order to ensure" to "To ensure" for conciseness
- Line 87: "labor-intensive" should be hyphenated
🧰 Tools
🪛 LanguageTool
[style] ~70-~70: Consider a shorter alternative to avoid wordiness.
Context: ...ect_llm.json` #### Human Evaluation In order to ensure the highest possible accuracy of...(IN_ORDER_TO_PREMIUM)
[misspelling] ~87-~87: This word is normally spelled with a hyphen.
Context: ...memory evaluation - Human as a judge is labor intensive and does not scale - Hotpot is not the ...(EN_COMPOUNDS_LABOR_INTENSIVE)
cognee/modules/pipelines/tasks/task.py (1)
52-66: Consider exception handling or logging in `execute_async_generator`.

In production scenarios, one failing iteration could disrupt the entire async generator. Consider adding a try/except block and a logging statement to capture errors gracefully (or escalate them), ensuring partial results are not silently discarded.
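A minimal sketch of the suggested guard, assuming the goal is to log the failure and keep the partial results (the function and logger names are illustrative, not the actual `Task` API):

```python
import asyncio
import logging

logger = logging.getLogger("pipeline.tasks")

async def numbers():
    # Stand-in for a task's async generator that fails mid-stream.
    yield 1
    yield 2
    raise RuntimeError("source failed mid-stream")

async def consume_with_guard(gen):
    """Collect yielded items, logging a failure instead of discarding partial output."""
    results = []
    try:
        async for item in gen:
            results.append(item)
    except Exception as error:
        logger.error("Async generator failed after %d items: %s", len(results), error)
    return results

partial = asyncio.run(consume_with_guard(numbers()))
```

Whether to return the partial batch or re-raise after logging depends on the pipeline's error-propagation policy; the point is that the choice becomes explicit.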
cognee/tests/unit/infrastructure/test_embedding_rate_limiting_realistic.py (5)
4-4: Remove unused import. `lru_cache` is imported but unused.

```diff
-from functools import lru_cache
```

🪛 Ruff (0.8.2): 4-4: `functools.lru_cache` imported but unused (F401)

7-7: Remove unused import. `LLMConfig` is imported but unused.

```diff
-from cognee.infrastructure.llm.config import LLMConfig, get_llm_config
+from cognee.infrastructure.llm.config import get_llm_config
```

🪛 Ruff (0.8.2): 7-7: `cognee.infrastructure.llm.config.LLMConfig` imported but unused (F401)

10-10: Remove unused import. `LiteLLMEmbeddingEngine` is imported but never referenced.

```diff
-from cognee.infrastructure.databases.vector.embeddings.LiteLLMEmbeddingEngine import (
-    LiteLLMEmbeddingEngine,
-)
```

🪛 Ruff (0.8.2): 10-10: `cognee.infrastructure.databases.vector.embeddings.LiteLLMEmbeddingEngine.LiteLLMEmbeddingEngine` imported but unused (F401)
28-32: Prefer pytest fixtures to manage environment variables.

Manually setting and popping environment variables could cause side effects if tests run in parallel. Using a pytest fixture to wrap environment changes can avoid shared state issues.
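A stdlib-only sketch of the idea: `unittest.mock.patch.dict` restores the environment on exit even if the test body raises, which is the same guarantee a pytest `monkeypatch` fixture provides. The variable name `LLM_RATE_LIMIT_ENABLED` is illustrative, not necessarily the one the config uses:

```python
import os
from unittest.mock import patch

# Simulate a pre-existing value that must survive the test untouched.
os.environ["LLM_RATE_LIMIT_ENABLED"] = "false"

# patch.dict restores the environment when the block exits, even on error,
# so tests cannot leak settings into each other.
with patch.dict(os.environ, {"LLM_RATE_LIMIT_ENABLED": "true"}):
    inside = os.environ["LLM_RATE_LIMIT_ENABLED"]

restored = os.environ["LLM_RATE_LIMIT_ENABLED"]
```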
66-67: Optionally differentiate rate-limited vs. other errors.

Catching all exceptions as rate-limited might obscure genuine errors. Logging or re-raising non-rate-limit exceptions could improve debugging.
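One way to separate the two cases, assuming rate-limit failures here are only recognizable by their message text (real code should match the provider's exception class instead of string matching):

```python
handled = []

def is_rate_limit_error(error: Exception) -> bool:
    """Heuristic check; real code should match the provider's exception type."""
    message = str(error).lower()
    return "rate limit" in message or "429" in message

def record_failure(error: Exception) -> None:
    if is_rate_limit_error(error):
        handled.append("rate_limited")
    else:
        handled.append("other")
        raise error  # re-raise genuine errors instead of swallowing them

record_failure(RuntimeError("Rate limit exceeded, retry later"))
try:
    record_failure(ValueError("bad input"))
except ValueError:
    pass  # genuine error surfaced to the caller
```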
cognee/eval_framework/corpus_builder/task_getters/get_default_tasks_by_indices.py (2)
28-28: Remove or leverage the `user` parameter.

`user` is declared but never used in these functions. Either remove it if unnecessary or incorporate it into task creation logic if it will be used in future expansions.

Also applies to: 51-52

36-47: Ensure consistent naming for large batch tasks.

`graph_task` and `add_data_points_task` each use `{"batch_size": 10}`. Consider extracting this into a shared constant or clarifying it via a docstring to maintain consistency across all tasks.
1-1: Remove unused `inspect` import.

The `inspect` module is never referenced in this file, so consider removing it to maintain a clean import list and avoid confusion.

```diff
-import inspect
 from cognee.shared.logging_utils import get_logger
```

🪛 Ruff (0.8.2): 1-1: `inspect` imported but unused (F401)
cognee/tests/unit/infrastructure/test_rate_limiting_realistic.py (1)
2-4: Remove unused imports `time` and `lru_cache`.

Both imports are never referenced. Removing them helps keep the code clean and free of dead references.

```diff
-import time
-from functools import lru_cache
 import os
```

🪛 Ruff (0.8.2): 2-2: `time` imported but unused (F401); 4-4: `functools.lru_cache` imported but unused (F401)
cognee/tests/test_relational_db_migration.py (9)
29-38: Ensure consistent naming convention in setup function.

The function `setup_test_db` is clear, but the returned variable name (`migration_engine`) and the function name might not be fully aligned semantically. Consider renaming the function or return value to reflect its role more consistently, e.g., `prepare_migration_db`.

40-45: Validate schema extraction.

`relational_db_migration` extracts the schema and immediately migrates it. Consider verifying that the schema isn't empty before proceeding, to handle unexpected or empty relational database states gracefully.

56-61: Prefer a direct relationship assignment.

Instead of branching on `postgresql` vs. "other" for `relationship_label`, consider standardizing the label or referencing a dictionary for clarity:

```python
labels = {"postgresql": "reports_to", "default": "ReportsTo"}
relationship_label = labels.get(migration_db_provider, labels["default"])
```

65-67: Initialize sets outside the if-block for clarity.

`distinct_node_names` and `found_edges` are used in all subsequent blocks. Keeping them initialized up front is clear, but consider explaining their purpose with a short descriptive comment.

108-119: Handle unsupported providers more gracefully.

Raising a `ValueError` here is fine, but consider providing recovery steps or a clearer error message, e.g., `"Unsupported graph database: {graph_db_provider}. Please specify one of: [neo4j, kuzu, networkx]."`

🪛 Ruff (0.8.2): 111-111: Loop control variable `edge_data` not used within loop body; rename unused `edge_data` to `_edge_data` (B007)

111-111: Rename the unused loop variable to underscore.

According to the static analysis hint (B007), the variable `edge_data` is not used. Renaming it to `_edge_data` (or simply `_`) clarifies that it is intentionally unused:

```diff
-    for src, tgt, key, edge_data in edges:
+    for src, tgt, key, _ in edges:
```

121-127: Consider documenting node/edge expectations.

The assertion for 8 nodes and 7 edges is strongly tied to specific test data. A docstring or inline comment clarifying the test data's structure (Employee IDs, etc.) would help future maintainers follow the logic.

165-169: Clarify difference in DB sizes.

The note comments mention that Postgres and SQLite data differ. Consider referencing the relevant script or steps for generating each dataset so new contributors know why the counts differ.

221-233: Document dependency on external scripts.

Here you describe that we must run `Chinook_PostgreSql.sql`. Provide a link or instruction for quickly enabling the environment (e.g., `psql -U <user> -f <path>`). This helps others replicate the test environment easily.

cognee/tests/test_telemetry.py (4)
19-31: Prevent concurrency issues with `.anon_id` creation.

Creating or modifying `.anon_id` directly in the test might be problematic if multiple tests run in parallel or if the file is present from a previous run. Consider isolating test files in a temporary directory or mocking file I/O to avoid conflicts.

38-44: Be cautious with environment variable manipulation.

The test temporarily deletes or sets `ENV` and `TELEMETRY_DISABLED`. This can cause side effects in parallel tests. In a larger suite, prefer context managers or a fixture-based approach to isolate environment variable changes.

86-97: Clean up environment variable settings systematically.

It's good that you clean up `TELEMETRY_DISABLED`. For consistency and safety, consider using a `try/finally` block or a dedicated fixture to avoid leftover environment variables if an exception is raised mid-test.

99-119: Use environment mocking for dev/test checks.

Setting `ENV = "dev"` in code might mask local environment settings. A more robust approach is to patch `os.environ` or a config object, verifying that telemetry is off in dev. This helps ensure test isolation.

evals/falkor_01042025/hotpot_qa_falkor_graphrag_sdk.py (5)
5-5: Remove unused import.

`URL` from `graphrag_sdk.source` is imported but not used. Remove it to keep the imports clean and avoid confusion.

```diff
-from graphrag_sdk.source import URL, STRING
+from graphrag_sdk.source import STRING
```

🪛 Ruff (0.8.2): 5-5: `graphrag_sdk.source.URL` imported but unused (F401)
35-63: Consider partial ontology creation.

If `ontology.from_sources` fails mid-way (e.g., a batch yields errors), the entire run is aborted. As a future enhancement, you might handle partial failure or data skipping to salvage partial ontology generation.
66-119: Implement concurrency checks or locks for graph recreation.

If multiple processes or tests attempt to recreate the same graph concurrently, it may lead to inconsistent states. Consider implementing locking or using a safer transaction-based approach, especially for production.
123-159: Provide fallback answers.

When an exception occurs in `chat.send_message`, you return a generic error response. Consider logging the exception details more thoroughly or providing fallback logic (e.g., partial facts from the knowledge graph) for a more user-friendly experience.
161-199: Validate file existence early.

You check `if not os.path.exists(config.ontology_file)` to create a new ontology. If the corpus file is also missing or invalid, or if the user inadvertently sets `ontology_file` incorrectly, the pipeline might fail in a less obvious location. Consider verifying all required paths up front.

evals/graphiti_01042025/hotpot_qa_graphiti.py (2)
8-9: Remove the unused `OpenAI` import.

When the code is not utilized, removing it helps maintain code cleanliness. Use this diff:

```diff
 from langchain_openai import ChatOpenAI
-from openai import OpenAI
 from tqdm import tqdm
```

🪛 Ruff (0.8.2): 9-9: `openai.OpenAI` imported but unused (F401)

93-95: Consider adding error handling around LLM invocation.

Wrapping the call in a try/except block provides graceful fallbacks or logs in production environments. Here's an example approach:

```diff
     # Get answer from LLM
-    response = await llm.ainvoke(messages)
-    answer = response.content
+    try:
+        response = await llm.ainvoke(messages)
+        answer = response.content
+    except Exception as e:
+        logger.error(f"LLM call failed: {str(e)}")
+        answer = "Error: LLM call failed"
```

cognee/infrastructure/llm/rate_limiter.py (2)
53-57: Remove unused imports.

Ruff flagged that these imports are unused. Removing them simplifies maintenance.

```diff
-import threading
-import logging
-import functools
-import openai
-import os
```

🪛 Ruff (0.8.2): 53-57: `threading`, `logging`, `functools`, `openai`, and `os` imported but unused (F401)
133-152: Consider renaming for clarity.

Since the method returns `True` when the request is allowed, `hit_limit` may be misleading. A more descriptive name can reduce confusion.

cognee/infrastructure/llm/embedding_rate_limiter.py (1)
39-49: Docstring mismatch regarding the `limits` library.

The docstring mentions using the `limits` library, but the implementation appears manual. Consider updating the docstring or using a consistent approach.

cognee/tests/unit/infrastructure/test_rate_limiting_retry.py (2)
3-5: Remove unused or duplicate imports.

The `patch`, `MagicMock`, and `lru_cache` imports are never used, and `os` is imported twice (lines 3 and 381). Consider removing the unused imports and consolidating the `os` import at the top.

Apply this diff to remove and consolidate imports:

```diff
-import os
 import time
-from unittest.mock import patch, MagicMock
-from functools import lru_cache
+import os
 ...
-import os
```

Also applies to: 381-381

🪛 Ruff (0.8.2): 3-3: `os` imported but unused (F401); 4-4: `unittest.mock.patch` and `unittest.mock.MagicMock` imported but unused (F401); 5-5: `functools.lru_cache` imported but unused (F401)
56-66: Prefer using testing framework output over print statements.

While printing test status can be helpful, relying on standard unittest or pytest assertions and reporting provides more structured test outputs and better integration with CI/CD.
evals/mem0_01042025/hotpot_qa_mem0.py (1)
28-28: Rename the unused loop variable to underscore.

You are not using the loop index `i` in this block. Renaming it to `_` (or `_doc_idx`) clarifies that it's unused.

```diff
-for i, document in enumerate(tqdm(corpus, desc="Adding documents")):
+for _, document in enumerate(tqdm(corpus, desc="Adding documents")):
```

🪛 Ruff (0.8.2): 28-28: Loop control variable `i` not used within loop body; rename unused `i` to `_i` (B007)
evals/plot_metrics.py (1)
210-210: Use dictionary iteration directly instead of calling `.keys()`.

Python allows iterating directly over dictionaries without explicitly calling `.keys()`. This improves readability and aligns with best practices.

```diff
-for system in all_systems_metrics.keys():
+for system in all_systems_metrics:
```

🪛 Ruff (0.8.2): 210-210: Use `key in dict` instead of `key in dict.keys()`; remove `.keys()` (SIM118)
🛑 Comments failed to post (5)
.github/workflows/disable_independent_workflows.sh (1)
4-4: ⚠️ Potential issue: Add error handling after `cd` command.

To prevent accidental destructive operations if the directory does not exist, append `|| exit 1` after the `cd` command as recommended by Shellcheck.

```diff
-cd "$(dirname "$0")"
+cd "$(dirname "$0")" || exit 1
```

📝 Committable suggestion (‼️ IMPORTANT: carefully review the code before committing and verify it accurately replaces the highlighted code):

```shell
cd "$(dirname "$0")" || exit 1
```

🪛 Shellcheck (0.10.0): [warning] 4-4: Use 'cd ... || exit' or 'cd ... || return' in case cd fails. (SC2164)
cognee/tests/unit/entity_extraction/regex_entity_extraction_test.py (1)
84-97: ⚠️ Potential issue: Temporarily disabled test needs follow-up action

The test for money entity extraction has been commented out and replaced with a simple `pass` statement. While the TODO comment clearly explains that the regex is failing and needs to be fixed, it would be better to:

- Add a more specific tracking mechanism (like a JIRA ticket number) to ensure this doesn't get forgotten
- Consider using pytest's built-in skip or xfail mechanisms instead of commenting out the test logic

Consider replacing the current implementation with a proper pytest skip:

```python
@pytest.mark.asyncio
async def test_extract_money(regex_extractor):
    """Test extraction of monetary amounts."""
    pytest.skip("TODO: Lazar to fix regex for test, it's failing currently")
    text = "The product costs $1,299.99 or €1.045,00 depending on your region."
    entities = await regex_extractor.extract_entities(text)

    # Filter only MONEY entities
    money_entities = [e for e in entities if e.is_a.name == "MONEY"]

    assert len(money_entities) == 2
    assert "$1,299.99" in [e.name for e in money_entities]
    assert "€1.045,00" in [e.name for e in money_entities]
```

cognee/shared/logging_utils.py (1)
216-217: ⚠️ Potential issue: Bug in attribute check

The attribute check is broken: `__name__` is passed as a bare name (which resolves to the current module's name at runtime) rather than as the string `"__name__"`.

```diff
-    if hasattr(exc_type, __name__):
+    if hasattr(exc_type, "__name__"):
```

📝 Committable suggestion (‼️ IMPORTANT: carefully review the code before committing and verify it accurately replaces the highlighted code):

```python
    if hasattr(exc_type, "__name__"):
        event_dict["exception_type"] = exc_type.__name__
```

cognee/tests/test_cognee_server_start.py (1)
29-30: 🛠️ Refactor suggestion: Replace fixed sleep with polling for better test reliability.

Using a fixed 20-second sleep can make tests brittle and slow: too short for some environments and unnecessarily long for others. Consider implementing a polling approach that checks if the server is ready:

```python
        # Poll until server is ready or timeout occurs
        max_wait = 30  # Maximum wait time in seconds
        start_time = time.time()
        server_ready = False

        while not server_ready and time.time() - start_time < max_wait:
            try:
                response = requests.get("http://localhost:8000/health", timeout=1)
                if response.status_code == 200:
                    server_ready = True
                    break
            except requests.RequestException:
                pass
            time.sleep(0.5)

        if not server_ready:
            stderr = cls.server_process.stderr.read().decode("utf-8")
            raise TimeoutError(f"Server failed to start within {max_wait} seconds: {stderr}")
```

cognee/modules/pipelines/tasks/task.py (1)
20-45: 🛠️ Refactor suggestion: Add validation for `task_config` and `_next_batch_size`.

Although the constructor sets default values, there's no check to ensure `task_config["batch_size"]` is a positive integer. A guard for negative or zero values could help prevent faulty usage and runtime errors. You can extend the constructor to validate `batch_size`, for example:

```diff
         if task_config is not None:
             self.task_config = task_config
+            if "batch_size" in task_config:
+                if not isinstance(task_config["batch_size"], int) or task_config["batch_size"] < 1:
+                    raise ValueError("task_config['batch_size'] must be a positive integer")
             if "batch_size" not in task_config:
                 self.task_config["batch_size"] = 1
```
Description
Fix the Gemini GitHub Action.
DCO Affirmation
I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin.