39 changes: 39 additions & 0 deletions Dockerfile.modal
@@ -0,0 +1,39 @@
FROM python:3.11-slim

# Set environment variables
ENV PIP_NO_CACHE_DIR=true
ENV PATH="${PATH}:/root/.poetry/bin"
ENV PYTHONPATH=/app
ENV RUN_MODE=modal
ENV SKIP_MIGRATIONS=true

# System dependencies
RUN apt-get update && apt-get install -y \
gcc \
libpq-dev \
git \
curl \
build-essential \
&& rm -rf /var/lib/apt/lists/*

WORKDIR /app


ENV PYTHONPATH=/app
WORKDIR /app
COPY pyproject.toml poetry.lock /app/


RUN pip install poetry
RUN pip install python-dotenv
RUN pip install transformers
RUN pip install swebench
RUN pip install sentencepiece
Comment on lines +28 to +31
Contributor

🛠️ Refactor suggestion

Consolidate dependency management with Poetry

The packages python-dotenv, transformers, swebench, and sentencepiece are being installed separately using pip after running poetry install. To maintain consistency and proper dependency management, it's recommended to add these packages to your pyproject.toml and install them via poetry.

Apply this diff to remove the pip installations:

- RUN pip install python-dotenv
- RUN pip install transformers
- RUN pip install swebench
- RUN pip install sentencepiece

Then add these dependencies to your pyproject.toml and re-run poetry install during the build process.

Committable suggestion skipped: line range outside the PR's diff.

RUN pip install protobuf


COPY cognee/ /app/cognee
COPY evals/ /app/evals
COPY README.md /app/README.md
COPY process_single_repo.py /app/

2 changes: 1 addition & 1 deletion cognee-mcp/cognee_mcp/server.py
@@ -173,7 +173,7 @@ async def handle_call_tool(

search_query = arguments.get("query")

-   search_results = await cognee.search(SearchType.INSIGHTS, query_text=search_query)
+   search_results = await cognee.search(SearchType.COMPLETION, query_text=search_query)
Contributor

💡 Codebase verification

Incorrect search type usage may cause runtime errors

The change from SearchType.INSIGHTS to SearchType.COMPLETION is problematic because:

  • The retrieved_edges_to_string function expects search results in triplet format (node1, edge, node2), which is provided by SearchType.INSIGHTS
  • SearchType.COMPLETION uses a different task module that may not return results in the expected format
  • Consider using SearchType.GRAPH_COMPLETION if graph-based completion is needed, or keep using SearchType.INSIGHTS for triplet-based results
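
For reference, a minimal sketch of the two call shapes being recommended here — the import path and the wrapper function are assumptions added for illustration, not code from this PR:

```python
import cognee
from cognee.api.v1.search import SearchType  # import path is an assumption


async def fetch_context(search_query: str, use_graph_completion: bool = False):
    if use_graph_completion:
        # Graph-grounded completion, if that is the intended behaviour.
        return await cognee.search(SearchType.GRAPH_COMPLETION, query_text=search_query)
    # Triplet-shaped (node1, edge, node2) results that retrieved_edges_to_string expects.
    return await cognee.search(SearchType.INSIGHTS, query_text=search_query)
```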
🔗 Analysis chain

Verify the impact of changing the search type from INSIGHTS to COMPLETION.

The change from SearchType.INSIGHTS to SearchType.COMPLETION could significantly alter the search behavior and results. Please ensure this change:

  1. Aligns with the intended functionality in the Modal-based evaluation system
  2. Maintains backward compatibility with existing clients
  3. Returns the expected result format for the retrieved_edges_to_string function

Let's verify the SearchType usage across the codebase:

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check how SearchType is used across the codebase
rg -A 3 "SearchType\." 

# Check for any documentation or tests related to search types
fd -e md -e rst -e txt | xargs rg -l "SearchType"

# Look for any configuration files that might reference search types
fd -e yaml -e json -e toml | xargs rg -l "search.*type"

Length of output: 13984


Script:

#!/bin/bash
# Check the implementation of query_completion and retrieved_edges_to_string
ast-grep --pattern 'async def query_completion($$$) {
  $$$
}'

ast-grep --pattern 'def retrieved_edges_to_string($$$) {
  $$$
}'

# Check for any tests or examples using COMPLETION type
rg -B 2 -A 5 "query_completion|retrieved_edges_to_string"

Length of output: 9698


results = retrieved_edges_to_string(search_results)

35 changes: 0 additions & 35 deletions entrypoint-old.sh

This file was deleted.

188 changes: 188 additions & 0 deletions eval_with_modal.py
@@ -0,0 +1,188 @@
# File: eval_with_modal.py
from shutil import ignore_patterns

import modal
import os
import json
from pathlib import Path
from typing import Optional
import sys
import dotenv

dotenv.load_dotenv()
app = modal.App("cognee-runner")

# Get the parent directory path
PARENT_DIR = Path(__file__).resolve().parent.parent


MODAL_DOCKERFILE_PATH = Path( "Dockerfile.modal")


# Define ignore patterns
IGNORE_PATTERNS = [
".venv/**/*",
"__pycache__",
"*.pyc",
".git",
".pytest_cache",
"*.egg-info",
"RAW_GIT_REPOS/**/*"
]

# Create image from Modal-specific Dockerfile
image = modal.Image.from_dockerfile(
path=MODAL_DOCKERFILE_PATH,
gpu="T4",
force_build=False,
ignore=IGNORE_PATTERNS
).copy_local_file("pyproject.toml", "pyproject.toml").copy_local_file("poetry.lock", "poetry.lock").env({"ENV": os.getenv('ENV'), "LLM_API_KEY": os.getenv("LLM_API_KEY")}).poetry_install_from_file(poetry_pyproject_toml="pyproject.toml")


@app.function(
image=image,
gpu="T4",
concurrency_limit=5,
timeout=9000000
)
async def run_single_repo(instance_data: dict, disable_cognee: bool = False):
import os
import json
Contributor

⚠️ Potential issue

Remove duplicate import of json module

The json module is already imported at line 6. The re-import at line 49 is unnecessary and can be removed to avoid redundancy.

Apply this diff to remove the redundant import:

- import json
🧰 Tools
🪛 Ruff (0.8.2)

49-49: Redefinition of unused json from line 6

Remove definition: json

(F811)

🪛 GitHub Actions: ruff format

[warning] File requires formatting with Ruff formatter

🪛 GitHub Actions: ruff lint

[error] 49-49: Redefinition of unused json from line 6

from process_single_repo import process_repo # Import the async function directly

# Process the instance
result = await process_repo(instance_data, disable_cognee=disable_cognee)

# Save the result
instance_id = instance_data["instance_id"]
filename = f"pred_{'nocognee' if disable_cognee else 'cognee'}_{instance_id}.json"
path_in_container = os.path.join("/app", filename)

with open(path_in_container, "w") as f:
json.dump(result, f, indent=2)

with open(path_in_container, "r") as f:
content = f.read()

Comment on lines +61 to +66
Contributor

🛠️ Refactor suggestion

Add error handling for file operations

File operations should include proper error handling to manage potential I/O exceptions.

-    with open(path_in_container, "w") as f:
-        json.dump(result, f, indent=2)
-
-    with open(path_in_container, "r") as f:
-        content = f.read()
+    try:
+        with open(path_in_container, "w") as f:
+            json.dump(result, f, indent=2)
+
+        with open(path_in_container, "r") as f:
+            content = f.read()
+    except IOError as e:
+        print(f"Error handling file {path_in_container}: {e}")
+        raise

return (filename, content)
# async def run_single_repo(instance_data: dict, disable_cognee: bool = False):
# import subprocess
# import json
# import os
#
# # Install project dependencies
# subprocess.run(
# ["poetry", "install", "--no-interaction"],
# cwd="/app",
# check=True
# )
#
# instance_json_str = json.dumps(instance_data)
#
# # cmd = [
# # "poetry",
# # "run",
# # "python",
# # "process_single_repo.py",
# # f"--instance_json={instance_json_str}",
# # ]
# # if disable_cognee:
# # cmd.append("--disable-cognee")
# #
# # subprocess.run(cmd, cwd="/app", check=True, env={
# # "ENV": os.getenv('ENV'),
# # "LLM_API_KEY": os.getenv("LLM_API_KEY")
# # })
# venv_python = os.path.join("venv", "bin", "python") # Use "Scripts" instead of "bin" on Windows
#
# cmd = [
# "poetry",
# "run",
# "process_single_repo.py",
# f"--instance_json={instance_json_str}",
# ]
# if disable_cognee:
# cmd.append("--disable-cognee")
#
# subprocess.run(cmd, cwd="/app", check=True, env={
# "ENV": os.getenv('ENV'),
# "LLM_API_KEY": os.getenv("LLM_API_KEY")
# })
# instance_id = instance_data["instance_id"]
# filename = f"pred_{'nocognee' if disable_cognee else 'cognee'}_{instance_id}.json"
# path_in_container = os.path.join("/app", filename)
#
# if os.path.exists(path_in_container):
# with open(path_in_container, "r") as f:
# content = f.read()
# return (filename, content)
# else:
# return (filename, "")

Comment on lines +68 to +121
Contributor

🛠️ Refactor suggestion

Remove commented out code

Large blocks of commented out code should be removed as they're already tracked in version control. This improves code readability and maintainability.


@app.local_entrypoint()
async def main(disable_cognee: bool = False, num_samples: int = 2):
import subprocess
import json
Contributor

⚠️ Potential issue

Remove duplicate import of json module

The json module is already imported at line 6. The re-import at line 125 is unnecessary and can be removed to avoid redundancy.

Apply this diff to remove the redundant import:

- import json
🧰 Tools
🪛 Ruff (0.8.2)

125-125: Redefinition of unused json from line 6

Remove definition: json

(F811)

🪛 GitHub Actions: ruff format

[warning] File requires formatting with Ruff formatter

🪛 GitHub Actions: ruff lint

[error] 125-125: Redefinition of unused json from line 6

import os
def check_install_package(package_name):
"""Check if a pip package is installed and install it if not."""
try:
__import__(package_name)
return True
except ImportError:
try:
subprocess.check_call([sys.executable, "-m", "pip", "install", package_name])
return True
except subprocess.CalledProcessError:
return False

for dependency in ["transformers", "sentencepiece", "swebench","python-dotenv"]:
check_install_package(dependency)

from swebench.harness.utils import load_swebench_dataset

print(f"Configuration:")
print(f"• Running in Modal mode")
print(f"• Disable Cognee: {disable_cognee}")
print(f"• Number of samples: {num_samples}")

dataset_name = (
"princeton-nlp/SWE-bench_Lite_bm25_13K" if disable_cognee
else "princeton-nlp/SWE-bench_Lite"
)

swe_dataset = load_swebench_dataset(dataset_name, split="test")
swe_dataset = swe_dataset[:num_samples]

print(f"Processing {num_samples} samples from {dataset_name}")
import pip

# Install required dependencies
pip.main(['install', "pydantic>=2.0.0", "pydantic-settings>=2.0.0"])
Comment on lines +160 to +162
Contributor

🛠️ Refactor suggestion

Avoid installing packages at runtime; manage dependencies via Poetry

Installing packages at runtime using pip.main() is not recommended. Dependencies should be managed through poetry and installed prior to runtime to ensure consistency and avoid potential issues.

Consider adding pydantic>=2.0.0 and pydantic-settings>=2.0.0 to your pyproject.toml and installing them via poetry during the image build process.

🧰 Tools
🪛 GitHub Actions: ruff format

[warning] File requires formatting with Ruff formatter


tasks = [
run_single_repo.remote(instance, disable_cognee=disable_cognee)
for instance in swe_dataset
]
import asyncio
# Run all tasks concurrently
results = await asyncio.gather(*tasks)
Contributor

🛠️ Refactor suggestion

Add error handling for concurrent execution

The asyncio.gather() call should include error handling to manage potential failures in concurrent tasks.

-    results = await asyncio.gather(*tasks)
+    try:
+        results = await asyncio.gather(*tasks)
+    except Exception as e:
+        print(f"Error during concurrent execution: {e}")
+        raise


# Process results
merged = []
for filename, content in results:
if content:
with open(filename, "w") as f:
f.write(content)
print(f"Saved {filename} locally.")
merged.append(json.loads(content))

# Save merged results
merged_filename = "merged_nocognee.json" if disable_cognee else "merged_cognee.json"
with open(merged_filename, "w") as f:
json.dump(merged, f, indent=2)

print(f"Merged {len(merged)} repos into {merged_filename}!")
print("Done!")

14 changes: 14 additions & 0 deletions evals/get_started.py
@@ -0,0 +1,14 @@
import modal

app = modal.App("example-get-started")


@app.function()
def square(x):
Contributor

Do we need this?

Contributor Author

I will turn it into a test

print("This code is running on a remote worker!")
return x**2


@app.local_entrypoint()
def main():
print("the square is", square.remote(42))