5 changes: 5 additions & 0 deletions Dockerfile
@@ -45,6 +45,11 @@ uv sync --extra debug --extra api --extra postgres --extra weaviate --extra qdra

FROM python:3.12-slim-bookworm

# Install runtime dependencies
RUN apt-get update && apt-get install -y \
libpq5 \
&& rm -rf /var/lib/apt/lists/*

WORKDIR /app

COPY --from=uv /app /app
214 changes: 214 additions & 0 deletions SETUP_GUIDE.md
@@ -0,0 +1,214 @@
# Cognee Local Setup

## Prerequisites

- [Git](https://git-scm.com/)
- [Docker](https://www.docker.com/) or [Colima](https://github.com/abiosoft/colima) (container runtime on macOS/Linux)
- PostgreSQL 14
- OpenAI API key (or AWS Bedrock credentials if using the Bedrock adapter)

---

## 1. Clone the Repository

```bash
git clone https://github.com/HILabs-Ireland/rules-engine
cd rules-engine
```

## 2. Initialize Submodules

```bash
git submodule update --init
```
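
To confirm the submodules were pulled, `git submodule status` should list each one alongside a pinned commit hash:

```bash
git submodule status
```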

## 3. Set Up PostgreSQL

### Install PostgreSQL 14

To install PostgreSQL 14, run:

```bash
brew install postgresql@14
```

### Start PostgreSQL
Start the PostgreSQL service with:

```bash
brew services start postgresql@14
```

### Create the User and Database
Create a PostgreSQL superuser named `cognee`, set its password, and create the `cognee_db` database:

```bash
createuser -s cognee
psql postgres -c "ALTER USER cognee WITH PASSWORD 'cognee'"
createdb -O cognee cognee_db
```

### Install the pgvector Extension
Enable the pgvector extension on the new database:

```bash
psql -d cognee_db -c 'CREATE EXTENSION IF NOT EXISTS vector;'
```
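
If the `CREATE EXTENSION` call fails because the `vector` extension is not available, pgvector is packaged separately from PostgreSQL. One possible fix, assuming Homebrew's `pgvector` formula is built against your PostgreSQL 14 install (otherwise, build from source per the pgvector README):

```bash
# pgvector ships separately from postgresql@14 on Homebrew
brew install pgvector

# Re-run the extension creation, then verify it took effect
psql -d cognee_db -c 'CREATE EXTENSION IF NOT EXISTS vector;'
psql -d cognee_db -c "SELECT extversion FROM pg_extension WHERE extname = 'vector';"
```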

## 4. Obtain an OpenAI API Key *(optional if using the Bedrock adapter)*

To use OpenAI models, you'll need an API key. Follow these steps:

1. Go to the [OpenAI Platform](https://platform.openai.com/).
2. Sign up or log in, and verify your email address.
3. Navigate to your [API Keys page](https://platform.openai.com/account/api-keys).
4. Click **+ Create new secret key**.
5. Give it a name (e.g., `cognee`) and copy the key immediately — it won't be shown again.

> ⚠️ New OpenAI accounts may include a small amount of free trial credit; once it is used up, API calls are billed to your account.

Keep your key secure — you’ll need it for configuring the `.env` file in the next step.

## 5. Configure Environment Variables

Navigate to the `cognee` directory and copy the example environment file:

```bash
cd cognee
cp .env.template .env
```

Then open `.env` in your text editor and update the values as needed. Below is an example configuration:

```env
# Runtime Environment
ENV="local"
DEBUG="true"
TOKENIZERS_PARALLELISM="false"

# Default User Configuration
DEFAULT_USER_EMAIL="[email protected]"
DEFAULT_USER_PASSWORD="your_secure_password"

# LLM Configuration
LLM_API_KEY="<your-openai-api-key>"
LLM_MODEL="openai/gpt-4o-mini"
LLM_PROVIDER="openai"
LLM_ENDPOINT=""
LLM_API_VERSION=""
LLM_MAX_TOKENS="16384"

# Embedding Configuration
EMBEDDING_PROVIDER="openai"
EMBEDDING_API_KEY="<your-openai-api-key>"
EMBEDDING_MODEL="openai/text-embedding-3-large"
EMBEDDING_ENDPOINT=""
EMBEDDING_API_VERSION=""
EMBEDDING_DIMENSIONS=3072
EMBEDDING_MAX_TOKENS=8191

# Vector Database Configuration
VECTOR_DB_PROVIDER="pgvector"

# Database Configuration
DB_PROVIDER="postgres"
DB_NAME=cognee_db
DB_HOST=host.docker.internal
DB_PORT=5432
DB_USERNAME=cognee
DB_PASSWORD=cognee
```

> 💡 Replace `<your-openai-api-key>` with your actual key. If you're using AWS Bedrock, the LLM and embedding values will differ depending on your adapter.
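
Before starting the service, you can sanity-check these database settings from the host (note that `host.docker.internal` resolves to the host only from inside a container; from your own shell, use `localhost`):

```bash
# Should print a single row if the user, password, and database line up
psql "postgresql://cognee:cognee@localhost:5432/cognee_db" -c "SELECT 1;"
```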

## 6. Start the Service

To start the Cognee service locally, run the following command from the project root:

```bash
docker compose up
```
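
Once the containers are up, you can probe the health-check endpoint. The `/health` path and port 8000 are assumptions here: the handler lives in `cognee/api/client.py`, but the exposed port depends on your `docker-compose.yml`.

```bash
# Per the updated handler, this should return {"status": "healthy"}
curl http://localhost:8000/health
```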

## 7. Testing Cognee with Insomnia

Once the Docker containers are running, you can test the data ingestion and processing workflow using [Insomnia](https://insomnia.rest/).

---

### 1. Authenticate via the Login Endpoint

1. Open Insomnia.
2. Load the relevant Cognee API collection.
3. Find the `Login` endpoint and send a request.
4. A token will be returned and automatically applied to all subsequent requests (a rough curl equivalent is sketched below).
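
If you'd rather drive this from the command line, here is a minimal sketch; the `/api/v1/auth/login` route, port 8000, and the form-encoded payload shape are all assumptions to verify against your collection:

```bash
# Assumed route and payload shape; adjust to match your Insomnia collection.
# Use the DEFAULT_USER_EMAIL / DEFAULT_USER_PASSWORD values from your .env.
curl -X POST http://localhost:8000/api/v1/auth/login \
  -H "Content-Type: application/x-www-form-urlencoded" \
  -d "username=<DEFAULT_USER_EMAIL>&password=<DEFAULT_USER_PASSWORD>"
```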

---

### 2. Generate a Pre-Signed S3 URL

1. Go to the **AWS Console → S3 Dashboard**.
2. Locate the bucket: `devrulesenginestack-workflowsta-databuckete3889a50-40uv9d7bnc5e`.
3. Navigate to: `data/Alternaleaf.md`
4. Click **Object Actions** → **Share with presigned URL**.
5. Set the timeout to approximately 5 minutes.
6. Click **Create**; the URL is copied to your clipboard (a CLI equivalent is sketched below).
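
If you prefer the AWS CLI, the same pre-signed URL can be generated with `aws s3 presign` (assuming your AWS credentials are configured locally):

```bash
# 300 seconds matches the ~5 minute timeout suggested above
aws s3 presign \
  "s3://devrulesenginestack-workflowsta-databuckete3889a50-40uv9d7bnc5e/data/Alternaleaf.md" \
  --expires-in 300
```
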
---

### 3. Send the File Link via the Add Data Endpoint

1. In Insomnia, open the **Add Data** endpoint.
2. Paste the pre-signed S3 URL into the request body.
3. Send the request.

> ✅ Expected response: `200 OK` with a `null` body.
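
The same call can be made with curl. The `/api/v1/add` prefix and the `url`/`datasetName` form fields come from this PR's router changes, while the port, token variable, and dataset name are assumptions:

```bash
# Send the pre-signed URL through the new url form field
curl -X POST http://localhost:8000/api/v1/add/ \
  -H "Authorization: Bearer $TOKEN" \
  -F "url=<your-presigned-url>" \
  -F "datasetName=test_dataset"
```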

---

### 4. Trigger Data Processing

1. Open the **Cognify** endpoint in Insomnia.
2. Send the request.

> ⚠️ This request might time out — that's expected. The processing continues in the background.
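
A curl sketch for this step: `/api/v1/cognify` is the registered prefix, but the request body shown here is an assumption, so check your collection for the exact shape.

```bash
# May hang until the client times out; processing continues server-side
curl -X POST http://localhost:8000/api/v1/cognify/ \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"datasets": ["test_dataset"]}'
```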

---

### 5. Visualize the Results

1. Open the **Visualise** endpoint in Insomnia.
2. Send the request.


## Troubleshooting: PostgreSQL Connection Errors

If you encounter issues connecting to the PostgreSQL database, you may need to reset or reinitialize the database setup.

1. Stop PostgreSQL

```bash
brew services stop postgresql@14
```

2. Remove Existing Data

> ⚠️ **Warning:** This will delete all existing PostgreSQL 14 data.

```bash
rm -rf /opt/homebrew/var/postgresql@14
```

3. Reinitialize PostgreSQL

```bash
initdb /opt/homebrew/var/postgresql@14 -E UTF-8
```

4. Start PostgreSQL Again

```bash
brew services start postgresql@14
```

5. Recreate User and Database

```bash
createuser -s cognee
psql postgres -c "ALTER USER cognee WITH PASSWORD 'cognee'"
createdb -O cognee cognee_db
psql -d cognee_db -c 'CREATE EXTENSION IF NOT EXISTS vector;'
```

4 changes: 3 additions & 1 deletion cognee/api/client.py
@@ -150,13 +150,15 @@ def health_check():
"""
Health check endpoint that returns the server status.
"""
return Response(status_code=200)
return JSONResponse(content={"status": "healthy"}, status_code=200)


app.include_router(get_datasets_router(), prefix="/api/v1/datasets", tags=["datasets"])

app.include_router(get_add_router(), prefix="/api/v1/add", tags=["add"])

app.include_router(get_delete_router(), prefix="/api/v1/delete", tags=["delete"])

app.include_router(get_cognify_router(), prefix="/api/v1/cognify", tags=["cognify"])

app.include_router(get_search_router(), prefix="/api/v1/search", tags=["search"])
77 changes: 59 additions & 18 deletions cognee/api/v1/add/routers/get_add_router.py
@@ -19,12 +19,19 @@ def get_add_router() -> APIRouter:

@router.post("/", response_model=None)
async def add(
data: List[UploadFile],
file: UploadFile = UploadFile(None),
url: str = Form(None),
datasetId: Optional[UUID] = Form(default=None),
datasetName: Optional[str] = Form(default=None),
nodeSets: Optional[List[str]] = Form(default=None),
user: User = Depends(get_authenticated_user),
):
"""This endpoint is responsible for adding data to the graph."""
"""
This endpoint is responsible for adding data to the graph.
Accepts either:
- file: a single file upload
- url: a URL to fetch data from (supports GitHub clone or direct file download)
"""
from cognee.api.v1.add import add as cognee_add

if not datasetId and not datasetName:
@@ -38,26 +45,60 @@
raise ValueError("No dataset found with the provided datasetName.")

try:
if isinstance(data, str) and data.startswith("http"):
if "github" in data:
# Perform git clone if the URL is from GitHub
repo_name = data.split("/")[-1].replace(".git", "")
subprocess.run(["git", "clone", data, f".data/{repo_name}"], check=True)
await cognee_add(
"data://.data/",
f"{repo_name}",
logger.info(f"Received data for datasetId={datasetId}")
if file and file.filename:
logger.info(f"Received file upload: filename={file.filename}, content_type={file.content_type}, datasetId={datasetId}")
try:
text = (await file.read()).decode("utf-8")
logger.info(f"Passing uploaded file as text to cognee_add")
return await cognee_add(
text,
datasetName,
user=user,
node_set=nodeSets
)
except Exception as e:
logger.info(f"Could not decode file as text, falling back to binary. Error: {e}")
file.file.seek(0)
return await cognee_add(
file.file,
datasetName,
user=user,
node_set=nodeSets
)
elif url:
logger.info(f"Received url={url} for datasetId={datasetId}")
if url.startswith("http"):
if "github" in url:
repo_name = url.split("/")[-1].replace(".git", "")
subprocess.run(["git", "clone", url, f".data/{repo_name}"], check=True)
logger.info(f"Cloned GitHub repo to .data/{repo_name}")
return await cognee_add(
"data://.data/",
f"{repo_name}",
user=user,
node_set=nodeSets
)
else:
response = requests.get(url)
response.raise_for_status()
if not response.content:
logger.error(f"No content fetched from URL: {url}")
return JSONResponse(status_code=400, content={"error": "No content fetched from URL"})
logger.info(f"Fetched content from URL: {response.text}")
return await cognee_add(
response.text,
datasetName,
user=user,
node_set=nodeSets
)
else:
# Fetch and store the data from other types of URL using curl
response = requests.get(data)
response.raise_for_status()

file_data = await response.content()

return await cognee_add(file_data)
logger.error(f"Invalid URL format: {url}")
return JSONResponse(status_code=400, content={"error": "Invalid URL format"})
else:
await cognee_add(data, datasetName, user=user)
return JSONResponse(status_code=400, content={"error": "No file or URL provided"})
except Exception as error:
logger.error(f"Error processing file or URL: {error}")
return JSONResponse(status_code=409, content={"error": str(error)})

return router
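
Taken together with the route registration in `client.py`, the new `file` form field means a direct upload can also be exercised outside Insomnia. A minimal sketch, where the port, auth header, and sample file path are assumptions:

```bash
# Upload a local file via the file form field added in this diff
curl -X POST http://localhost:8000/api/v1/add/ \
  -H "Authorization: Bearer $TOKEN" \
  -F "file=@./Alternaleaf.md" \
  -F "datasetName=test_dataset"
```
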
2 changes: 2 additions & 0 deletions cognee/api/v1/cognify/cognify.py
@@ -18,6 +18,7 @@
from cognee.tasks.summarization import summarize_text
from cognee.modules.chunking.TextChunker import TextChunker
from cognee.modules.pipelines import cognee_pipeline
from cognee.tasks.metrics.calculate_graph_metrics import calculate_graph_metrics

logger = get_logger("cognify")

@@ -65,6 +66,7 @@ async def get_default_tasks( # TODO: Find out a better way to do this (Boris's
task_config={"batch_size": 10},
),
Task(add_data_points, task_config={"batch_size": 10}),
Task(calculate_graph_metrics),
]

return default_tasks