Distributed LLM Inference & RAG Platform

What this is

A cloud‑native platform for production‑style GenAI workloads, runnable locally as an MVP: low‑latency LLM serving, RAG, orchestration, and full observability. It’s designed as a reusable, open‑source reference stack that teams can adopt instead of building this infrastructure from scratch.

Why we built it

Most teams don’t need a custom GenAI stack; they need a reliable, observable foundation they can plug into real products quickly. This project packages the essential building blocks—serving, retrieval, orchestration, and pipelines—into a single, coherent platform so teams can focus on business logic rather than rebuilding infrastructure.

Who it’s for

  • Engineering teams building internal LLM applications or enterprise AI services
  • Platform teams who need a standard, reusable GenAI foundation
  • OSS users who want a complete, runnable LLM + RAG stack

Why it matters

  • Performance‑minded serving: vLLM for continuous batching and KV‑cache optimization
  • RAG done right: embeddings + vector search with pgvector, plus orchestration
  • Operational discipline: tracing + metrics via OpenTelemetry, Jaeger, Grafana
  • Production‑style pipelines: Dagster with retries and logging

Core capabilities

  • LLM Serving (vLLM): OpenAI‑compatible API; concurrent inference and streaming.
  • RAG Pipelines (Dagster): Chunk → embed → index into pgvector with retries + logging.
  • LangGraph Workflow: retrieve → generate → validate with checkpointed state (a sketch follows this list).
  • Unified Gateway (FastAPI): /chat, /ingest, /health, /metrics.
  • Observability: end‑to‑end tracing + latency/QPS metrics.
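
For orientation, here is a minimal sketch of that retrieve → generate → validate loop using the LangGraph Python API. The state fields, node bodies, and thread_id are illustrative placeholders, not the project’s actual orchestrator code.

    from typing import TypedDict

    from langgraph.checkpoint.memory import MemorySaver
    from langgraph.graph import END, StateGraph

    class RAGState(TypedDict):
        question: str
        context: list[str]
        answer: str
        valid: bool

    def retrieve(state: RAGState) -> dict:
        # Placeholder: call the retrieval service's pgvector top-k search here.
        return {"context": ["...retrieved chunks..."]}

    def generate(state: RAGState) -> dict:
        # Placeholder: prompt the vLLM server with the question plus context.
        return {"answer": "...grounded answer..."}

    def validate(state: RAGState) -> dict:
        # Placeholder: check the answer against the retrieved context.
        return {"valid": bool(state["answer"])}

    graph = StateGraph(RAGState)
    graph.add_node("retrieve", retrieve)
    graph.add_node("generate", generate)
    graph.add_node("validate", validate)
    graph.set_entry_point("retrieve")
    graph.add_edge("retrieve", "generate")
    graph.add_edge("generate", "validate")
    graph.add_edge("validate", END)

    # The checkpointer persists state per thread_id, so a run can resume.
    app = graph.compile(checkpointer=MemorySaver())
    result = app.invoke(
        {"question": "What is RAG?"},
        config={"configurable": {"thread_id": "demo"}},
    )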

How it’s built

  • Gateway: FastAPI (gateway/)
  • Orchestrator: LangGraph (orchestrator/)
  • Retrieval: FastAPI + pgvector + PostgreSQL (retrieval/)
  • Inference: vLLM OpenAI‑compatible server (inference/)
  • Pipelines: Dagster (pipelines/); an ingestion-job sketch follows this list
  • Observability: OTel Collector + Jaeger + Prometheus + Grafana (infra/)
  • Benchmarking: load test script (benchmarks/)
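
As a hedged sketch, a chunk → embed → index job with retries and logging can look like this in Dagster; the op names, retry settings, and placeholder embeddings are assumptions, not the code in pipelines/.

    from dagster import RetryPolicy, job, op

    retries = RetryPolicy(max_retries=3, delay=2)  # assumed policy, not the project's

    @op(retry_policy=retries)
    def chunk_documents(context) -> list[str]:
        context.log.info("chunking documents")
        return ["RAG systems combine retrieval and generation."]

    @op(retry_policy=retries)
    def embed_chunks(context, chunks: list[str]) -> list[list[float]]:
        context.log.info(f"embedding {len(chunks)} chunks")
        return [[0.0] * 384 for _ in chunks]  # placeholder vectors

    @op(retry_policy=retries)
    def index_embeddings(context, embeddings: list[list[float]]) -> None:
        # Placeholder: INSERT the vectors into a pgvector-backed table here.
        context.log.info(f"indexing {len(embeddings)} vectors")

    @job
    def ingest_job():
        index_embeddings(embed_chunks(chunk_documents()))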

How to run (local)

  1. Start services:

     docker compose up --build

  2. Ingest a document:

     curl -X POST http://localhost:8000/ingest \
       -H "Content-Type: application/json" \
       -d '{"documents":[{"id":"doc1","text":"RAG systems combine retrieval and generation."}]}'

  3. Chat with RAG:

     curl -X POST http://localhost:8000/chat \
       -H "Content-Type: application/json" \
       -d '{"messages":[{"role":"user","content":"What is RAG?"}],"top_k":4}'
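
Once the stack is up, you can also talk to the vLLM server directly through its OpenAI‑compatible API, e.g. with the openai Python SDK. The port and model name below are assumptions; check docker-compose.yml for the values this stack actually uses.

    from openai import OpenAI

    # Hypothetical endpoint; vLLM ignores the API key unless one is configured.
    client = OpenAI(base_url="http://localhost:8001/v1", api_key="unused")

    stream = client.chat.completions.create(
        model="served-model-name",  # hypothetical; use the model the server loads
        messages=[{"role": "user", "content": "What is RAG?"}],
        stream=True,  # tokens arrive incrementally instead of as one response
    )
    for chunk in stream:
        delta = chunk.choices[0].delta.content
        if delta:
            print(delta, end="", flush=True)
    print()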

Observability (local)

  • Jaeger: http://localhost:16686
  • Grafana: http://localhost:3000
  • Prometheus: http://localhost:9090
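
Traces reach Jaeger through the OTel Collector. As a minimal sketch, a service can export spans over OTLP like this, assuming the collector listens on the default gRPC port 4317; the service name and span are illustrative.

    from opentelemetry import trace
    from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
    from opentelemetry.sdk.resources import Resource
    from opentelemetry.sdk.trace import TracerProvider
    from opentelemetry.sdk.trace.export import BatchSpanProcessor

    provider = TracerProvider(resource=Resource.create({"service.name": "gateway"}))
    provider.add_span_processor(
        BatchSpanProcessor(OTLPSpanExporter(endpoint="http://localhost:4317", insecure=True))
    )
    trace.set_tracer_provider(provider)

    tracer = trace.get_tracer(__name__)
    with tracer.start_as_current_span("chat-request"):
        pass  # handle the request; the span's duration shows up in Jaeger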

Use cases

  • Internal knowledge assistant for enterprise docs and policies
  • Support automation with grounded answers and auditability
  • RAG‑powered search across product, legal, or engineering docs
  • LLM reasoning workflows that require stepwise validation

Docs

  • docs/ARCHITECTURE.md — dataflow and component map
  • docs/DESIGN_DECISIONS.md — tool choices and tradeoffs
  • docs/OBSERVABILITY.md — tracing + metrics plan
  • docs/EVALUATION.md — minimal RAG eval guidance
  • ROADMAP.md — technical milestones (kept outside README)
