saanvijay/LLM-nexus

LLM-nexus


LLM-Nexus is a lightweight MITM proxy that sits between any AI client and the upstream LLM. It provides:

  • Observability — intercepts every request, logs prompts and completions with accurate BPE token counts, and streams everything to a real-time dashboard
  • Cost reduction — compresses verbose prompts to strip filler tokens, then serves repeated or semantically similar prompts from an in-memory cache, skipping the upstream LLM call entirely
  • Privacy guardrails — redacts PII (emails, API keys, SSNs, credit cards, and more) from every request before it is logged, cached, or forwarded
  • Agent integration — exposes an MCP server so any MCP-compatible AI agent (Claude Desktop, custom agents) can query logs, stats, and cache as tools

Project Structure

config/
├── config.json               # Proxy settings (port, logLevel, redactPII, etc.)
└── pii.config.json           # PII redaction rules — add or disable rules here

backend/
├── proxy/
│   ├── server.js             # Entry point — proxy + dashboard startup
│   ├── handler.js            # Request / response forwarding logic
│   └── certManager.js        # CA + per-host TLS cert generation & cache
├── dashboard/
│   ├── server.js             # HTTP server for dashboard UI + REST API (port 3001)
│   └── store.js              # In-memory log store with SSE broadcast
├── mcp/
│   └── server.js             # MCP stdio server — AI agent tool integration
└── utils/
    ├── logger.js             # Prompt/response extraction and log formatting
    ├── cache.js              # In-memory prompt cache (exact + similarity matching)
    ├── tokenizer.js          # Real BPE token counting via tiktoken
    ├── simpleOps.js          # Simple file-op detection and interception
    └── redactor.js           # PII guardrail — redacts sensitive data before forwarding

frontend/
└── index.html                # Observability dashboard (single-file, no build step)

Request pipeline

LLM-Nexus Request Pipeline

Source: assets/flow-diagram.excalidraw — open in Excalidraw to edit.

| Step | Action |
|---|---|
| 1 | Simple-op check — intercept immediately, return manual instruction (if saveToken: true) |
| 2 | Exact cache hit — replay stored response, skip LLM |
| 3 | Similar cache hit — replay best matching cached response, skip LLM |
| 4 | Upstream LLM call — forward request, cache response, push to dashboard store |
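
The four steps can be sketched as a chain of checks. This is a minimal sketch: handleRequest and findSimilar are illustrative names, not the actual handler.js API.

```javascript
// Minimal sketch of the four-step pipeline (names are illustrative).
function handleRequest(prompt, cache, config, findSimilar) {
  // 1. Simple-op check (only when saveToken is enabled)
  if (config.saveToken && /\b(create|delete|move|rename|copy)\b.*\bfile\b/i.test(prompt)) {
    return { source: "simple_op" };
  }
  // 2. Exact cache hit: the full prompt text is the key
  if (cache.has(prompt)) return { source: "cache_exact", body: cache.get(prompt) };
  // 3. Similar cache hit: best match at or above the similarity threshold
  const similar = findSimilar(prompt, cache);
  if (similar) return { source: "cache_similar", body: similar };
  // 4. Fall through to the upstream LLM (response is then cached and logged)
  return { source: "llm" };
}
```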

Setup

1. Install dependencies

cd backend && npm install

2. Start the proxy

node proxy/server.js

This starts two servers simultaneously:

  • Proxy on http://localhost:3000 — intercepts all LLM traffic
  • Dashboard on http://localhost:3001 — observability UI + REST API

On first run a self-signed CA certificate is generated and saved to backend/certs/. The startup output prints the exact command to trust it.

3. Trust the CA cert (macOS, run once)

sudo security add-trusted-cert -d -r trustRoot \
  -k /Library/Keychains/System.keychain \
  backend/certs/ca.crt

4. Tell Node.js about the CA cert

Add to ~/.zprofile (not ~/.zshrc — GUI apps like VS Code don't read ~/.zshrc):

export NODE_EXTRA_CA_CERTS="/Users/<your-username>/LLM-nexus/backend/certs/ca.crt"

Apply immediately:

launchctl setenv NODE_EXTRA_CA_CERTS "/Users/<your-username>/LLM-nexus/backend/certs/ca.crt"

5. Export proxy env vars

Add to ~/.zprofile:

export HTTP_PROXY=http://localhost:3000
export HTTPS_PROXY=http://localhost:3000

HTTPS_PROXY is required for LLM APIs — all Anthropic and OpenAI traffic is HTTPS. HTTP_PROXY alone will not intercept it.

Apply immediately:

launchctl setenv HTTP_PROXY "http://localhost:3000"
launchctl setenv HTTPS_PROXY "http://localhost:3000"
source ~/.zprofile

6. VS Code setting (catches anything Electron still rejects)

Add to VS Code settings.json:

"http.proxyStrictSSL": false

7. Restart VS Code (Cmd+Q — not just close the window) so Copilot picks up all changes.


Using with CLI tools (Claude CLI, curl, etc.)

GUI apps like VS Code pick up env vars from launchctl. CLI tools launched from a terminal only see what is exported in that shell session.

Before launching any CLI tool you want to intercept, export all three vars in the same terminal:

export HTTP_PROXY=http://localhost:3000
export HTTPS_PROXY=http://localhost:3000
export NODE_EXTRA_CA_CERTS="/Users/<your-username>/LLM-nexus/backend/certs/ca.crt"
claude   # or any other CLI

If you see "SSL certificate verification failed" with a CLI tool, the most common cause is that NODE_EXTRA_CA_CERTS is not set (or points to a stale path). Verify with:

echo $NODE_EXTRA_CA_CERTS

It must point to backend/certs/ca.crt inside this repo. If the path is wrong or empty, set it in the current shell before retrying.


Observability Dashboard

Open http://localhost:3001 in any browser after starting the proxy.

Stats bar

| Metric | Description |
|---|---|
| Total Calls | All intercepted requests |
| Total Tokens | Cumulative tokens across all LLM calls |
| Cache Hits | Exact + similarity hits served from cache |
| Avg Latency | Mean round-trip time for upstream LLM calls |

Filter tabs

  • All — every intercepted event
  • LLM Calls — upstream completions with full prompt/response detail
  • Cache Hits — requests served from cache, including similarity score
  • Simple Ops — file operations intercepted before reaching the LLM

Detail panel

Clicking any entry in the list opens a detail panel showing:

  • Model name, HTTP status, latency
  • Token breakdown cards — System / Input / Output / Total
  • Full system prompt, user input, and LLM output with syntax-highlighted sections

Live feed

The dashboard connects to the proxy via Server-Sent Events and updates in real time without polling or page refresh. The green dot in the header indicates an active SSE connection.
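
The wire format is standard SSE: each event arrives as one or more data: lines terminated by a blank line. A minimal frame parser looks like this (the payload fields shown in the test are assumptions for illustration, not the documented event schema):

```javascript
// Sketch: parse SSE frames as delivered by GET /api/stream.
function parseSSE(chunk) {
  const events = [];
  for (const frame of chunk.split("\n\n")) {
    const dataLines = frame
      .split("\n")
      .filter((line) => line.startsWith("data:"))
      .map((line) => line.slice(5).trim());
    // A frame without a data: line (e.g. a comment or heartbeat) is skipped.
    if (dataLines.length) events.push(JSON.parse(dataLines.join("\n")));
  }
  return events;
}
```

In a browser the built-in EventSource API handles this framing automatically: new EventSource("http://localhost:3001/api/stream").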


REST API

The dashboard server exposes a REST API on port 3001 that any HTTP client or agent can call.

| Method | Path | Description |
|---|---|---|
| GET | /api/logs | All stored log entries (newest first) |
| GET | /api/logs?type=llm | Filter by type: llm, cache_hit, simple_op |
| GET | /api/logs?query=async | Full-text search across all log fields |
| GET | /api/logs?limit=20 | Limit result count (max 200) |
| GET | /api/stats | Aggregate statistics (calls, tokens, cache hits, latency) |
| GET | /api/cache | Cache entry count, similarity threshold, key previews |
| DELETE | /api/cache | Clear the entire prompt cache |
| GET | /api/config | Current proxy configuration |
| GET | /api/stream | SSE live feed of new log entries |

Parameters can be combined: /api/logs?type=llm&query=async&limit=10
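
Because the parameters combine, a small helper keeps query strings tidy. A sketch using standard URLSearchParams (the response shape is whatever /api/logs returns):

```javascript
// Sketch: build a /api/logs URL from the combinable query parameters.
function buildLogsUrl(base, { type, query, limit } = {}) {
  const params = new URLSearchParams();
  if (type) params.set("type", type);
  if (query) params.set("query", query);
  if (limit) params.set("limit", String(limit));
  const qs = params.toString();
  return qs ? `${base}/api/logs?${qs}` : `${base}/api/logs`;
}

// With Node.js 18+ global fetch (the proxy dashboard must be running):
// const logs = await fetch(buildLogsUrl("http://localhost:3001", { type: "llm", limit: 10 })).then(r => r.json());
```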


MCP Server (AI Agent Integration)

The MCP (Model Context Protocol) server lets any MCP-compatible AI agent — Claude Desktop, custom agents, or agent frameworks — call this proxy's functions as tools.

Start the MCP server

node backend/mcp/server.js

The MCP server communicates over stdio (standard MCP convention) and talks to the dashboard REST API on localhost:3001. The proxy must be running first.

Available tools

| Tool | Description |
|---|---|
| get_logs | Retrieve already-processed logs — filterable by type, query, limit |
| get_stats | Aggregate stats over processed requests: calls, tokens, cache hits, avg latency |
| get_cache_info | Cache entry count, similarity threshold, key previews |
| search_logs | Full-text search across already-processed log entries |

Connect to Claude Desktop

Add to ~/.claude/claude_desktop_config.json:

{
  "mcpServers": {
    "llm-nexus": {
      "command": "node",
      "args": ["/Users/<your-username>/LLM-nexus/backend/mcp/server.js"]
    }
  }
}

Restart Claude Desktop. The tools will appear automatically under the llm-nexus server.

Connect to any MCP-compatible agent

Any agent that supports the Model Context Protocol can connect by launching the server as a subprocess and communicating via stdin/stdout. The server name is llm-nexus, version 1.0.0.
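
MCP clients speak JSON-RPC 2.0 over the server's stdin/stdout, one JSON object per line. A sketch of building a tools/call request (the method and params shape follow the MCP spec; the full handshake, initialize and tools/list, is omitted here):

```javascript
// Sketch: JSON-RPC 2.0 request an MCP client writes to the server's stdin.
let nextId = 1;
function buildToolCall(toolName, args = {}) {
  return JSON.stringify({
    jsonrpc: "2.0",
    id: nextId++,
    method: "tools/call",
    params: { name: toolName, arguments: args },
  });
}

// e.g. buildToolCall("get_logs", { type: "llm", limit: 5 })
```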


Features

Token counting

Every intercepted request is tokenised with tiktoken — the same BPE tokeniser used by OpenAI models. Token counts are computed locally from the actual prompt and response text.

The model is read from the request body and the correct encoding is selected automatically:

| Model prefix | Encoding |
|---|---|
| gpt-4o | o200k_base |
| gpt-4, gpt-3.5 | cl100k_base |
| text-davinci | p50k_base |
| Unknown | cl100k_base (fallback) |
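
The lookup amounts to a prefix check in which order matters. An illustrative sketch (not the actual tokenizer.js source):

```javascript
// Illustrative prefix lookup for the table above. Note that gpt-4o must be
// checked before gpt-4, which would otherwise match it too.
function encodingForModel(model = "") {
  if (model.startsWith("gpt-4o")) return "o200k_base";
  if (model.startsWith("gpt-4") || model.startsWith("gpt-3.5")) return "cl100k_base";
  if (model.startsWith("text-davinci")) return "p50k_base";
  return "cl100k_base"; // fallback for unknown models
}
```

The returned name can then be passed to tiktoken to load the encoding and count tokens.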

Prompt cache

Identical and similar prompts are served from an in-memory cache, skipping the upstream LLM call entirely.

Exact match — the full prompt text is the cache key. Same prompt → instant replay.

Similarity match — prompts are tokenised into word sets and compared with Jaccard similarity. Any prompt scoring ≥ 75% against a cached entry is a hit. The threshold is configurable via SIMILARITY_THRESHOLD in backend/utils/cache.js.
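
Jaccard similarity is the size of the word-set intersection divided by the size of the union. A minimal sketch of the scoring (the real implementation lives in backend/utils/cache.js and may differ in detail):

```javascript
// Sketch of word-set Jaccard scoring between two prompts.
function jaccard(a, b) {
  const A = new Set(a.toLowerCase().split(/\s+/).filter(Boolean));
  const B = new Set(b.toLowerCase().split(/\s+/).filter(Boolean));
  if (A.size === 0 && B.size === 0) return 1;
  let intersection = 0;
  for (const word of A) if (B.has(word)) intersection++;
  return intersection / (A.size + B.size - intersection);
}

const SIMILARITY_THRESHOLD = 0.75;
function isHit(prompt, cachedKey) {
  return jaccard(prompt, cachedKey) >= SIMILARITY_THRESHOLD;
}
```

For example, "please list the planets in order" scores 5/6 ≈ 0.83 against a cached "list the planets in order", so it is served from cache.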

Prompt compression

Before a request is forwarded to the upstream LLM (or stored in the cache), the proxy runs the prompt through a multi-pass compressor that reduces token count while preserving meaning. Enabled by default via compressPrompts: true in config.json.

The compressor is implemented in backend/utils/compressor.js and applies 13 rules in order:

| Pass | Rules | Example |
|---|---|---|
| Whitespace | Trailing spaces, 3+ blank lines, multiple spaces | "too   many   spaces" → "too many spaces" |
| Punctuation | Repeated !!!, ???, .... | "really???" → "really?" |
| AI filler | Self-introductions, hollow openers | "Certainly! I'd be happy to help." → "" |
| Verbose connectives | Long phrases → short equivalents | "In order to" → "To", "Due to the fact that" → "Because" |
| Verbose instructions | Wordy imperatives | "Please make sure that you" → "Ensure" |
| Redundant qualifiers | Words that add length without precision | "very unique" → "unique", "basically" → removed |
| Deduplication | Identical adjacent sentences removed | Second copy of a repeated instruction dropped |

Token savings are computed with tiktoken (same BPE tokeniser used for input/output counts) and logged on every call:

[COMPRESS] 24 tokens saved (28% reduction: 87 → 63)

The dashboard detail panel shows a Compression card with a before/after bar chart for each LLM call, and the stats bar tracks cumulative Tokens Saved across the session.
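
A few of the passes can be sketched as ordered regex substitutions (illustrative rules only; the real 13-rule compressor is in backend/utils/compressor.js):

```javascript
// Illustrative sketch of a handful of compression passes.
function compress(text) {
  return text
    .replace(/[ \t]+$/gm, "")                         // trailing spaces
    .replace(/ {2,}/g, " ")                           // multiple spaces
    .replace(/\n{3,}/g, "\n\n")                       // collapse blank-line runs
    .replace(/([!?])\1+/g, "$1")                      // repeated !!! / ???
    .replace(/\.{4,}/g, "...")                        // repeated ....
    .replace(/\bIn order to\b/g, "To")                // verbose connective
    .replace(/\bDue to the fact that\b/g, "Because")  // verbose connective
    .trim();
}
```

Order matters: whitespace normalisation runs first so the later phrase rules see a canonical string.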

To disable, set in config.json:

"compressPrompts": false

Simple-op interception

Prompts describing trivial file-system operations are intercepted before reaching the LLM. Enabled only when saveToken: true is set in config (default: false).

| Prompt contains | Operation |
|---|---|
| create/add/make a file | Create / Add File |
| delete/remove a file | Delete / Remove File |
| move file/directory | Move File / Directory |
| rename file/directory | Rename File / Directory |
| copy file/directory | Copy File / Directory |
| add a comment | Add Comment |
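
The detection amounts to an ordered list of pattern/operation pairs where the first match wins. A sketch (patterns are illustrative, not the actual simpleOps.js rules):

```javascript
// Illustrative pattern table for simple-op detection.
const SIMPLE_OPS = [
  { pattern: /\b(create|add|make)\b.*\bfile\b/i, op: "Create / Add File" },
  { pattern: /\b(delete|remove)\b.*\bfile\b/i, op: "Delete / Remove File" },
  { pattern: /\bmove\b.*\b(file|directory)\b/i, op: "Move File / Directory" },
  { pattern: /\brename\b.*\b(file|directory)\b/i, op: "Rename File / Directory" },
  { pattern: /\bcopy\b.*\b(file|directory)\b/i, op: "Copy File / Directory" },
  { pattern: /\badd\b.*\bcomment\b/i, op: "Add Comment" },
];

// Returns the matched operation name, or null for prompts that must go to the LLM.
function detectSimpleOp(prompt) {
  const match = SIMPLE_OPS.find(({ pattern }) => pattern.test(prompt));
  return match ? match.op : null;
}
```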

Log levels

Set logLevel in config.json or via the LOG_LEVEL environment variable.

| Level | Behaviour |
|---|---|
| INFO (default) | Only LLM calls — prompts, responses, cache hits, simple ops |
| DEBUG | Everything including raw HTTP traffic, telemetry, REST requests |

PII Redaction

When redactPII: true is set in config.json (default), the proxy scrubs Personally Identifiable Information from every request before it is logged, cached, or forwarded upstream. The original value is never stored anywhere.

Rules are defined in config/pii.config.json. Each rule has:

| Field | Description |
|---|---|
| name | Canonical placeholder label, e.g. EMAIL → replaced with [EMAIL] |
| aliases | Alternative names to reference this rule (e.g. emailAddress, email_address, mail) |
| description | Human-readable explanation of what the rule detects |
| pattern | JSON-escaped regex pattern string |
| flags | Regex flags — g, gi, etc. |
| enabled | Set to false to skip a rule without deleting it |

Built-in rules

| Name | Aliases | Detects |
|---|---|---|
| API_KEY | apiKey, api_key, token, secret | OpenAI sk-..., Anthropic sk-ant-..., GitHub ghp_/gho_/ghs_, Bearer tokens |
| CREDIT_CARD | creditCard, credit_card, cardNumber, card_number | Visa, Mastercard, Amex, Discover 16-digit numbers |
| BANK_ACCOUNT | bankAccount, bank_account, accountNumber, routingNumber | Account/routing numbers preceded by a label |
| SSN | ssn, socialSecurity, social_security, taxId, tax_id | US Social Security Number (NNN-NN-NNNN) |
| PASSPORT | passport, passportNumber, passport_number | 1-2 uppercase letters + 6-9 digits |
| EMAIL | email, emailAddress, email_address, mail | Email addresses |
| PHONE | phone, phoneNumber, phone_number, mobile, cell | US and international phone numbers |
| IP_ADDRESS | ipAddress, ip_address, ip, ipv4 | Public IPv4 (private ranges excluded) |
| DATE_OF_BIRTH | dateOfBirth, date_of_birth, dob, birthday, birthDate | DOB when labelled with dob:, born on, birthday, etc. |

Adding a custom rule

Append an entry to pii.config.json:

{
  "name": "EMPLOYEE_ID",
  "aliases": ["employeeId", "employee_id", "empId"],
  "description": "Internal employee ID format EMP-XXXXXX",
  "pattern": "\\bEMP-\\d{6}\\b",
  "flags": "gi",
  "enabled": true
}

Restart the proxy for changes to take effect. No code changes required.
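
Applying a rule is a single global regex replace with the [NAME] placeholder. A sketch using the hypothetical EMPLOYEE_ID rule from the example above:

```javascript
// Sketch: compile and apply one pii.config.json rule.
function applyRule(text, rule) {
  if (!rule.enabled) return text;
  return text.replace(new RegExp(rule.pattern, rule.flags), `[${rule.name}]`);
}

// The hypothetical EMPLOYEE_ID rule from the example above.
const rule = {
  name: "EMPLOYEE_ID",
  pattern: "\\bEMP-\\d{6}\\b", // JSON-escaped in the config file
  flags: "gi",
  enabled: true,
};
```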


Configuration

config.json

Edit config/config.json for proxy-level settings:

| Key | Default | Env override | Description |
|---|---|---|---|
| port | 3000 | PORT | Proxy listen port |
| host | localhost | HOST | Proxy bind address |
| requestTimeout | 30000 | REQUEST_TIMEOUT | Upstream timeout (ms) |
| logLevel | "INFO" | LOG_LEVEL | INFO or DEBUG |
| saveToken | false | (none) | Enable simple-op interception |
| redactPII | true | (none) | Enable PII redaction guardrail |
| compressPrompts | true | (none) | Enable prompt compression before forwarding |
| upstreamProxy.host | null | (none) | Hostname of the upstream (chained) proxy |
| upstreamProxy.port | null | (none) | Port of the upstream proxy |
| upstreamProxy.auth | null | (none) | Basic-auth credentials as "user:password", or null |
| defaultPorts.http | 80 | (none) | Default HTTP port |
| defaultPorts.https | 443 | (none) | Default HTTPS port |

Dashboard port can be changed via the DASHBOARD_PORT environment variable (default 3001).

pii.config.json

Edit config/pii.config.json to manage PII redaction rules. See the PII Redaction section for the full rule schema and built-in rule list.


Testing

All tests live in backend/tests/ and are fully standalone — no proxy process needs to be running.

| Test file | What it covers | Tests |
|---|---|---|
| test-proxy-chain.js | Upstream proxy chain (CONNECT tunnel + TLS) | 2 |
| test-cache.js | In-memory prompt cache | 41 |
| test-redactor.js | PII redaction — all 9 rules | 78 |
| test-compressor.js | Prompt compression — all 13 rules | 62 |

Run all:

node backend/tests/test-proxy-chain.js
node backend/tests/test-cache.js
node backend/tests/test-redactor.js
node backend/tests/test-compressor.js

Proxy chain — test-proxy-chain.js

Verifies the full upstream proxy-chain path without touching production config:

  1. Spins up a local mini CONNECT proxy on a random port
  2. Calls openTunnel() directly (same code path as handler.js)
  3. TLS-wraps the raw socket and fires a real HTTPS GET to httpbin.org/get

Sample output:

[mini-proxy] listening on 127.0.0.1:<port>

Test 1: openTunnel() via mini-proxy → httpbin.org:443
  [mini-proxy] CONNECT httpbin.org:443
Test 2: TLS wrap + HTTPS GET https://httpbin.org/get
  status: 200

✓ Test 1 PASSED — mini-proxy received CONNECT tunnel request
✓ Test 2 PASSED — TLS + HTTPS request succeeded through chain

✅ Proxy chain is working.

upstreamProxy in config.json does not need to be enabled.


Prompt cache — test-cache.js

41 assertions across 7 sections:

| Section | What is tested |
|---|---|
| getCacheKey | All four prompt fields (prompt, messages, inputs, input), wrong content-type, malformed JSON, field priority |
| set / get / size / clear / keys | Basic CRUD, null on miss, key listing |
| Overwrite | Re-inserting same key replaces value and refreshes insertion order |
| findSimilar — strings | Exact score 1.0, near-identical hit (Jaccard ≥ 0.75), unrelated miss, empty cache |
| findSimilar — messages | OpenAI messages[] format flattened correctly for similarity |
| Best match selection | Returns highest-scoring entry when multiple candidates qualify |
| MAX_SIZE eviction | Cache stays ≤ 500 entries; oldest entry evicted; re-insert does not exceed limit |
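
The eviction behaviour in the last row is what a JavaScript Map gives for free, since it iterates in insertion order and its first key is always the oldest. An illustrative sketch (cache.js may differ in detail):

```javascript
// Sketch: bounded cache insert with oldest-first eviction via Map order.
const MAX_SIZE = 500;
function cacheSet(cache, key, value) {
  if (cache.has(key)) cache.delete(key); // re-insert refreshes insertion order
  cache.set(key, value);
  if (cache.size > MAX_SIZE) {
    cache.delete(cache.keys().next().value); // evict the oldest entry
  }
}
```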

PII redactor — test-redactor.js

78 assertions covering all 9 built-in rules plus buffer-level behaviour:

| Section | What is tested |
|---|---|
| Rule loading | All rules compiled, getRuleByName by canonical name / alias / case-insensitive |
| EMAIL | Two matches in one string aggregated into one found entry with count=2; no false positives |
| PHONE | US formats matched; short numbers ignored |
| SSN | Valid format matched; 000-xx and 9xx-xx invalid prefixes excluded |
| CREDIT_CARD | Visa, Mastercard, Discover formats |
| API_KEY | OpenAI sk-, Anthropic sk-ant-, GitHub ghp_ |
| BANK_ACCOUNT | account: and routing # label variants |
| PASSPORT | 1–2 uppercase letters + 6–9 digits |
| IP_ADDRESS | Public IPs matched; private ranges (10.x, 192.168.x, 127.x) excluded |
| DATE_OF_BIRTH | Four label variants matched; unlabelled dates not redacted |
| Multi-type | EMAIL + SSN + PHONE in one string, all redacted independently |
| redactBuffer no-ops | null, empty, non-JSON content-type, no-PII → original buffer reference returned |
| redactBuffer messages | String content, OpenAI block-content array, image blocks untouched |
| redactBuffer prompt | Plain prompt field redacted; messages + prompt both present |
| Idempotency | Double-redacting a placeholder does not double-wrap it |

Prompt compressor — test-compressor.js

62 assertions covering all 13 rules and both compressString / compressBuffer entry points:

| Section | What is tested |
|---|---|
| Rule inventory | All 13 rules present, every rule has name, fn, and enabled |
| Whitespace | Trailing spaces, 3+ blank lines collapsed, multiple spaces normalised |
| Punctuation | !! / !!!!, ???, ....... |
| AI preamble | "As an AI language model," and "As a large language model," removed |
| Filler openers | "Certainly!", "Of course!", "I'd be happy to help", "I hope this helps", "Feel free to ask" removed |
| Verbose connectives | 13 phrase substitutions ("In order to" → "To", "Due to the fact that" → "Because", etc.) |
| Verbose instructions | "Please make sure that you" / "Make sure to" / "You must ensure that" → "Ensure" |
| Redundant qualifiers | "very unique" → "unique", "absolutely certain" → "certain", "basically" / "literally" removed |
| Sentence deduplication | Duplicate adjacent sentence removed; unique sentences all retained |
| Idempotency | Compressing an already-compressed string produces the same result |
| compressBuffer no-ops | null, empty, non-JSON content-type, nothing-to-compress → original buffer reference returned |
| compressBuffer messages | String content and OpenAI block-content arrays compressed; image blocks untouched |
| compressBuffer prompt | Plain prompt field compressed |
| Token savings | Verbose system prompt achieves ≥ 10 tokens / ≥ 10% reduction |

Troubleshooting

Certificate signature failure

If you see certificate signature failure, the CA cert in the keychain no longer matches the key on disk:

# 1. Remove old certs
rm backend/certs/ca.crt backend/certs/ca.key

# 2. Remove old trusted cert from keychain
sudo security delete-certificate -c "LLM-Nexus Proxy CA" /Library/Keychains/System.keychain

# 3. Restart the server — new CA is generated automatically
node proxy/server.js

# 4. Trust the new CA
sudo security add-trusted-cert -d -r trustRoot \
  -k /Library/Keychains/System.keychain \
  backend/certs/ca.crt

# 5. Re-apply env vars and fully restart VS Code
launchctl setenv NODE_EXTRA_CA_CERTS "/Users/<your-username>/LLM-nexus/backend/certs/ca.crt"

Error reference

| Error | Fetcher | Fix |
|---|---|---|
| ERR_CERT_AUTHORITY_INVALID | electron-fetch | macOS keychain trust (step 3) |
| fetch failed | node-fetch | NODE_EXTRA_CA_CERTS in ~/.zprofile + launchctl (step 4) |
| unable to verify first certificate | node-http | NODE_EXTRA_CA_CERTS in ~/.zprofile + launchctl (step 4) |
| certificate signature failure | node-http | Regenerate certs (see above) |
