
Mock LLM Server

A FastAPI-based mock LLM server that mimics OpenAI and Anthropic API formats. Instead of calling actual language models, it uses predefined responses from a YAML configuration file.

It's designed for testing and development scenarios where you want deterministic responses.

Check out the CodeGate project when you're done here!

Features

  • OpenAI and Anthropic compatible API endpoints
  • Streaming support (character-by-character response streaming)
  • Configurable responses via YAML file
  • Hot-reloading of response configurations
  • JSON logging
  • Error handling
  • Mock token counting

Installation

From PyPI

pip install mockllm

From Source

  1. Clone the repository:
git clone https://github.com/stacklok/mockllm.git
cd mockllm
  2. Create a virtual environment and activate it:
python -m venv venv
source venv/bin/activate  # On Windows, use: venv\Scripts\activate
  3. Install dependencies:
pip install -e ".[dev]"  # Install with development dependencies
# or
pip install -e .         # Install without development dependencies

Usage

  1. Set up the responses.yml:
cp example.responses.yml responses.yml
  2. Start the server:
python -m mockllm

Or using uvicorn directly:

uvicorn mockllm.server:app --reload

The server will start on http://localhost:8000

  3. Send requests to the API endpoints:

OpenAI Format

Regular request:

curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "mock-llm",
    "messages": [
      {"role": "user", "content": "what colour is the sky?"}
    ]
  }'

Streaming request:

curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "mock-llm",
    "messages": [
      {"role": "user", "content": "what colour is the sky?"}
    ],
    "stream": true
  }'
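
Because the endpoint mirrors the OpenAI chat completions format, you can also point an existing OpenAI client at the mock server. The sketch below assumes the official openai Python package (v1+) is installed; the api_key value is a placeholder, since authentication is not covered by this README.

from openai import OpenAI

# Point the client at the mock server instead of api.openai.com.
# The api_key is a placeholder (assumed to be ignored by the mock server).
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="mock-llm",
    messages=[{"role": "user", "content": "what colour is the sky?"}],
)
print(response.choices[0].message.content)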

Anthropic Format

Regular request:

curl -X POST http://localhost:8000/v1/messages \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-3-sonnet-20240229",
    "messages": [
      {"role": "user", "content": "what colour is the sky?"}
    ]
  }'

Streaming request:

curl -X POST http://localhost:8000/v1/messages \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-3-sonnet-20240229",
    "messages": [
      {"role": "user", "content": "what colour is the sky?"}
    ],
    "stream": true
  }'
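
The same call can be made from Python without an SDK; this sketch simply mirrors the curl example above using the requests library:

import requests

# Mirror the curl example above against the Anthropic-style endpoint.
payload = {
    "model": "claude-3-sonnet-20240229",
    "messages": [{"role": "user", "content": "what colour is the sky?"}],
}
resp = requests.post("http://localhost:8000/v1/messages", json=payload)
resp.raise_for_status()
print(resp.json())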

Configuration

Response Configuration

Responses are configured in responses.yml. The file has two main sections:

  1. responses: Maps input prompts to predefined responses
  2. defaults: Contains default configurations like the unknown response message

Example responses.yml:

responses:
  "what colour is the sky?": "The sky is blue during a clear day due to a phenomenon called Rayleigh scattering."
  "what is 2+2?": "2+2 equals 9."

defaults:
  unknown_response: "I don't know the answer to that. This is a mock response."

Hot Reloading

The server automatically detects changes to responses.yml and reloads the configuration without requiring a restart.
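
A quick way to see this in action is to add a new prompt/response pair while the server is running and then query it. This is a rough sketch, assuming PyYAML and requests are installed and the server is watching responses.yml in the current directory:

import time

import requests
import yaml

# Add a new prompt/response pair to responses.yml at runtime.
with open("responses.yml") as f:
    config = yaml.safe_load(f)
config["responses"]["ping?"] = "pong (added at runtime)"
with open("responses.yml", "w") as f:
    yaml.safe_dump(config, f)

time.sleep(1)  # give the file watcher a moment to pick up the change (assumed)

resp = requests.post(
    "http://localhost:8000/v1/chat/completions",
    json={"model": "mock-llm", "messages": [{"role": "user", "content": "ping?"}]},
)
print(resp.json())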

Development

The project includes a Makefile to help with common development tasks:

# Set up development environment
make setup

# Run all checks (setup, lint, test)
make all

# Run tests
make test

# Format code
make format

# Run all linting and type checking
make lint

# Clean up build artifacts
make clean

# See all available commands
make help

Development Commands

  • make setup: Install all development dependencies
  • make test: Run the test suite
  • make format: Format code with black and isort
  • make lint: Run all code quality checks (format, lint, type)
  • make build: Build the package
  • make clean: Remove build artifacts and cache files
  • make install-dev: Install package with development dependencies

For more details on available commands, run make help.

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Development

Running Tests

pip install -e ".[dev]"  # Install development dependencies
pytest tests/
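
Tests can also drive the app in-process with FastAPI's TestClient. A minimal sketch, assuming responses.yml is present in the working directory and mockllm.server exposes the same app object used with uvicorn above:

from fastapi.testclient import TestClient

from mockllm.server import app

def test_openai_chat_completion():
    client = TestClient(app)
    payload = {
        "model": "mock-llm",
        "messages": [{"role": "user", "content": "what colour is the sky?"}],
    }
    response = client.post("/v1/chat/completions", json=payload)
    assert response.status_code == 200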

Code Quality

# Format code
black .
isort .

# Type checking
mypy src/

# Linting
ruff check .

Error Handling

  • Invalid requests return 400 status codes with descriptive messages
  • Server errors return 500 status codes with error details
  • All errors are logged using JSON format
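
To see this behaviour, send a deliberately malformed request and inspect the status code. A rough sketch, assuming a body without a messages field counts as invalid:

import requests

# A request missing the "messages" field (assumed to be invalid).
resp = requests.post(
    "http://localhost:8000/v1/chat/completions",
    json={"model": "mock-llm"},
)
print(resp.status_code)  # expected: a 4xx error
print(resp.json())       # descriptive error details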

Logging

The server uses JSON-formatted logging for:

  • Incoming request details
  • Response configuration loading
  • Error messages and stack traces

