A FastAPI-based mock LLM server that mimics the OpenAI and Anthropic API formats. Instead of calling actual language models, it returns predefined responses from a YAML configuration file, which makes it useful when you need deterministic output for testing or development.
Check out the CodeGate project when you're done here!
Features:

- OpenAI and Anthropic compatible API endpoints
- Streaming support (character-by-character response streaming)
- Configurable responses via YAML file
- Hot-reloading of response configurations
- JSON logging
- Error handling
- Mock token counting
Install from PyPI:

```bash
pip install mockllm
```

Or install from source:
- Clone the repository:
```bash
git clone https://github.com/stacklok/mockllm.git
cd mockllm
```
- Create a virtual environment and activate it:
```bash
python -m venv venv
source venv/bin/activate  # On Windows, use: venv\Scripts\activate
```
- Install dependencies:
pip install -e ".[dev]" # Install with development dependencies
# or
pip install -e . # Install without development dependencies
- Set up the `responses.yml` file:

```bash
cp example.responses.yml responses.yml
```
- Start the server:
```bash
python -m mockllm
```
Or using uvicorn directly:
```bash
uvicorn mockllm.server:app --reload
```
The server will start on `http://localhost:8000`.
- Send requests to the API endpoints:
OpenAI format, regular request:
```bash
curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "mock-llm",
    "messages": [
      {"role": "user", "content": "what colour is the sky?"}
    ]
  }'
```
OpenAI format, streaming request:
```bash
curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "mock-llm",
    "messages": [
      {"role": "user", "content": "what colour is the sky?"}
    ],
    "stream": true
  }'
```
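Because the endpoints mimic the OpenAI wire format, you can also point the official `openai` Python client at the mock server. A minimal sketch, assuming the v1 `openai` package is installed and the server is running locally:

```python
from openai import OpenAI

# Point the client at the mock server; the API key is required by the
# client but never checked by the mock.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="mock-llm",
    messages=[{"role": "user", "content": "what colour is the sky?"}],
)
print(response.choices[0].message.content)

# Streaming works the same way: iterate over chunks as they arrive.
stream = client.chat.completions.create(
    model="mock-llm",
    messages=[{"role": "user", "content": "what colour is the sky?"}],
    stream=True,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
```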
Anthropic format, regular request:
```bash
curl -X POST http://localhost:8000/v1/messages \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-3-sonnet-20240229",
    "messages": [
      {"role": "user", "content": "what colour is the sky?"}
    ]
  }'
```
Anthropic format, streaming request:
```bash
curl -X POST http://localhost:8000/v1/messages \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-3-sonnet-20240229",
    "messages": [
      {"role": "user", "content": "what colour is the sky?"}
    ],
    "stream": true
  }'
```
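Similarly, the `/v1/messages` endpoint can be exercised with the official `anthropic` Python client. A minimal sketch, assuming the `anthropic` package is installed (`max_tokens` is required by the client library):

```python
import anthropic

# Point the client at the mock server; the key is required by the client
# but never checked by the mock.
client = anthropic.Anthropic(base_url="http://localhost:8000", api_key="not-needed")

message = client.messages.create(
    model="claude-3-sonnet-20240229",
    max_tokens=100,
    messages=[{"role": "user", "content": "what colour is the sky?"}],
)
print(message.content[0].text)
```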
Responses are configured in `responses.yml`. The file has two main sections:

- `responses`: Maps input prompts to predefined responses
- `defaults`: Contains default configurations, such as the unknown response message

Example `responses.yml`:
```yaml
responses:
  "what colour is the sky?": "The sky is blue during a clear day due to a phenomenon called Rayleigh scattering."
  "what is 2+2?": "2+2 equals 4."

defaults:
  unknown_response: "I don't know the answer to that. This is a mock response."
```
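The lookup is a straightforward key match: the incoming prompt is looked up in `responses`, and anything not found falls back to `defaults.unknown_response`. A sketch of that logic (illustrative only, not the project's actual code; assumes `pyyaml` is installed):

```python
import yaml

def lookup_response(config_path: str, prompt: str) -> str:
    """Return the configured response for a prompt, or the default."""
    with open(config_path) as f:
        config = yaml.safe_load(f)
    responses = config.get("responses", {})
    default = config.get("defaults", {}).get("unknown_response", "")
    return responses.get(prompt, default)

print(lookup_response("responses.yml", "what colour is the sky?"))
```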
The server automatically detects changes to responses.yml
and reloads the configuration without requiring a restart.
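One common way to implement this kind of hot-reload (an illustrative sketch, not necessarily how mockllm does it) is to watch the file with the `watchdog` package and re-read it whenever it changes:

```python
import yaml
from watchdog.events import FileSystemEventHandler
from watchdog.observers import Observer

class ResponseReloader(FileSystemEventHandler):
    """Re-read responses.yml whenever it is modified on disk."""

    def __init__(self, path: str = "responses.yml"):
        self.path = path
        self.responses = self._load()

    def _load(self) -> dict:
        with open(self.path) as f:
            return yaml.safe_load(f)

    def on_modified(self, event):
        if event.src_path.endswith(self.path):
            self.responses = self._load()

reloader = ResponseReloader()
observer = Observer()
observer.schedule(reloader, path=".", recursive=False)
observer.start()  # watches in a background thread; call observer.stop() to end
```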
The project includes a Makefile to help with common development tasks:
```bash
# Set up development environment
make setup

# Run all checks (setup, lint, test)
make all

# Run tests
make test

# Format code
make format

# Run all linting and type checking
make lint

# Clean up build artifacts
make clean

# See all available commands
make help
```
Available targets:

- `make setup`: Install all development dependencies
- `make test`: Run the test suite
- `make format`: Format code with black and isort
- `make lint`: Run all code quality checks (format, lint, type)
- `make build`: Build the package
- `make clean`: Remove build artifacts and cache files
- `make install-dev`: Install package with development dependencies

For more details on available commands, run `make help`.
pip install -e ".[dev]" # Install development dependencies
pytest tests/
# Format code
black .
isort .
# Type checking
mypy src/
# Linting
ruff check .
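As an illustration, a test against the FastAPI app (`mockllm.server:app`, as shown earlier) might look like the sketch below. It assumes the example `responses.yml` above is in place and that responses follow the OpenAI shape the server mimics:

```python
from fastapi.testclient import TestClient
from mockllm.server import app

client = TestClient(app)

def test_known_prompt_returns_configured_response():
    resp = client.post(
        "/v1/chat/completions",
        json={
            "model": "mock-llm",
            "messages": [{"role": "user", "content": "what colour is the sky?"}],
        },
    )
    assert resp.status_code == 200
    # The mock mimics the OpenAI response shape.
    assert "blue" in resp.json()["choices"][0]["message"]["content"]
```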
The server includes basic error handling:

- Invalid requests return 400 status codes with descriptive messages
- Server errors return 500 status codes with error details
- All errors are logged using JSON format
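You can verify this behavior from Python with the `requests` package (a sketch; exactly which malformed payloads the server rejects is up to its validation logic):

```python
import requests

# Omit the required "messages" field to provoke a validation error.
resp = requests.post(
    "http://localhost:8000/v1/chat/completions",
    json={"model": "mock-llm"},
)
print(resp.status_code)  # expected: 400, per the error handling described above
print(resp.json())       # descriptive error message
```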
The server uses JSON-formatted logging for:
- Incoming request details
- Response configuration loading
- Error messages and stack traces
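Logging along these lines can be produced with the `python-json-logger` package (an illustrative sketch, not necessarily the formatter mockllm uses):

```python
import logging

from pythonjsonlogger import jsonlogger

handler = logging.StreamHandler()
handler.setFormatter(
    jsonlogger.JsonFormatter("%(asctime)s %(levelname)s %(name)s %(message)s")
)

logger = logging.getLogger("mockllm")
logger.addHandler(handler)
logger.setLevel(logging.INFO)

# Extra fields are merged into the JSON log record.
logger.info("incoming request", extra={"path": "/v1/chat/completions", "model": "mock-llm"})
```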
Contributions are welcome! Please feel free to submit a Pull Request.