STATUS: ACTIVE DEVELOPMENT / EXPERIMENTAL
This project was created to solve a personal infrastructure challenge: bypassing the API rate limits of Google Gemini for my own R&D in large-scale AI agent systems.
A high-performance, asynchronous proxy server written in Rust. Designed for efficient key rotation to scale API requests.
The core logic is functional, but recent commits have introduced several bugs that I am currently fixing. This repository is a snapshot of my live R&D process: raw, unpolished, and a work in progress.
- As a proof-of-concept for my system architecture skills.
- To demonstrate my ability to rapidly prototype complex, high-performance tools.
- Smart Key Rotation: Round-robin with health-aware selection
- Circuit Breaker: Automatic failover protection
- Health Monitoring: Real-time key performance tracking
- Rate Limiting: IP-based protection
- Docker Ready: Optimized containers for deployment
- Comprehensive Tests: 226 tests covering core functionality
- Quick Start
- Configuration
- Architecture
- Testing
- Docker Deployment
- Known Issues
- API Key Rotation: Automatically cycles through multiple Gemini API keys
- Rate Limit Bypass: Distributes requests across keys to avoid quotas
- OpenAI Compatibility: Drop-in replacement for OpenAI API endpoints
- Health Monitoring: Tracks key performance and automatically disables failing keys
- Circuit Breaker: Prevents cascade failures with automatic recovery
- Async Rust: Built on Tokio for high-performance concurrent request handling
- Smart Routing: Health-aware key selection with round-robin fallback
- State Persistence: Optional Redis backend for distributed deployments
- Comprehensive Logging: Structured logging with request tracing
- Docker Optimized: Multi-stage builds with minimal runtime images (~50MB)
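The health-aware selection with round-robin fallback mentioned above can be sketched roughly as follows. This is an illustrative sketch, not the crate's actual API; the `ApiKey` struct, `select_key` function, and the health threshold are assumptions for demonstration:

```rust
// Hypothetical sketch of health-aware round-robin key selection.
// Names and the min_health threshold are illustrative assumptions.
#[derive(Debug, Clone)]
struct ApiKey {
    id: &'static str,
    health: f64, // 0.0 (failing) .. 1.0 (fully healthy)
}

/// Walk the key list in round-robin order starting at `cursor`,
/// skipping keys whose health score is below `min_health`.
/// Returns the index of the chosen key, or None if every key is unhealthy.
fn select_key(keys: &[ApiKey], cursor: &mut usize, min_health: f64) -> Option<usize> {
    for _ in 0..keys.len() {
        let idx = *cursor % keys.len();
        *cursor = cursor.wrapping_add(1);
        if keys[idx].health >= min_health {
            return Some(idx);
        }
    }
    None
}
```

Unhealthy keys stay in the list (so they can recover) but are skipped during selection until their score rises back above the threshold.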
This is a high-performance async proxy built with Rust's Tokio runtime. The architecture is designed for scalability and reliability:
- `main.rs`: Application entry point with graceful shutdown handling
- `key_manager.rs`: Smart key rotation with health tracking
- `proxy.rs`: HTTP request forwarding with error handling
- `circuit_breaker.rs`: Automatic failover protection
- `config/`: YAML-based configuration with validation
- `handlers/`: Request processing pipeline
- `storage/`: Redis and in-memory state persistence
Client → Axum Router → Key Manager → Circuit Breaker → Gemini API
   ↑                                                        │
   └── Response Handler ← Error Handler ← Health Monitor ←──┘
- Async Processing: Non-blocking I/O for high throughput
- Health Scoring: Real-time key performance metrics (0.0-1.0)
- Automatic Recovery: Failed keys re-enter rotation when healthy
- State Persistence: Survives restarts with Redis backend
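A 0.0-1.0 health score that decays on failures and recovers on successes can be maintained with an exponential moving average. This is a sketch of one plausible scheme, not necessarily the project's actual scoring formula; `alpha` (the weight given to the most recent outcome) is an assumed parameter:

```rust
// Illustrative health scoring: exponential moving average of
// request outcomes (1.0 = success, 0.0 = failure).
// `alpha` is an assumed smoothing factor, not a value from the project.
fn update_health(score: f64, success: bool, alpha: f64) -> f64 {
    let outcome = if success { 1.0 } else { 0.0 };
    alpha * outcome + (1.0 - alpha) * score
}
```

With this shape, a key's score drops quickly after a burst of failures but climbs back toward 1.0 as successes accumulate, which is what lets failed keys re-enter rotation automatically.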
- Rust 1.70+: Install from rustup.rs
- Docker (optional): For containerized deployment
- Google Gemini API Keys: Get them from Google AI Studio
# Clone the repository
git clone https://github.com/stranmor/gemini-proxy-key-rotation-rust.git
cd gemini-proxy-key-rotation-rust
# Build the project
make build
# Set up configuration
make setup-config
# Edit config.yaml with your API keys
nano config.yaml
Option 1: Direct Binary
make run
Option 2: Docker (Recommended)
make docker-run
The proxy will start on http://localhost:4806 by default.
Edit `config.yaml` with your API keys:
# config.yaml - Minimal setup
server:
port: 4806
groups:
- name: "default"
target_url: "https://generativelanguage.googleapis.com/v1beta/openai/"
api_keys:
- "your-gemini-api-key-1"
- "your-gemini-api-key-2"
- "your-gemini-api-key-3"
server:
port: 4806
admin_token: "your-secure-admin-token" # For admin dashboard
max_tokens_per_request: 125000 # Token limit per request (prevents quota exhaustion)
# Redis for persistence (optional)
redis_url: "redis://localhost:6379"
# Circuit breaker settings
circuit_breaker:
failure_threshold: 5
recovery_timeout_secs: 60
# Rate limiting
max_failures_threshold: 3
temporary_block_minutes: 5
The proxy includes built-in protection against quota exhaustion by validating token counts before forwarding requests:
- Automatic validation: Counts tokens in incoming requests using ML-calibrated tokenizer
- Configurable limits: Set `max_tokens_per_request` in your config
- Clear error messages: Returns HTTP 400 with detailed token count information
- Format support: Works with both OpenAI (`messages`) and Gemini (`contents`) formats
Example error response for oversized requests:
{
"error": {
"message": "Request body too large: 150000 tokens (max: 125000)",
"type": "validation_error"
}
}
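The pre-flight check behind that error can be sketched as below. Note this is an illustrative stand-in: the real proxy uses an ML-calibrated tokenizer, while this sketch substitutes a crude characters-divided-by-four heuristic, and the function names are assumptions:

```rust
// Sketch of the token-limit validation. The real proxy counts tokens
// with a calibrated tokenizer; chars/4 is a rough stand-in here.
const MAX_TOKENS_PER_REQUEST: usize = 125_000;

fn estimate_tokens(body: &str) -> usize {
    body.chars().count() / 4
}

/// Reject oversized requests before they reach the upstream API,
/// returning the same error message shape shown above.
fn validate_request(body: &str) -> Result<(), String> {
    let tokens = estimate_tokens(body);
    if tokens > MAX_TOKENS_PER_REQUEST {
        Err(format!(
            "Request body too large: {} tokens (max: {})",
            tokens, MAX_TOKENS_PER_REQUEST
        ))
    } else {
        Ok(())
    }
}
```

Rejecting early like this keeps a single oversized request from burning quota on a key that would fail anyway.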
# Health check
curl http://localhost:4806/health
# Test chat completion
curl http://localhost:4806/v1/chat/completions \
-H "Authorization: Bearer dummy-key" \
-H "Content-Type: application/json" \
-d '{
"model": "gemini-1.5-flash-latest",
"messages": [{"role": "user", "content": "Hello!"}]
}'
This method starts a single, persistent, and isolated container for your development work. It will not be affected by other agents or standard `make` commands.

- Start the container: run the following command. It will build the image and start a container with a unique name on a random, free port on your local machine.

make start-dev

- Check the output: the script will print the container ID and the exact address (e.g., 127.0.0.1:49155) you can use to connect to your personal proxy.
# Start with Docker Compose
make docker-run
# Development mode with hot-reload
make docker-run-dev
# With Redis UI and monitoring tools
make docker-run-with-tools
# Build optimized image
make docker-build
# View logs
make docker-logs
# Stop services
make docker-stop
# Clean up
make docker-clean
# Run comprehensive UAT
make uat
Expected result:
- Docker images build successfully
- Services start and pass health checks
- API endpoints respond correctly
# Run all tests
make test
# Run with coverage
make test-coverage
# Run critical tests only
make test-critical
The project includes comprehensive tests covering:
- Core functionality (key rotation, health monitoring)
- Error handling and recovery
- Security features (rate limiting, authentication)
- Integration scenarios
- Admin Dashboard: Web interface needs UI polish
- Metrics Export: Prometheus integration partially implemented
- Documentation: Some advanced features lack detailed docs
- Error Recovery: Some edge cases in circuit breaker logic
Health Check Failures:
# Check container health
docker compose exec gemini-proxy ls -l /app/busybox
# Verify port availability
netstat -tulpn | grep 4806
Port Conflicts:
- Edit `server.port` in `config.yaml`
- Or set the `PORT` environment variable
- Restart with `make docker-restart`
# Basic health check
curl http://localhost:4806/health
# Detailed health with key validation
curl http://localhost:4806/health/detailed
# Prometheus metrics
curl http://localhost:4806/metrics
All `/v1/*` requests are proxied to the Gemini API:
- `/v1/chat/completions` - Chat completions
- `/v1/models` - List available models
- `/v1/embeddings` - Text embeddings
The proxy automatically:
- Selects a healthy API key
- Adds proper authentication headers
- Forwards to Google Gemini API
- Returns the response to the client
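The authentication step amounts to replacing the client's placeholder credential with the selected Gemini key before the request is forwarded. A minimal sketch, assuming a plain header map and an illustrative function name (the real code works on Axum/hyper header types):

```rust
use std::collections::HashMap;

// Illustrative header rewrite: whatever Authorization value the client
// sent (e.g. "Bearer dummy-key") is replaced with the selected key.
// The function name and plain HashMap are assumptions for this sketch.
fn rewrite_auth(headers: &mut HashMap<String, String>, gemini_key: &str) {
    headers.insert(
        "Authorization".to_string(),
        format!("Bearer {gemini_key}"),
    );
}
```

This is why the Quick Start example can send `Authorization: Bearer dummy-key`: the proxy discards it and substitutes a real key.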
# Logging level
export RUST_LOG=info # debug, info, warn, error
# Override config file location
export CONFIG_PATH=/path/to/config.yaml
# Redis connection (overrides config.yaml)
export REDIS_URL=redis://localhost:6379
circuit_breaker:
failure_threshold: 5
recovery_timeout_secs: 60
success_threshold: 3
max_failures_threshold: 3
temporary_block_minutes: 5
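The three settings above map onto a classic three-state breaker: Closed (normal traffic), Open (requests blocked after `failure_threshold` consecutive failures), and Half-Open (probing after the recovery timeout, closing again after `success_threshold` successes). A minimal state-machine sketch with assumed names, timeouts elided:

```rust
// Minimal circuit-breaker state machine matching the config fields above.
// Struct/enum names are illustrative; the recovery timer is omitted and
// represented by an explicit try_half_open() call.
#[derive(Debug, PartialEq)]
enum BreakerState { Closed, Open, HalfOpen }

struct Breaker {
    state: BreakerState,
    failures: u32,
    failure_threshold: u32,   // config: failure_threshold
    successes: u32,
    success_threshold: u32,   // config: success_threshold
}

impl Breaker {
    fn record_failure(&mut self) {
        self.failures += 1;
        self.successes = 0;
        if self.failures >= self.failure_threshold {
            self.state = BreakerState::Open;
        }
    }

    /// Called once recovery_timeout_secs has elapsed while Open.
    fn try_half_open(&mut self) {
        if self.state == BreakerState::Open {
            self.state = BreakerState::HalfOpen;
        }
    }

    fn record_success(&mut self) {
        self.successes += 1;
        self.failures = 0;
        if self.state == BreakerState::HalfOpen
            && self.successes >= self.success_threshold
        {
            self.state = BreakerState::Closed;
        }
    }
}
```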
# View logs
make logs
# Check service status
make status
# Health check
make health
- Throughput: Handles 1000+ RPS on modest hardware
- Memory Usage: ~100MB base memory footprint
- Latency: <10ms proxy overhead
- Key Switching: Sub-millisecond key rotation
The proxy handles Gemini API errors intelligently:
- 400/404: Returns immediately (client error)
- 403: Marks key as invalid, tries next key
- 429: Temporarily disables key, retries with another
- 500/503: Retries with same key, then switches
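That status-code policy boils down to a small mapping from upstream responses to retry actions. A sketch with assumed enum and function names (the real handler also tracks per-key state):

```rust
// Illustrative mapping of upstream Gemini status codes to proxy actions,
// mirroring the list above. Enum and function names are assumptions.
#[derive(Debug, PartialEq)]
enum Action {
    ReturnToClient,       // client error, no retry
    SwitchKey,            // key invalid, rotate to the next one
    BlockKeyTemporarily,  // rate-limited, retry with another key
    RetrySameKey,         // transient upstream error
}

fn action_for_status(status: u16) -> Action {
    match status {
        400 | 404 => Action::ReturnToClient,
        403 => Action::SwitchKey,
        429 => Action::BlockKeyTemporarily,
        500 | 503 => Action::RetrySameKey,
        _ => Action::ReturnToClient,
    }
}
```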
# Set up development environment
make dev-setup
# Run in development mode
make run-dev
# Run tests
make test
# Code quality checks
make check # Runs lint, format, and tests
| Command | Purpose |
|---|---|
| `make build` | Build release binary |
| `make test` | Run all tests |
| `make format` | Format code with rustfmt |
| `make lint` | Run clippy linter |
| `make docker-build` | Build Docker image |
This project is licensed under the MIT License - see the LICENSE file for details.
- Built with Rust and Tokio
- HTTP framework: Axum
- Redis integration: deadpool-redis
- Security: secrecy
- Architecture Guide - Detailed system design
- Monitoring Guide - Observability setup
- Contributing - Development guidelines
Built with Rust and Tokio for high-performance async processing.
Note: This is an experimental project reflecting active R&D work. The code is functional but may contain rough edges as it evolves.