A Go CLI tool for benchmarking OpenAI-compatible API servers. Automatically discovers available models and measures comprehensive performance metrics including requests per second, tokens per second, and response latency.
| Provider | Model | Tokens/sec | Success Rate | Avg Latency | Notes |
|---|---|---|---|---|---|
| Cerebras | qwen-3-coder-480b | 91.93 | 26.8% | 996ms | Highest throughput but rate limited |
| Alibaba Cloud | qwen3-coder-plus | 38.83 | 88.9% | 3.5s | Best reliability |
| Custom Server | qwen3-coder | 33.88 | 87.5% | 4.2s | Most consistent |
Winner: Cerebras for raw speed, Alibaba Cloud for production reliability
- Cerebras API delivers exceptional performance (91.93 tokens/sec) but has aggressive rate limiting (~73% error rate)
- Alibaba Cloud offers the best balance for production use with 38.83 tokens/sec and 88.9% success rate
- Custom server provides predictable performance ideal for development/testing
Benchmarked on 2025-08-01 with identical prompts and conditions
- Auto-discovery: Automatically discovers available models via the `/v1/models` endpoint
- Comprehensive metrics: RPS, tokens/sec, latency percentiles, TTFT, error rates
- Concurrent testing: Auto-detects optimal concurrency or accepts custom values
- Multiple prompt sizes: Test with small, medium, large, or mixed prompts
- Real-time progress: Live progress bar with dynamic tokens/sec and smart status updates
- Streaming support: Test both regular and streaming completions
- Rich output: Colored terminal output and JSON export options
- Warmup phase: Optional warmup to stabilize server performance
- API Key support: Built-in authentication for cloud providers
- Smart URL handling: Automatic HTTPS for domains, HTTP for local IPs (see the sketch after this list)
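The URL handling can be pictured with a small Go sketch. This is illustrative only; the `normalizeBaseURL` helper is a hypothetical name, not the tool's actual code:

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// normalizeBaseURL is a hypothetical sketch of the "smart URL handling"
// described above: bare domains get https://, bare IPs and localhost get
// http://, and an explicit scheme is left untouched.
func normalizeBaseURL(target string) string {
	if strings.HasPrefix(target, "http://") || strings.HasPrefix(target, "https://") {
		return target // caller already chose a scheme
	}
	host := target
	if h, _, err := net.SplitHostPort(target); err == nil {
		host = h // strip a :port suffix before inspecting the host
	}
	if host == "localhost" || net.ParseIP(host) != nil {
		return "http://" + target // local/IP targets default to plain HTTP
	}
	return "https://" + target // public domains default to HTTPS
}

func main() {
	fmt.Println(normalizeBaseURL("86.38.182.74:8000"))   // http://86.38.182.74:8000
	fmt.Println(normalizeBaseURL("api.openai.com"))      // https://api.openai.com
	fmt.Println(normalizeBaseURL("https://example.com")) // unchanged
}
```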
```bash
# Clone and build
git clone <repo-url>
cd aibench
go build -o aibench

# Or install directly
go install
```

```bash
# Test a server with default settings (30s duration)
./aibench 86.38.182.74:8000

# Test specific server with custom duration
./aibench --duration 60s localhost:8000

# Test with specific concurrency
./aibench --concurrency 10 --duration 45s api.server.com:8080

# Test specific models only
./aibench --models gpt-4,gpt-3.5-turbo server:8000

# Force concurrency 1 (optimal for vLLM/resource-limited servers)
./aibench --concurrency -1 server:8000

# Test with API key authentication
./aibench --api-key "your-key-here" api.provider.com

# Test streaming responses
./aibench --streaming --duration 30s server:8000

# Test with large prompts and longer timeout
./aibench --prompt-size large --timeout 60s server:8000

# Export results as JSON
./aibench --format json server:8000 > results.json

# Disable colors and warmup
./aibench --no-color --warmup 0 server:8000
```

```bash
# Alibaba DashScope (Qwen models)
./aibench --api-key "sk-your-key" --models qwen3-coder-plus dashscope-intl.aliyuncs.com/compatible-mode

# OpenAI API
./aibench --api-key "sk-your-key" --models gpt-4 api.openai.com

# Any OpenAI-compatible provider
./aibench --api-key "your-key" your-provider.com/v1
```

| Flag | Default | Description |
|---|---|---|
| `-d, --duration` | `30s` | How long to run the benchmark |
| `-c, --concurrency` | `0` (auto) | Concurrent requests (0=auto, -1=force 1) |
| `-m, --models` | all discovered | Specific models to test |
| `-k, --api-key` | `""` | API key (or use `OPENAI_API_KEY` env var) |
| `-t, --timeout` | `30s` | Request timeout |
| `-w, --warmup` | `5s` | Warmup duration (0 to disable) |
| `--streaming` | `false` | Test streaming responses |
| `--prompt-size` | `medium` | Prompt size: small\|medium\|large\|all |
| `--skip-discovery` | `false` | Skip model discovery, use specified models |
| `-f, --format` | `table` | Output format: table\|json |
| `-v, --verbose` | `false` | Verbose output |
| `--no-color` | `false` | Disable colored output |
The benchmark shows live performance metrics while running:
```
[████████████████████] 67.3% | RPS: 45.2 | TPS: 1.2k | Reqs: 1356/1400 | ETA: 9s
```
- Adaptive status: Shows different info based on benchmark phase
  - `Waiting for first response...` - shown when no requests have completed yet
  - `Warming up...` - shown during the initial 2 seconds
  - `RPS: X | TPS: Y` - live performance metrics
- Color-coded metrics: Green (good), yellow (medium), red (low/slow)
- Recent vs average: Shows instantaneous speed after warmup
- Smart progress bar: Visual completion percentage with ETA
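As a rough illustration of how a live progress line like the one above can be produced, here is a minimal Go sketch; the `renderProgress` helper is hypothetical, not the tool's display code:

```go
package main

import (
	"fmt"
	"strings"
	"time"
)

// renderProgress sketches a single-line status update: the bar tracks
// elapsed time against the configured duration, while RPS/TPS and request
// counts are passed in from whatever metrics tracker the caller uses.
func renderProgress(elapsed, total time.Duration, rps, tps float64, done, expected int) string {
	frac := elapsed.Seconds() / total.Seconds()
	if frac > 1 {
		frac = 1
	}
	filled := int(frac * 20)
	bar := strings.Repeat("█", filled) + strings.Repeat("░", 20-filled)
	eta := total - elapsed
	if eta < 0 {
		eta = 0
	}
	// \r rewrites the same terminal line in place on each tick
	return fmt.Sprintf("\r[%s] %.1f%% | RPS: %.1f | TPS: %.1fk | Reqs: %d/%d | ETA: %ds",
		bar, frac*100, rps, tps/1000, done, expected, int(eta.Seconds()))
}

func main() {
	// 20.19s into a 30s run reproduces the 67.3% / ETA 9s line above
	fmt.Println(renderProgress(20190*time.Millisecond, 30*time.Second, 45.2, 1200, 1356, 1400))
}
```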
- Requests/sec: Successful requests per second
- Tokens/sec: Total tokens processed per second
- Input/Output Tokens/sec: Breakdown by token type
- Average/Min/Max Latency: Response time statistics
- P95/P99 Latency: 95th and 99th percentile latencies (see the sketch after this list)
- TTFT: Time to first token (streaming only)
- Success Rate: Percentage of successful requests
- Error Rate: Percentage of failed requests
- Error Breakdown: Detailed error categorization
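For the percentile metrics, a minimal Go sketch of the nearest-rank method is shown below; the tool's actual statistics code may differ:

```go
package main

import (
	"fmt"
	"math"
	"sort"
	"time"
)

// percentile derives a Pxx latency from recorded samples using the
// nearest-rank method: the smallest value covering p percent of samples.
func percentile(latencies []time.Duration, p float64) time.Duration {
	if len(latencies) == 0 {
		return 0
	}
	sorted := append([]time.Duration(nil), latencies...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i] < sorted[j] })
	idx := int(math.Ceil(p/100*float64(len(sorted)))) - 1
	if idx < 0 {
		idx = 0
	}
	return sorted[idx]
}

func main() {
	samples := []time.Duration{
		89 * time.Millisecond, 120 * time.Millisecond, 177 * time.Millisecond,
		298 * time.Millisecond, 567 * time.Millisecond, 945 * time.Millisecond,
	}
	fmt.Println("P95:", percentile(samples, 95))
	fmt.Println("P99:", percentile(samples, 99))
}
```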
```
aibench - OpenAI API Benchmarking Tool
Server: localhost:8000
Models: gpt-4, gpt-3.5-turbo
────────────────────────────────────────────────────────────

Benchmarking model: gpt-4
Finding optimal concurrency...
Optimal concurrency: 8
Warming up (5s)...
Running benchmark (concurrency: 8, duration: 30s)...
[████████████████████] 100.0% | RPS: 45.2 | TPS: 1.2k | Reqs: 1356/1356 | ETA: 0s

Benchmark Results
Model: gpt-4
──────────────────────────────────────────────────
Duration: 30.0s
Total Requests: 1356
Successful: 1356 (100.0% success rate)
Requests/sec: 45.2
Tokens/sec: 1.2k
Input Tokens/sec: 890.5
Output Tokens/sec: 312.1
Avg Latency: 177ms
Min Latency: 89ms
Max Latency: 945ms
P95 Latency: 298ms
P99 Latency: 567ms
Avg TTFT: 45ms
```
This tool works with any OpenAI-compatible API server including:
- OpenAI API
- Local LLM servers (Ollama, LM Studio, etc.)
- Cloud providers (Azure OpenAI, AWS Bedrock, Alibaba DashScope, etc.)
- Custom inference servers
Endpoints used:
- `POST /v1/chat/completions` - Chat completions (required)
- `GET /v1/models` - Model discovery (optional)
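For reference, a chat completion request against such a server looks roughly like the following standard-library Go sketch; the server URL and model name are placeholders, and the payload shape follows the OpenAI convention:

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"net/http"
	"os"
)

// Minimal sketch of the request an aibench-style tool sends to an
// OpenAI-compatible server. Model name and server URL are placeholders.
func main() {
	body := []byte(`{
		"model": "gpt-3.5-turbo",
		"messages": [{"role": "user", "content": "Say hello"}]
	}`)

	req, err := http.NewRequest("POST", "http://localhost:8000/v1/chat/completions", bytes.NewReader(body))
	if err != nil {
		panic(err)
	}
	req.Header.Set("Content-Type", "application/json")
	if key := os.Getenv("OPENAI_API_KEY"); key != "" {
		req.Header.Set("Authorization", "Bearer "+key) // cloud providers need this
	}

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	out, _ := io.ReadAll(resp.Body)
	fmt.Println(resp.Status)
	fmt.Println(string(out))
}
```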
Important: Some providers have incomplete or missing model discovery endpoints. Common issues:
- Missing Models: The `/v1/models` endpoint may not list all available models
- Authentication Issues: Discovery may fail even when the completion endpoint works
- Provider Limitations: Some APIs don't implement the models endpoint
Solutions:
- Specify models directly: Use `--models model-name` to bypass discovery
- Skip discovery: Use `--skip-discovery` when you know the model exists
- Check provider docs: Consult your API provider's documentation for correct model names
Smart Behavior: When you specify models with `--models`, discovery is automatically skipped. This eliminates the need for a separate flag and handles provider limitations seamlessly.
```bash
# This may fail if discovery is broken
./aibench --api-key "key" provider.com

# This automatically skips discovery (recommended)
./aibench --api-key "key" --models qwen3-coder-plus provider.com

# Multiple models also skip discovery
./aibench --api-key "key" --models model1,model2,model3 provider.com

# Only uses discovery when no models are specified
./aibench --api-key "key" provider.com  # Will discover all available models
```

Best Practice: Always specify model names when working with cloud providers to avoid discovery issues and get faster startup times.
- Client: HTTP client with connection pooling for API interactions
- Benchmarker: Coordinates the entire benchmark process
- Metrics: Thread-safe performance tracking and statistical calculations (see the sketch after this list)
- Display: Real-time progress and formatted result output
- Workers: Concurrent request generators with different prompt strategies
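As an illustration of the Metrics component's thread-safety requirement, here is a minimal Go sketch of a mutex-guarded tracker; the field and method names are assumptions, not the tool's actual types:

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// Metrics sketches a tracker that worker goroutines can update concurrently.
type Metrics struct {
	mu        sync.Mutex
	latencies []time.Duration
	tokens    int
	errors    int
}

// Record is called by workers after each request completes.
func (m *Metrics) Record(latency time.Duration, tokens int, err error) {
	m.mu.Lock()
	defer m.mu.Unlock()
	if err != nil {
		m.errors++
		return
	}
	m.latencies = append(m.latencies, latency)
	m.tokens += tokens
}

// Snapshot returns aggregate counters for display without racing writers.
func (m *Metrics) Snapshot() (requests, tokens, errors int) {
	m.mu.Lock()
	defer m.mu.Unlock()
	return len(m.latencies), m.tokens, m.errors
}

func main() {
	var m Metrics
	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			m.Record(120*time.Millisecond, 42, nil) // simulated successful request
		}()
	}
	wg.Wait()
	fmt.Println(m.Snapshot()) // 4 42*4 0
}
```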
MIT License