Load balancing library for LLM/AI client SDKs with Resilience4j circuit breakers.
Cobal distributes API requests across multiple LLM endpoints to handle rate limiting transparently. Each endpoint can have its own API key, making Cobal ideal for SAAS platforms where tenants provide their own API keys and need per-tenant load balancer instances.
- Kotlin DSL: Fluent, type-safe DSL for building load balancers
- Multiple Load Balancing Algorithms: Random, Round-Robin, Weighted Random (O(1) via Vose's Alias Method), and Smooth Weighted Round-Robin (Nginx-style)
- Circuit Breaker Integration: Built on Resilience4j for fault tolerance with per-endpoint health tracking
- Transparent Retries: Automatic retry with configurable error handling
- Framework Integrations:
- LangChain4j (
me.ahoo.cobal:langchain4j) - Spring AI (
me.ahoo.cobal:spring-ai)
- LangChain4j (
- Type-safe Error Handling: Hierarchical error model with retriable/non-retriable classification
- Tenant-scoped Instances: Thread-safe registry for per-tenant load balancer caching
core ── Framework-agnostic abstractions (Node, LoadBalancer, algorithms, DSL)
│
├── langchain4j ── LangChain4j integration (Chat, Streaming, Embedding, Image, Audio)
└── spring-ai ── Spring AI integration (Chat, Embedding, Image, Audio)
bom ── Bill of Materials for centralized dependency management
// settings.gradle.kts
dependencyResolutionManagement {
repositories {
mavenCentral()
}
}
// core module
implementation("me.ahoo.cobal:core:0.2.0")
// LangChain4j integration
implementation("me.ahoo.cobal:langchain4j:0.2.0")
// Spring AI integration
implementation("me.ahoo.cobal:spring-ai:0.2.0")
// Or use BOM
implementation(platform("me.ahoo.cobal:bom:0.2.0"))
implementation("me.ahoo.cobal:core")
implementation("me.ahoo.cobal:langchain4j")import me.ahoo.cobal.dsl.loadBalancer
import me.ahoo.cobal.langchain4j.LoadBalancedChatModel
val model1 = OpenAiChatModel.builder().apiKey("key1").build()
val model2 = OpenAiChatModel.builder().apiKey("key2").build()
val balancedModel = LoadBalancedChatModel(
loadBalancer("my-lb") {
weightedRoundRobin()
node("node-1", weight = 5) { model(model1) }
node("node-2", weight = 1) { model(model2) }
}
)
val response = balancedModel.chat(ChatRequest("Hello!"))val lb = loadBalancer<ChatModel>("my-lb") {
roundRobin()
node("node-1") {
model(model1)
circuitBreaker {
failureRateThreshold(50f)
slidingWindowSize(10)
waitDurationInOpenState(Duration.ofSeconds(30))
}
}
node("node-2") { model(model2) }
}| DSL Function | Algorithm | Description |
|---|---|---|
weightedRoundRobin() (default) |
WeightedRoundRobinLoadBalancer |
Smooth weighted round-robin (Nginx-style) |
roundRobin() |
RoundRobinLoadBalancer |
Strict round-robin |
random() |
RandomLoadBalancer |
Uniform random selection |
weightedRandom() |
WeightedRandomLoadBalancer |
Weighted random with O(1) selection |
import me.ahoo.cobal.*
import me.ahoo.cobal.algorithm.RoundRobinLoadBalancer
import me.ahoo.cobal.state.DefaultNodeState
val node1 = DefaultModelNode("node-1", weight = 1, model = model1)
val node2 = DefaultModelNode("node-2", weight = 3, model = model2)
val lb = RoundRobinLoadBalancer("my-lb", listOf(
DefaultNodeState(node1),
DefaultNodeState(node2),
))
val balancedModel = LoadBalancedChatModel(lb)| Algorithm | Description |
|---|---|
RoundRobinLoadBalancer |
Strict round-robin |
RandomLoadBalancer |
Uniform random selection |
WeightedRandomLoadBalancer |
Weighted random with O(1) selection |
WeightedRoundRobinLoadBalancer |
Smooth weighted round-robin (Nginx-style) |
Cobal classifies errors for intelligent retry decisions:
// Retriable errors (will retry on different node)
// - RateLimitError (429)
// - ServerError (5xx)
// - TimeoutError
// - NetworkError
// Non-retriable errors (short-circuit)
// - AuthenticationError (401/403)
// - InvalidRequestError (400)LoadBalancedModel.chat(request)
→ LoadBalancer.execute() {
→ choose() - Filter available nodes & select via algorithm
→ tryAcquirePermission() - Check circuit breaker state
→ model.chat(request)
→ Success → record metrics
→ Failure → convert error → record failure → decide retry?
→ All attempts exhausted → throw AllNodesUnavailableError
}
The Apache Software License, Version 2.0 - Apache License 2.0