🤖 A TypeScript simulation framework for testing and running AI agents
SimKit lets you build, test, and run AI agents in your own custom simulated environments. It gives you a simple game loop for running agents step by step, supports multiple agents, and includes built-in OpenTelemetry (OTEL) tracing for tracking what happens during your simulations.
SimKit works with any AI agent or LLM, with no lock-in: use your own models and run everything locally. OTEL telemetry can be saved to a local file or sent to a remote server.
Simulations let you see how your AI agents perform on real-world tasks, step by step, in a safe and controlled way.
Traditional evals are great for simple tasks, but they don't give you the full picture. You can't see how your agents handle:
- 🎯 Multi-step tasks that need planning and memory
- 🛠️ Lots of different tools and actions
- Realistic data and changing situations
- ⚡ Decisions that matter over time
- Long-term planning and decision-making
- Processing and reasoning over large amounts of context and information
Surprisingly, many AI agents start to fail as soon as they are asked to do more than a few simple tasks.
SimKit's heart is a simple but powerful tick-based loop:
```typescript
import { createSimulation, type LoopState } from "@fallom/simkit/simulation";

interface SupportTestState extends LoopState {
  totalIssues: number;
  resolvedIssues: number;
  averageResponseTime: number; // running mean, in milliseconds
  satisfactionScores: number[]; // one 0-10 score per handled issue
}

const customerIssues = [
  "My account is locked and I can't access my files",
  "Billing error - charged twice for same month",
  "App crashes every time I try to upload",
  "Can't find my downloaded files anywhere",
];

// Placeholders: replace these with your own agent, issue sampler, and scoring logic.
const supportAgent = {
  handle: async (issue: string): Promise<string> => `Looking into: ${issue}`,
};
const getRandomIssues = (pool: string[], count: number): string[] =>
  Array.from({ length: count }, () => pool[Math.floor(Math.random() * pool.length)]);
const scoreResponse = (response: string, issue: string): number =>
  response.includes(issue) ? 8 : 4; // toy 0-10 score

const mean = (xs: number[]): number =>
  xs.length ? xs.reduce((a, b) => a + b, 0) / xs.length : 0;

const simulation = createSimulation<SupportTestState>({
  maxTicks: 10, // one tick = one simulated day
  initialState: { totalIssues: 0, resolvedIssues: 0, averageResponseTime: 0, satisfactionScores: [] },
  onTick: async (state) => {
    // Get today's customer issues
    const dailyIssues = getRandomIssues(customerIssues, 2);

    for (const issue of dailyIssues) {
      const startTime = Date.now();
      // Test your AI support agent
      const agentResponse = await supportAgent.handle(issue);
      const responseTime = Date.now() - startTime;
      const satisfaction = scoreResponse(agentResponse, issue);

      state.totalIssues++;
      if (satisfaction > 7) state.resolvedIssues++;
      state.satisfactionScores.push(satisfaction);
      // Update running averages (incremental mean: avg += (x - avg) / n)
      state.averageResponseTime += (responseTime - state.averageResponseTime) / state.totalIssues;

      const resolutionRate = (state.resolvedIssues / state.totalIssues) * 100;
      console.log(
        `Resolution Rate: ${resolutionRate.toFixed(1)}% | Avg Satisfaction: ${mean(state.satisfactionScores).toFixed(1)}/10 | Avg Response: ${state.averageResponseTime.toFixed(0)}ms`
      );
    }

    return state.tick < 9; // stop after day 10 (ticks start at 0); maxTicks is the hard cap
  },
  onEnd: (state) => {
    const resolutionRate = state.totalIssues ? (state.resolvedIssues / state.totalIssues) * 100 : 0;
    console.log(
      `🎯 Final Results: ${resolutionRate.toFixed(1)}% resolution rate, ${mean(state.satisfactionScores).toFixed(1)}/10 satisfaction`
    );
  },
});

await simulation.run();
```
What's happening here? Each tick simulates one day of customer support. On every tick, `onTick` feeds two random issues to your AI agent, measures response quality and speed, and updates the KPIs stored in the simulation state. Perfect for A/B testing different models, regression testing after prompt changes, or measuring performance before a production deployment.
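To make the loop semantics concrete, here is a minimal, self-contained model of a tick-based loop. It is not SimKit's implementation: `runLoop`, `LoopConfig`, and `LoopStateLike` are hypothetical names, and the sketch assumes (as the support example implies) that ticks start at 0, that state is mutated in place, and that returning `false` from `onTick` ends the run before `maxTicks` is reached.

```typescript
// Hypothetical model of tick-loop semantics; this is NOT SimKit's source code.
interface LoopStateLike {
  tick: number; // zero-based index of the current tick
}

interface LoopConfig<S extends LoopStateLike> {
  maxTicks: number; // hard upper bound on onTick calls
  initialState: Omit<S, "tick">; // the loop owns the tick counter
  onTick: (state: S) => boolean | Promise<boolean>; // return false to stop early
  onEnd?: (state: S) => void;
}

async function runLoop<S extends LoopStateLike>(config: LoopConfig<S>): Promise<S> {
  // Double cast: TS can't prove that Omit<S, "tick"> & { tick: number } is exactly S
  const state = { ...config.initialState, tick: 0 } as unknown as S;
  while (state.tick < config.maxTicks) {
    const keepGoing = await config.onTick(state); // onTick mutates state in place
    if (!keepGoing) break;
    state.tick++;
  }
  config.onEnd?.(state);
  return state;
}

// Usage: mirrors the support example's stop condition
interface CounterState extends LoopStateLike {
  calls: number;
}

runLoop<CounterState>({
  maxTicks: 10,
  initialState: { calls: 0 },
  onTick: (s) => {
    s.calls++;
    return s.tick < 9;
  },
  onEnd: (s) => console.log(`onTick ran ${s.calls} times`), // prints "onTick ran 10 times"
});
```

With this model, `return state.tick < 9` and `maxTicks: 10` agree: either one alone yields exactly ten calls to `onTick`.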
AI agents and their tools often need to read simulation state from code that the loop doesn't call directly. SimKit exposes it through a global accessor:
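```typescript
import { setSimState, getSimState } from "@fallom/simkit/state";

// In your simulation loop (e.g. inside onTick), publish the current state
setSimState(state);

// In your AI tools, read it back without threading it through every call;
// MyState is your own state interface
const currentState = getSimState<MyState>();
```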
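If you're curious how a global accessor like this can work, the sketch below models the pattern with a single module-level variable. It is an illustration, not SimKit's source: `setState`, `getState`, `ShopState`, and `checkStockTool` are hypothetical names.

```typescript
// Hypothetical stand-in for a global state accessor; NOT SimKit's implementation.
let current: unknown = undefined;

function setState<T>(state: T): void {
  current = state; // the loop publishes the latest state here
}

function getState<T>(): T {
  if (current === undefined) throw new Error("State not set yet; call setState() first");
  return current as T; // caller asserts the shape via the type parameter
}

interface ShopState {
  tick: number;
  inventory: Record<string, number>;
}

// A tool an agent might call: it reads state without receiving it as an argument.
function checkStockTool(item: string): string {
  const state = getState<ShopState>();
  const qty = state.inventory[item] ?? 0;
  return `Tick ${state.tick}: ${qty} x ${item} in stock`;
}

setState<ShopState>({ tick: 3, inventory: { lamp: 2 } });
console.log(checkStockTool("lamp")); // prints "Tick 3: 2 x lamp in stock"
```

Because there is a single slot, this pattern assumes one simulation runs per process (or per module instance) at a time.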
Reproduce exact scenarios with seeded randomness - perfect for fair model comparisons:
```typescript
import { initializeRandom, choice, shuffle } from "@fallom/simkit/random";

// testSupportAgent is your own harness; draw its scenarios with choice()/shuffle()
// so they follow the seed.

// Test Model A
initializeRandom(12345); // same seed = same test scenarios
const modelA_results = await testSupportAgent(modelA);

// Test Model B with identical scenarios
initializeRandom(12345); // reset to the same seed
const modelB_results = await testSupportAgent(modelB);

// Now you can compare fairly: both models faced exactly the same issues
console.log(`Model A: ${modelA_results.satisfaction}/10`);
console.log(`Model B: ${modelB_results.satisfaction}/10`);
```
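Why this matters: without seeded randomness, Model A might get easy customer issues while Model B gets hard ones, making the comparison meaningless. Resetting to the same seed before each run means every model faces the same sequence of test scenarios, as long as those scenarios are drawn from SimKit's random functions.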
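To see why a fixed seed makes runs repeatable, here is a self-contained sketch of a seeded generator. It uses the well-known mulberry32 algorithm as a stand-in; SimKit's own generator isn't described here, and `initRandom`, `pick`, and `shuffled` are hypothetical counterparts to `initializeRandom`, `choice`, and `shuffle`.

```typescript
// Illustrative seeded PRNG (mulberry32); SimKit's actual generator may differ.
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296; // float in [0, 1)
  };
}

// Hypothetical stand-ins for initializeRandom / choice / shuffle
let rng = mulberry32(0);

function initRandom(seed: number): void {
  rng = mulberry32(seed); // restart the sequence from this seed
}

function pick<T>(items: readonly T[]): T {
  return items[Math.floor(rng() * items.length)];
}

function shuffled<T>(items: readonly T[]): T[] {
  // Fisher-Yates shuffle driven by the seeded generator
  const out = [...items];
  for (let i = out.length - 1; i > 0; i--) {
    const j = Math.floor(rng() * (i + 1));
    [out[i], out[j]] = [out[j], out[i]];
  }
  return out;
}

const issues = ["locked account", "double billing", "upload crash", "missing files"];

initRandom(12345);
const runA = [pick(issues), ...shuffled(issues)];
initRandom(12345);
const runB = [pick(issues), ...shuffled(issues)];
console.log(JSON.stringify(runA) === JSON.stringify(runB)); // prints "true"
```

The guarantee only holds if every random draw goes through the seeded generator: a stray `Math.random()` call, or a number of draws that depends on the agent's own actions, will make the two runs diverge.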
Built-in observability for AI agent debugging with zero vendor lock-in:
```typescript
import { trace } from "@opentelemetry/api";

// SimKit captures its own spans automatically; use the standard OTEL API to add custom ones.
const tracer = trace.getTracer("my-simulation");

// e.g. inside onTick, where `state` is in scope
const span = tracer.startSpan("agent-decision");
span.setAttributes({
  "agent.action": "support_response",
  "simulation.tick": state.tick,
  "response.satisfaction": 8.5,
});
span.end();
```
Send telemetry anywhere: Export to your own servers, store in local files, or pipe to any OpenTelemetry-compatible service. No vendor lock-in - you own your data.
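Agent steps can throw, and a span that is never ended is silently dropped. The sketch below wraps a step in a span with try/finally so traces stay complete and failures are marked. It uses only the standard `@opentelemetry/api` package (assumed installed); `tracedStep` is a hypothetical helper, not part of SimKit. With no OpenTelemetry SDK registered, the API hands out no-op spans, so nothing is exported until you configure a tracer provider and exporter.

```typescript
import { trace, SpanStatusCode } from "@opentelemetry/api";

const tracer = trace.getTracer("my-simulation");

// Hypothetical helper: run one agent step inside an active span.
async function tracedStep<T>(name: string, tick: number, fn: () => Promise<T>): Promise<T> {
  return tracer.startActiveSpan(name, async (span) => {
    span.setAttribute("simulation.tick", tick);
    try {
      return await fn();
    } catch (err) {
      // Mark the span as failed, then let the caller handle the error
      span.recordException(err as Error);
      span.setStatus({ code: SpanStatusCode.ERROR });
      throw err;
    } finally {
      span.end(); // always end the span, even on failure
    }
  });
}

tracedStep("agent-decision", 3, async () => "support_response").then((result) => console.log(result));
```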
| Feature | Why It Matters for AI |
|---|---|
| Tick-Based Loop | Step-by-step agent execution with full control |
| OpenTelemetry | Track agent decisions and debug complex behaviors |
| 🎲 Seeded Random | Reproduce exact scenarios for testing and validation |
| Global State | AI tools can access simulation state from anywhere |
| 🔧 TypeScript | Full type safety for complex agent interactions |
| ⚡ Bun Optimized | Fast execution for compute-intensive agent simulations |
```bash
npm install @fallom/simkit
# or
bun add @fallom/simkit
```
Simple agent making strategic decisions
```bash
cd apps/examples/energy-ai
bun install && bun run start
```
A straightforward example showing:
- AI agent with tool calling
- Basic state management
- OpenTelemetry integration
Complex multi-agent economic simulation
A comprehensive example demonstrating SimKit's full capabilities:
- Multi-agent system - Shop owner + customer agents
- Complex state management - Inventory, trades, conversations
- Deterministic scenarios - Seeded randomness for testing
- Rich telemetry - Custom spans and detailed logging
- Tool ecosystem - AI agents with 10+ specialized tools
Perfect for understanding how to build production-grade agent simulations.
| Traditional Approach | With SimKit |
|---|---|
| ❌ Manual loop management | ✅ Built-in tick-based execution |
| ❌ No observability | ✅ OpenTelemetry integration |
| ❌ Non-deterministic testing | ✅ Seeded randomness |
| ❌ Complex state sharing | ✅ Global state management |
| ❌ Manual telemetry setup | ✅ Automatic span collection |
- 📦 Core Package Docs - Full API reference
- Energy AI Tutorial - Simple getting-started guide
- Pawn Shop Deep Dive - Advanced multi-agent patterns
```bash
# Install dependencies
bun install

# Build all packages
bun run build

# Format code
bun run format
```