Skip to content

Fallomai/simkit

Folders and files

NameName
Last commit message
Last commit date

Latest commit

ย 

History

13 Commits
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 

Repository files navigation

SimKit

npm version License: MIT TypeScript Downloads

๐Ÿค– A TypeScript simulation framework for testing and running AI agents

What is SimKit?

SimKit lets you build, test, and run AI agents in your own custom simulated environments. It gives you a simple game loop for running agents step-by-step, supports multiple agents, and includes built-in tools (OTEL) for tracking what happens during your simulations.

Agent Agnostic & No Vendor Lock-in

SimKit works with any AI agent or LLM, no lock-in. Use your own models and run everything locally. OTEL logs can be saved to a local file or sent to a remote server.

Why Use Simulations?

Simulations let you see how your AI agents perform on real world tasks, step by step, in a safe and controlled way.

Traditional evals are great for simple tasks, but they don't give you the full picture. You can't see how your agents handle:

  • ๐ŸŽฏ Multi-step tasks that need planning and memory
  • ๐Ÿ› ๏ธ Lots of different tools and actions
  • ๐ŸŒ Realistic data and changing situations
  • โšก Decisions that matter over time
  • ๐Ÿ”„ Long-term planning and decision-making
  • ๐Ÿ“š Processing and reasoning over large amounts of context and information

Surprisingly, most AI agents begin to fail when they are asked to do anything more than a few simple tasks.

๐Ÿ”„ Core: The Simulation Loop

SimKit's heart is a simple but powerful tick-based loop:

import { createSimulation, type LoopState } from "@fallom/simkit/simulation";

interface SupportTestState extends LoopState {
  totalIssues: number;
  resolvedIssues: number;
  averageResponseTime: number;
  satisfactionScores: number[];
}

const customerIssues = [
  "My account is locked and I can't access my files",
  "Billing error - charged twice for same month", 
  "App crashes every time I try to upload",
  "Can't find my downloaded files anywhere"
];

const simulation = createSimulation<SupportTestState>({
  maxTicks: 10,
  initialState: { totalIssues: 0, resolvedIssues: 0, averageResponseTime: 0, satisfactionScores: [] },
  
  onTick: async (state) => {
    // Get today's customer issues
    const dailyIssues = getRandomIssues(customerIssues, 2);
    
    for (const issue of dailyIssues) {
      const startTime = Date.now();
      
      // Test your AI support agent
      const agentResponse = await supportAgent.handle(issue);
      
      const responseTime = Date.now() - startTime;
      const satisfaction = scoreResponse(agentResponse, issue);
      
      state.totalIssues++;
      if (satisfaction > 7) state.resolvedIssues++;
      state.satisfactionScores.push(satisfaction);
      
      // Update running averages
      const avgSatisfaction = state.satisfactionScores.reduce((a,b) => a+b, 0) / state.satisfactionScores.length;
      const resolutionRate = (state.resolvedIssues / state.totalIssues) * 100;
      
      console.log(`Resolution Rate: ${resolutionRate.toFixed(1)}% | Avg Satisfaction: ${avgSatisfaction.toFixed(1)}/10`);
    }
    
    return state.tick < 9; // Test for 10 days
  },
  
  onEnd: (state) => {
    const finalSatisfaction = state.satisfactionScores.reduce((a,b) => a+b, 0) / state.satisfactionScores.length;
    console.log(`๐ŸŽฏ Final Results: ${((state.resolvedIssues/state.totalIssues)*100).toFixed(1)}% resolution rate, ${finalSatisfaction.toFixed(1)}/10 satisfaction`);
  }
});

await simulation.run();

What's happening here? Each tick simulates a day of customer support. SimKit feeds random issues to your AI agent, measures response quality and speed, then tracks KPIs over time. Perfect for A/B testing different models, regression testing after prompt changes, or measuring performance before production deployment.

๐Ÿค– Built for AI Agents

Global State Access

AI agents need access to simulation state from anywhere:

import { setSimState, getSimState } from "@fallom/simkit/state";

// In your simulation loop
setSimState(state);

// In your AI tools
const currentState = getSimState<MyState>();

Deterministic Testing

Reproduce exact scenarios with seeded randomness - perfect for fair model comparisons:

import { initializeRandom, choice, shuffle } from "@fallom/simkit/random";

// Test Model A
initializeRandom(12345); // Same seed = same test scenarios
const modelA_results = await testSupportAgent(modelA);

// Test Model B with identical scenarios
initializeRandom(12345); // Reset to same seed
const modelB_results = await testSupportAgent(modelB);

// Now you can fairly compare: both models faced the exact same issues
console.log(`Model A: ${modelA_results.satisfaction}/10`);
console.log(`Model B: ${modelB_results.satisfaction}/10`);

Why this matters: Without seeded randomness, Model A might get easy customer issues while Model B gets hard ones, making comparison meaningless. SimKit ensures every model faces identical test scenarios.

๐Ÿ“Š OpenTelemetry Integration

Built-in observability for AI agent debugging with zero vendor lock-in:

import { trace } from "@opentelemetry/api";

// SimKit automatically captures spans for you
const tracer = trace.getTracer("my-simulation");
const span = tracer.startSpan("agent-decision");
span.setAttributes({
  "agent.action": "support_response",
  "simulation.tick": state.tick,
  "response.satisfaction": 8.5
});
span.end();

Send telemetry anywhere: Export to your own servers, store in local files, or pipe to any OpenTelemetry-compatible service. No vendor lock-in - you own your data.

โœจ Key Features

Feature Why It Matters for AI
๐Ÿ”„ Tick-Based Loop Step-by-step agent execution with full control
๐Ÿ“Š OpenTelemetry Track agent decisions and debug complex behaviors
๐ŸŽฒ Seeded Random Reproduce exact scenarios for testing and validation
๐Ÿ—๏ธ Global State AI tools can access simulation state from anywhere
๐Ÿ”ง TypeScript Full type safety for complex agent interactions
โšก Bun Optimized Fast execution for compute-intensive agent simulations

๐Ÿ“ฆ Installation

npm install @fallom/simkit
# or
bun add @fallom/simkit

๐ŸŽฎ Examples

๐Ÿš€ Getting Started: Energy AI

Simple agent making strategic decisions

cd apps/examples/energy-ai
bun install && bun run start

A straightforward example showing:

  • AI agent with tool calling
  • Basic state management
  • OpenTelemetry integration

๐Ÿ† Advanced: Pawn Shop Simulation

Complex multi-agent economic simulation

A comprehensive example demonstrating SimKit's full capabilities:

  • Multi-agent system - Shop owner + customer agents
  • Complex state management - Inventory, trades, conversations
  • Deterministic scenarios - Seeded randomness for testing
  • Rich telemetry - Custom spans and detailed logging
  • Tool ecosystem - AI agents with 10+ specialized tools

Perfect for understanding how to build production-grade agent simulations.

๐Ÿš€ Why SimKit for AI Development?

Traditional Approach With SimKit
โŒ Manual loop management โœ… Built-in tick-based execution
โŒ No observability โœ… OpenTelemetry integration
โŒ Non-deterministic testing โœ… Seeded randomness
โŒ Complex state sharing โœ… Global state management
โŒ Manual telemetry setup โœ… Automatic span collection

๐Ÿ“– Learn More

๐Ÿ—๏ธ Development

# Install dependencies
bun install

# Build all packages  
bun run build

# Format code
bun run format

NPM Package

๐Ÿš€ Built for the AI simulation community

๐Ÿ“– Documentation โ€ข ๐ŸŽฎ Examples โ€ข ๐Ÿ› Issues

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Contributors 2

  •  
  •