test(kubevirt): Add gevals-based integration tests for VM toolset
Introduce a comprehensive gevals testing framework to validate VM
lifecycle operations including creation with various configurations
(basic, Ubuntu, instancetypes, performance, sizing) and troubleshooting
scenarios. This enables automated verification of the KubeVirt toolset's
functionality and regression prevention.

Assisted-By: Claude <[email protected]>
Signed-off-by: Lee Yarwood <[email protected]>
lyarwood committed Nov 4, 2025
commit d3c5fddb30b20af3e654517ba4177bc7808334b6
3 changes: 3 additions & 0 deletions .gitignore
@@ -27,3 +27,6 @@ python/build/
python/dist/
python/kubernetes_mcp_server.egg-info/
!python/kubernetes-mcp-server

.gevals-step*
gevals-kubevirt-vm-operations-out.json
214 changes: 214 additions & 0 deletions pkg/toolsets/kubevirt/vm/tests/README.md
@@ -0,0 +1,214 @@
# KubeVirt VM Toolset Tests

This directory contains gevals-based tests for the KubeVirt VM toolset in the Kubernetes MCP Server.

## Overview

These tests validate the VM creation and troubleshooting tools (`vm_create` and `vm_troubleshoot`) by having AI agents complete real tasks using the MCP server.

## Test Structure

```
tests/
├── README.md                         # This file
├── mcp-config.yaml                   # MCP server configuration
├── claude-code/                      # Claude Code agent configuration
│   ├── agent.yaml
│   └── eval.yaml
└── tasks/                            # Test tasks
    ├── create-vm-basic/              # Basic VM creation test
    ├── create-vm-with-instancetype/  # VM with specific instancetype
    ├── create-vm-with-size/          # VM with size parameter
    ├── create-vm-ubuntu/             # Ubuntu VM creation
    ├── create-vm-with-performance/   # VM with performance family
    └── troubleshoot-vm/              # VM troubleshooting test
```

## Prerequisites

1. **Kubernetes cluster** with KubeVirt installed
   - The cluster must have KubeVirt CRDs installed
   - For testing, you can use a Kind cluster with KubeVirt

2. **Kubernetes MCP Server** running at `http://localhost:8888/mcp`

   ```bash
   # Build and run the server
   cd /path/to/kubernetes-mcp-server
   make build
   ./kubernetes-mcp-server --port 8888
   ```

3. **gevals binary** built from the gevals project

   ```bash
   cd /path/to/gevals
   go build -o gevals ./cmd/gevals
   ```

4. **Claude Code** installed and in PATH

   ```bash
   # Install Claude Code (if not already installed)
   npm install -g @anthropic-ai/claude-code
   ```

5. **kubectl** configured to access your cluster

## Running the Tests

### Run All Tests

```bash
# From the gevals directory
./gevals eval /path/to/kubernetes-mcp-server/pkg/toolsets/kubevirt/vm/tests/claude-code/eval.yaml
```

### Run a Specific Test

```bash
# Run just the basic VM creation test
./gevals eval /path/to/kubernetes-mcp-server/pkg/toolsets/kubevirt/vm/tests/tasks/create-vm-basic/create-vm-basic.yaml \
--agent-file /path/to/kubernetes-mcp-server/pkg/toolsets/kubevirt/vm/tests/claude-code/agent.yaml \
--mcp-config-file /path/to/kubernetes-mcp-server/pkg/toolsets/kubevirt/vm/tests/mcp-config.yaml
```

## Test Descriptions

### create-vm-basic

**Difficulty:** Easy
**Description:** Tests basic VM creation with default Fedora workload.
**Key Tool:** `vm_create`
**Expected Behavior:** Agent should use `vm_create` to generate a plan and then create the VM using `resources_create_or_update`.

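For reference, here is a minimal sketch of the kind of VirtualMachine manifest this verification expects; the exact fields, memory size, and image tag that `vm_create` produces may differ:

```yaml
# Illustrative sketch only; not the exact output of vm_create
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  name: test-vm
  namespace: vm-test
spec:
  runStrategy: Always
  template:
    spec:
      domain:
        memory:
          guest: 2Gi          # assumed value for illustration
        devices:
          disks:
            - name: containerdisk
              disk:
                bus: virtio
      volumes:
        - name: containerdisk
          containerDisk:
            image: quay.io/containerdisks/fedora:latest
```
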
### create-vm-with-instancetype

**Difficulty:** Medium
**Description:** Tests VM creation with a specific instancetype (u1.medium).
**Key Tool:** `vm_create`
**Expected Behavior:** Agent should pass the instancetype parameter to `vm_create` and create a VM with the correct instancetype reference.

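The check here hinges on the instancetype reference in the VM spec. In KubeVirt terms, that reference looks roughly like the excerpt below (whether the kind is cluster-scoped or namespaced depends on the environment):

```yaml
# Excerpt of the expected reference (illustrative)
spec:
  instancetype:
    kind: VirtualMachineClusterInstancetype
    name: u1.medium
```
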
### create-vm-with-size

**Difficulty:** Medium
**Description:** Tests VM creation using a size hint ('large').
**Key Tool:** `vm_create`
**Expected Behavior:** Agent should use the size parameter, which should map to an appropriate instancetype.

### create-vm-ubuntu

**Difficulty:** Easy
**Description:** Tests VM creation with Ubuntu workload.
**Key Tool:** `vm_create`
**Expected Behavior:** Agent should create a VM using the Ubuntu container disk image.

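The verify script looks for an `ubuntu` containerDisk image (ideally `quay.io/containerdisks/ubuntu:24.04`), i.e. a volume along these lines:

```yaml
# Excerpt (illustrative): the volume the verification script looks for
volumes:
  - name: containerdisk
    containerDisk:
      image: quay.io/containerdisks/ubuntu:24.04
```
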
### create-vm-with-performance

**Difficulty:** Medium
**Description:** Tests VM creation with performance family ('compute-optimized') and size.
**Key Tool:** `vm_create`
**Expected Behavior:** Agent should combine performance and size to select an appropriate instancetype (e.g., c1.medium).

### troubleshoot-vm

**Difficulty:** Easy
**Description:** Tests VM troubleshooting guide generation.
**Key Tool:** `vm_troubleshoot`
**Expected Behavior:** Agent should use `vm_troubleshoot` to generate a troubleshooting guide for the VM.

## Assertions

The tests validate:

- **Tool Usage:** Agents must call `vm_create`, `vm_troubleshoot`, or `resources_*` tools
- **Call Limits:** Between 1 and 30 tool calls (allows for exploration and creation)
- **Task Success:** Verification scripts confirm VMs are created correctly

## Expected Results

**✅ Pass** means:

- The VM tools are well-designed and discoverable
- Tool descriptions are clear to AI agents
- Schemas are properly structured
- Implementation works correctly

**❌ Fail** indicates:

- Tool descriptions may need improvement
- Schema complexity issues
- Missing functionality
- Implementation bugs

## Output

Results are saved to `gevals-kubevirt-vm-operations-out.json` with:

- Task pass/fail status
- Assertion results
- Tool call history
- Agent interactions

## Customization

### Using Different AI Agents

You can create additional agent configurations (similar to the `claude-code/` directory) for testing with different AI models:

```yaml
# Example: openai-agent/agent.yaml
kind: Agent
metadata:
name: "openai-agent"
commands:
argTemplateMcpServer: "{{ .File }}"
runPrompt: |-
agent-wrapper.sh {{ .McpServerFileArgs }} "{{ .Prompt }}"
```

### Adding New Tests

To add a new test task (a minimal skeleton is sketched after these steps):

1. Create a new directory under `tasks/`
2. Add task YAML file with prompt
3. Add setup, verify, and cleanup scripts
4. The test will be automatically discovered by the glob pattern in `eval.yaml`
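
As a starting point, a minimal task skeleton that mirrors the existing tasks is shown below; the directory name, scripts, and prompt are placeholders, not a prescribed schema:

```yaml
# tasks/my-new-test/my-new-test.yaml (hypothetical example)
kind: Task
metadata:
  name: "my-new-test"
  difficulty: easy
steps:
  setup:
    inline: |-
      #!/usr/bin/env bash
      kubectl create namespace vm-test
  verify:
    inline: |-
      #!/usr/bin/env bash
      # Fail the task if the expected resource does not exist
      kubectl get virtualmachine my-vm -n vm-test
  cleanup:
    inline: |-
      #!/usr/bin/env bash
      kubectl delete namespace vm-test --ignore-not-found
prompt:
  inline: Describe the task for the agent here.
```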

## Troubleshooting

### Tests Fail to Connect to MCP Server

Ensure the Kubernetes MCP Server is running:

```bash
curl http://localhost:8888/mcp/health
```

### VirtualMachine Not Created

Check if KubeVirt is installed:

```bash
kubectl get crds | grep kubevirt
kubectl get pods -n kubevirt
```

### Permission Issues

Ensure your kubeconfig has permissions to do the following (a matching ClusterRole is sketched below):

- Create namespaces
- Create VirtualMachine resources
- List instancetypes and preferences
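
As a rough guide, a ClusterRole covering these permissions might look like the sketch below; the role name and verb granularity are assumptions, so adjust them to your cluster's policy:

```yaml
# Illustrative RBAC sketch, not part of the test suite
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: kubevirt-vm-test-runner
rules:
  - apiGroups: [""]
    resources: ["namespaces"]
    verbs: ["create", "delete", "get", "list"]
  - apiGroups: ["kubevirt.io"]
    resources: ["virtualmachines"]
    verbs: ["create", "delete", "get", "list", "watch"]
  - apiGroups: ["instancetype.kubevirt.io"]
    resources:
      - virtualmachineclusterinstancetypes
      - virtualmachineclusterpreferences
    verbs: ["get", "list"]
```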

## Contributing

When adding new tests:

- Keep tasks focused on a single capability
- Make verification scripts robust
- Document expected behavior
- Set appropriate difficulty levels
- Ensure cleanup scripts remove all resources
10 changes: 10 additions & 0 deletions pkg/toolsets/kubevirt/vm/tests/claude-code/agent.yaml
@@ -0,0 +1,10 @@
kind: Agent
metadata:
  name: "claude-code"
commands:
  useVirtualHome: false
  argTemplateMcpServer: "--mcp-config {{ .File }}"
  argTemplateAllowedTools: "mcp__{{ .ServerName }}__{{ .ToolName }}"
  allowedToolsJoinSeparator: ","
  runPrompt: |-
    claude {{ .McpServerFileArgs }} --strict-mcp-config --allowedTools "{{ .AllowedToolArgs }}" --print "{{ .Prompt }}"
14 changes: 14 additions & 0 deletions pkg/toolsets/kubevirt/vm/tests/claude-code/eval.yaml
@@ -0,0 +1,14 @@
kind: Eval
metadata:
  name: "kubevirt-vm-operations"
config:
  agentFile: agent.yaml
  mcpConfigFile: ../mcp-config.yaml
  taskSets:
    - glob: ../tasks/*/*.yaml
  assertions:
    toolsUsed:
      - server: kubernetes
        toolPattern: "(vm_create|vm_troubleshoot|resources_.*)"
    minToolCalls: 1
    maxToolCalls: 30
5 changes: 5 additions & 0 deletions pkg/toolsets/kubevirt/vm/tests/mcp-config.yaml
@@ -0,0 +1,5 @@
mcpServers:
  kubernetes:
    type: http
    url: http://localhost:8888/mcp
    enableAllTools: true
@@ -0,0 +1,55 @@
kind: Task
metadata:
  name: "create-basic-vm"
  difficulty: easy
steps:
  setup:
    inline: |-
      #!/usr/bin/env bash
      kubectl delete namespace vm-test --ignore-not-found
      kubectl create namespace vm-test
  verify:
    inline: |-
      #!/usr/bin/env bash
      # Wait for VirtualMachine to be created
      if ! kubectl wait --for=jsonpath='{.metadata.name}'=test-vm virtualmachine/test-vm -n vm-test --timeout=30s 2>/dev/null; then
        echo "VirtualMachine test-vm not found"
        kubectl get virtualmachines -n vm-test
        exit 1
      fi

      echo "VirtualMachine test-vm created successfully"

      # Verify container disk is Fedora (check all volumes)
      CONTAINER_DISKS=$(kubectl get virtualmachine test-vm -n vm-test -o jsonpath='{.spec.template.spec.volumes[*].containerDisk.image}')
      if [[ "$CONTAINER_DISKS" =~ fedora ]]; then
        echo "✓ VirtualMachine uses Fedora container disk"
      else
        echo "✗ Expected Fedora container disk, found volumes with images: $CONTAINER_DISKS"
        kubectl get virtualmachine test-vm -n vm-test -o yaml
        exit 1
      fi

      # Verify runStrategy is set (check both spec and status)
      SPEC_RUN_STRATEGY=$(kubectl get virtualmachine test-vm -n vm-test -o jsonpath='{.spec.runStrategy}')
      STATUS_RUN_STRATEGY=$(kubectl get virtualmachine test-vm -n vm-test -o jsonpath='{.status.runStrategy}')

      if [[ -n "$SPEC_RUN_STRATEGY" ]]; then
        echo "✓ VirtualMachine uses runStrategy in spec: $SPEC_RUN_STRATEGY"
      elif [[ -n "$STATUS_RUN_STRATEGY" ]]; then
        echo "✓ VirtualMachine has runStrategy in status: $STATUS_RUN_STRATEGY"
        echo "  Note: VM may have been created with deprecated 'running' field, but runStrategy is set in status"
      else
        echo "✗ VirtualMachine missing runStrategy field in both spec and status"
        exit 1
      fi

      echo "All validations passed"
      exit 0
  cleanup:
    inline: |-
      #!/usr/bin/env bash
      kubectl delete virtualmachine test-vm -n vm-test --ignore-not-found
      kubectl delete namespace vm-test --ignore-not-found
prompt:
  inline: Please create a Fedora virtual machine named test-vm in the vm-test namespace.
@@ -0,0 +1,60 @@
kind: Task
metadata:
  name: "create-ubuntu-vm"
  difficulty: easy
steps:
  setup:
    inline: |-
      #!/usr/bin/env bash
      kubectl delete namespace vm-test --ignore-not-found
      kubectl create namespace vm-test
  verify:
    inline: |-
      #!/usr/bin/env bash
      # Wait for VirtualMachine to be created
      if ! kubectl wait --for=jsonpath='{.metadata.name}'=ubuntu-vm virtualmachine/ubuntu-vm -n vm-test --timeout=30s 2>/dev/null; then
        echo "VirtualMachine ubuntu-vm not found"
        kubectl get virtualmachines -n vm-test
        exit 1
      fi

      echo "VirtualMachine ubuntu-vm created successfully"

      # Verify container disk is Ubuntu (should be quay.io/containerdisks/ubuntu:24.04)
      CONTAINER_DISKS=$(kubectl get virtualmachine ubuntu-vm -n vm-test -o jsonpath='{.spec.template.spec.volumes[*].containerDisk.image}')
      if [[ "$CONTAINER_DISKS" =~ ubuntu ]]; then
        echo "✓ VirtualMachine uses Ubuntu container disk"

        # Verify it's using the specific Ubuntu 24.04 image
        if [[ "$CONTAINER_DISKS" =~ containerdisks/ubuntu ]]; then
          echo "✓ Using containerdisks Ubuntu image"
        fi
      else
        echo "✗ Expected Ubuntu container disk, found volumes with images: $CONTAINER_DISKS"
        kubectl get virtualmachine ubuntu-vm -n vm-test -o yaml
        exit 1
      fi

      # Verify runStrategy is set (check both spec and status)
      SPEC_RUN_STRATEGY=$(kubectl get virtualmachine ubuntu-vm -n vm-test -o jsonpath='{.spec.runStrategy}')
      STATUS_RUN_STRATEGY=$(kubectl get virtualmachine ubuntu-vm -n vm-test -o jsonpath='{.status.runStrategy}')

      if [[ -n "$SPEC_RUN_STRATEGY" ]]; then
        echo "✓ VirtualMachine uses runStrategy in spec: $SPEC_RUN_STRATEGY"
      elif [[ -n "$STATUS_RUN_STRATEGY" ]]; then
        echo "✓ VirtualMachine has runStrategy in status: $STATUS_RUN_STRATEGY"
        echo "  Note: VM may have been created with deprecated 'running' field, but runStrategy is set in status"
      else
        echo "✗ VirtualMachine missing runStrategy field in both spec and status"
        exit 1
      fi

      echo "All validations passed"
      exit 0
  cleanup:
    inline: |-
      #!/usr/bin/env bash
      kubectl delete virtualmachine ubuntu-vm -n vm-test --ignore-not-found
      kubectl delete namespace vm-test --ignore-not-found
prompt:
  inline: Create an Ubuntu virtual machine named ubuntu-vm in the vm-test namespace.