AI-Driven Kubernetes Operations for the Polyglot e-Commerce Platform #2

@huseyinbabal

📺 The issue below will be discussed in this webinar as an AI-driven edition of the original workshop scenario.

Containerize and Orchestrate a Polyglot e-Commerce Application with Docker, Kubernetes, GitOps, and AI Agents

Description

Transform an existing polyglot e-commerce application into a Kubernetes-first platform and operate it with Docker, GitOps, and production-oriented platform components. The final system reflects a real implementation outcome: product-service runs on .NET, user-service runs on Spring Boot, and order-service runs on Go. Each service has its own datastore and is deployed through a mix of raw Kubernetes manifests and GitOps-managed resources. In this AI-driven version, remote AI agents assist with manifest authoring, release preparation, implementation tasks, and rollout support, while engineers stay in control of decisions and approvals.

Acceptance Criteria

Core Containerization

  • Containerize product-service, user-service, and order-service with production-ready multi-stage Dockerfiles
  • Keep images aligned with each runtime stack: .NET 8, Java/Spring Boot, and Go
  • Push versioned images to a registry suitable for GitOps-based release automation
  • Keep runtime images small, reproducible, and suitable for Kubernetes deployments
  • Document the container build strategy so AI agents and engineers can safely update it over time
  • Maintain basic runtime hygiene such as clear ports, startup commands, and environment-based configuration

Kubernetes Deployment & Management

  • Provide raw Kubernetes manifests under k8s/ for the three services and their backing databases
  • Maintain GitOps-managed application manifests under gitops/apps/base/ for production-style delivery
  • Run product-service with PostgreSQL using CloudNativePG
  • Run user-service with MariaDB using the MariaDB operator
  • Run order-service with MongoDB in the raw Kubernetes setup
  • Keep service, deployment, namespace, and database resources separated clearly by responsibility
  • Enable image update automation for GitOps-managed services through Flux image policies
  • Identify and close remaining GitOps gaps where production parity is incomplete, especially around order-service and RabbitMQ lifecycle management
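
As a rough illustration of the CloudNativePG item above, a PostgreSQL cluster for product-service could be declared along these lines. This is a minimal sketch: the resource name `product-db`, the database and owner names, the instance count, and the storage size are assumptions made for the example, not values taken from the repository.

```yaml
# Hypothetical CloudNativePG cluster for product-service.
# Names, instance count, and storage size are illustrative assumptions.
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: product-db
  namespace: product-service
spec:
  instances: 1            # a single instance; raise this for high availability
  storage:
    size: 1Gi             # assumed volume size
  bootstrap:
    initdb:
      database: products  # assumed application database name
      owner: products     # assumed owner role
```

With a cluster like this, the operator would by default be expected to expose a read-write Service (`product-db-rw`) and generate an application credentials Secret (`product-db-app`), which the Deployment could reference instead of a hand-written connection secret.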

Service Discovery & Networking

  • Expose services through Kubernetes-native networking instead of an application-level service registry
  • Route traffic directly to services via Ingress and Istio resources rather than a dedicated API gateway layer
  • Use Kubernetes services for internal communication between workloads
  • Use NGINX Ingress in the raw Kubernetes setup
  • Use Istio Gateway and VirtualService resources in the GitOps-managed environment
  • Preserve event-driven communication through RabbitMQ in application integrations
  • Keep service DNS and namespace boundaries explicit and easy to operate
  • Allow AI agents to assist with remote manifest updates for routing and release flows under human review
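
The Istio routing described above could look roughly like the following sketch. The hostname `shop.example.com`, the gateway name `shop-gateway`, the `/api/products` prefix, and the assumption that the product-service Service listens on port 8080 are all placeholders for illustration; the API version may also differ depending on the installed Istio release.

```yaml
# Hypothetical Istio Gateway and VirtualService for product-service.
# Hostname, gateway name, path prefix, and port are illustrative assumptions.
apiVersion: networking.istio.io/v1beta1
kind: Gateway
metadata:
  name: shop-gateway
  namespace: istio-system
spec:
  selector:
    istio: ingressgateway        # label of the default Istio ingress gateway pods
  servers:
  - port:
      number: 80
      name: http
      protocol: HTTP
    hosts:
    - "shop.example.com"
---
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: product-service
  namespace: product-service
spec:
  hosts:
  - "shop.example.com"
  gateways:
  - istio-system/shop-gateway    # namespace/name reference to the Gateway above
  http:
  - match:
    - uri:
        prefix: /api/products    # assumed public path for the service
    route:
    - destination:
        # In-cluster DNS name: <service>.<namespace>.svc.cluster.local
        host: product-service.product-service.svc.cluster.local
        port:
          number: 8080
```

Because traffic is routed straight to the Kubernetes Service, no separate API gateway or application-level registry is involved.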

Data Model Requirements

Kubernetes Manifests Structure

apiVersion: apps/v1
kind: Deployment
metadata:
  name: product-service
  namespace: product-service
spec:
  replicas: 1
  selector:
    matchLabels:
      app: product-service
  template:
    metadata:
      labels:
        app: product-service
    spec:
      containers:
      - name: product-service
        image: ghcr.io/org/product-service:latest
        ports:
        - containerPort: 8080
        env:
        - name: CONNECTIONSTRINGS__POSTGRESQL
          valueFrom:
            secretKeyRef:
              name: product-service-secret
              key: connectionString

Spring Boot Configuration

spring:
  application:
    name: user-service
  datasource:
    url: jdbc:mysql://${DB_HOST}:${DB_PORT}/${DB_NAME}
    username: ${DB_USERNAME}
    password: ${DB_PASSWORD}
  rabbitmq:
    host: ${RABBITMQ_HOST}
    port: ${RABBITMQ_PORT}

Technical Requirements

Container Optimization

  • Multi-stage Builds - Build each service with its own optimized runtime image
  • Layer Caching - Structure Dockerfiles for efficient rebuilds in CI
  • Registry Publishing - Publish images to a registry that Flux can track
  • Polyglot Runtime Support - Support .NET, Java, and Go build pipelines in one repository
  • Config-driven Startup - Keep services configurable through environment variables and secrets
  • AI Agent Assistance - Let AI agents suggest Dockerfile and manifest updates while engineers validate the final changes

Kubernetes Best Practices

  • Dual Delivery Model - Keep both raw k8s/ manifests and GitOps-managed resources understandable and usable
  • Operators - Use CloudNativePG and MariaDB operator resources where the repo already does so
  • Service Mesh Integration - Use Istio Gateway and VirtualService for managed traffic routing
  • Namespace Isolation - Separate workloads by namespace in the GitOps layer
  • Secret Delivery - Use Vault and External Secrets where available instead of hardcoding credentials
  • Image Automation - Use Flux image repository, policy, and update automation for managed services
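
To make the Image Automation item more concrete, the Flux resources that track a service image might be sketched as below. The `flux-system` namespace, the scan interval, and the semver range are assumptions, and the `v1beta2` API version may need adjusting to match the installed Flux release.

```yaml
# Hypothetical Flux image tracking for product-service.
# Namespace, interval, and semver range are illustrative assumptions.
apiVersion: image.toolkit.fluxcd.io/v1beta2
kind: ImageRepository
metadata:
  name: product-service
  namespace: flux-system
spec:
  image: ghcr.io/org/product-service   # registry path Flux scans for new tags
  interval: 5m
---
apiVersion: image.toolkit.fluxcd.io/v1beta2
kind: ImagePolicy
metadata:
  name: product-service
  namespace: flux-system
spec:
  imageRepositoryRef:
    name: product-service
  policy:
    semver:
      range: ">=1.0.0"                 # select the highest matching semver tag
```

For update automation to rewrite a manifest, the image line in the Deployment would carry a marker comment such as `# {"$imagepolicy": "flux-system:product-service"}`, which also implies publishing versioned tags rather than relying on `latest`.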

Configuration Management

  • Environment Variables - Keep application settings externalized
  • Secrets - Use Kubernetes Secrets and ExternalSecret flows for sensitive configuration
  • GitOps Structure - Manage app and infrastructure definitions in separate GitOps directories
  • Remote Change Workflow - Let AI agents prepare manifest and release updates on remote environments under approval gates
  • Consistency - Keep raw manifests, GitOps resources, and running architecture aligned
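
The ExternalSecret flow mentioned above could be sketched as follows. The store name `vault-backend`, the Vault key `product-service`, and the refresh interval are assumptions; only the target Secret name and key mirror the Deployment example earlier in this issue. The API version may differ with the installed External Secrets Operator release.

```yaml
# Hypothetical ExternalSecret that materializes product-service-secret from Vault.
# Store name, remote key, and refresh interval are illustrative assumptions.
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: product-service-secret
  namespace: product-service
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: vault-backend            # assumed ClusterSecretStore backed by Vault
    kind: ClusterSecretStore
  target:
    name: product-service-secret   # Kubernetes Secret consumed by the Deployment
  data:
  - secretKey: connectionString    # key inside the generated Secret
    remoteRef:
      key: product-service         # assumed path in Vault
      property: connectionString   # assumed field at that path
```

This keeps the credential out of static YAML while the Deployment continues to read it through a regular `secretKeyRef`.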

Security Requirements (to be implemented)

Container Security

  • Base Image Hygiene - Continue improving runtime image minimalism and patch level
  • Image Review - Add scanning and policy checks to the build pipeline
  • Registry Discipline - Use controlled image publishing and tagging
  • Secret Removal from Static YAML - Reduce remaining plain-text credentials in legacy manifests
  • AI-assisted Review - Use AI agents to surface risky config changes before rollout

Kubernetes Security

  • Vault Integration - Expand secure secret delivery patterns already introduced for managed workloads
  • RBAC - Make access boundaries explicit for GitOps, operators, and workloads
  • Namespace Boundaries - Use namespaces consistently to isolate services
  • Ingress and Mesh Hardening - Tighten external exposure and traffic policy over time
  • Policy Controls - Leave room for stronger admission and policy enforcement later
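
One possible shape for the explicit access boundaries named in the RBAC item is a namespaced Role bound to a workload's ServiceAccount. The sketch below is hypothetical: the ServiceAccount name `product-service` and the choice of read-only access to ConfigMaps are assumptions picked only to show the pattern.

```yaml
# Hypothetical least-privilege RBAC for the product-service workload.
# ServiceAccount name and permitted resources are illustrative assumptions.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: product-service-reader
  namespace: product-service
rules:
- apiGroups: [""]                  # core API group
  resources: ["configmaps"]
  verbs: ["get", "list", "watch"]  # read-only access
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: product-service-reader
  namespace: product-service
subjects:
- kind: ServiceAccount
  name: product-service
  namespace: product-service
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: product-service-reader
```

Scoping the Role to one namespace lines up with the namespace-boundary item in the same list.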

High Availability Requirements (to be implemented)

Resilience Patterns

  • Independent Services - Keep each service deployable and recoverable on its own
  • Message-driven Decoupling - Preserve asynchronous communication with RabbitMQ
  • Safe Rollouts - Use GitOps and controlled image updates to reduce release risk
  • Failure Isolation - Keep database and service boundaries clear across the platform
  • Operational Recovery - Make rollback and re-sync paths straightforward for both engineers and AI-assisted workflows

Infrastructure HA

  • Hetzner K3s Cluster - Run the platform on a multi-node Kubernetes environment
  • Control Plane and Worker Separation - Reflect the infrastructure layout defined under infrastructure/
  • Operator-managed Databases - Use database operators where the managed platform already supports them
  • Managed Traffic Layer - Rely on ingress and service mesh components for resilient routing
  • Future Gaps - Extend HA coverage to remaining unmanaged parts such as RabbitMQ and order-service GitOps delivery

CI/CD Pipeline Requirements (to be implemented)

Build Pipeline

  • GitHub Actions - Build and publish images for all services
  • Container Registry Flow - Store release images in GHCR or an equivalent registry
  • Multi-service Pipeline - Support the polyglot stack in one pipeline model
  • Release Tagging - Keep image versions traceable for GitOps updates
  • Build Consistency - Ensure generated images match the manifests in the repo
  • AI-assisted Delivery Prep - Use AI agents to help prepare release notes, manifest diffs, and rollout changes for review
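
A build pipeline along these lines could be sketched as a single matrix workflow. The layout is an assumption: each service is presumed to live in a top-level directory named after it and to contain its own Dockerfile, images are pushed under `ghcr.io/org/`, and releases are triggered by `v*` Git tags.

```yaml
# Hypothetical GitHub Actions workflow that builds all three service images.
# Directory layout, registry path, and tag trigger are illustrative assumptions.
name: build-images
on:
  push:
    tags: ["v*"]
permissions:
  contents: read
  packages: write                  # needed to push to GHCR with GITHUB_TOKEN
jobs:
  build:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        service: [product-service, user-service, order-service]
    steps:
    - uses: actions/checkout@v4
    - uses: docker/setup-buildx-action@v3
    - uses: docker/login-action@v3
      with:
        registry: ghcr.io
        username: ${{ github.actor }}
        password: ${{ secrets.GITHUB_TOKEN }}
    - uses: docker/build-push-action@v6
      with:
        context: ./${{ matrix.service }}   # assumed per-service build context
        push: true
        # Tag the image with the Git tag so releases stay traceable
        tags: ghcr.io/org/${{ matrix.service }}:${{ github.ref_name }}
```

Tagging images with the Git tag keeps versions traceable, which is what lets a GitOps image policy pick them up later.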

Deployment Pipeline

  • Flux GitOps - Reconcile Kubernetes state from the repository
  • Image Update Automation - Automatically promote selected image tags into GitOps-managed workloads
  • Environment Structure - Keep cluster and app definitions organized under gitops/clusters/ and gitops/apps/
  • Production Readiness - Align production overlays with the current app base structure
  • Gap Closure - Bring order-service and supporting messaging components into the same GitOps workflow where missing
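
The reconciliation step could be expressed with a Flux Kustomization similar to this sketch. The resource name `apps`, the interval, and the `flux-system` GitRepository source are assumptions; a production cluster would more likely point at an overlay directory than directly at the base path shown here.

```yaml
# Hypothetical Flux Kustomization that reconciles the application manifests.
# Name, interval, and source reference are illustrative assumptions.
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: apps
  namespace: flux-system
spec:
  interval: 10m               # how often Flux re-applies the desired state
  path: ./gitops/apps/base    # directory named in this issue
  prune: true                 # delete cluster objects removed from Git
  sourceRef:
    kind: GitRepository
    name: flux-system         # assumed name of the bootstrapped Git source
```

With `prune: true`, removing a manifest from the repository also removes the object from the cluster, so rollbacks can be done through Git history.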

Monitoring & Observability Requirements

Application Monitoring

  • Prometheus Stack - Collect service and cluster metrics through kube-prometheus-stack
  • Grafana Dashboards - Visualize application and infrastructure health
  • Service Health Visibility - Make service status observable across environments
  • Release Awareness - Track the effect of GitOps-driven releases on runtime behavior
  • AI-assisted Troubleshooting - Allow AI agents to inspect manifests and telemetry context during incident analysis
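
With kube-prometheus-stack, per-service scraping is commonly declared through a ServiceMonitor. The following is a sketch only: the `release` label value, the named Service port `http`, and the `/metrics` path are assumptions that depend on how the stack is installed and how each service exposes metrics.

```yaml
# Hypothetical ServiceMonitor for product-service.
# Label values, port name, and metrics path are illustrative assumptions.
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: product-service
  namespace: product-service
  labels:
    release: kube-prometheus-stack   # assumed; must match the Prometheus serviceMonitorSelector
spec:
  selector:
    matchLabels:
      app: product-service           # matches the label on the Service
  endpoints:
  - port: http                       # assumed named port on the Service
    path: /metrics                   # assumed metrics endpoint
    interval: 30s
```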

Logging & Observability

  • Tracing - Use Tempo for distributed tracing support
  • Service Graph Visibility - Use Kiali for mesh-level visibility where applicable
  • Config Traceability - Keep changes traceable through GitOps history
  • Operational Insight - Improve logs and telemetry around service-to-service communication over time

Infrastructure Monitoring

  • Cluster Monitoring - Observe controllers, workloads, and supporting platform components
  • Operator Visibility - Track the health of CNPG, MariaDB operator, External Secrets, Vault, and Istio
  • Resource Tracking - Measure node and workload usage in the Hetzner Kubernetes environment
  • Platform Diagnostics - Keep enough visibility to debug GitOps sync and runtime issues quickly

Incident Management

  • Actionable Signals - Detect failures in services, sync processes, and platform controllers
  • Faster Diagnosis - Use dashboards, traces, and Git history together during troubleshooting
  • Controlled Recovery - Recover by applying GitOps fixes instead of ad hoc cluster edits where possible
  • AI-assisted Operations - Let AI agents help summarize likely causes and proposed manifest fixes without bypassing review

Performance Requirements

  • Container Startup Time - Keep service startup practical for iterative deployments
  • Service Response Time - Maintain acceptable internal API performance for the three-service workflow
  • Release Propagation - Keep GitOps-driven image updates predictable and observable
  • Database Connectivity - Ensure each service can reliably reach its own datastore
  • Resource Usage - Keep workloads lightweight enough for a cost-conscious Kubernetes setup
  • Operational Overhead - Keep remote maintenance manageable for both engineers and AI-assisted delivery flows

Environment Configuration

Development Environment

  • Raw Kubernetes Manifests - Use the k8s/ directory as the simpler starting point
  • Service-level Development - Allow teams to work on each service independently
  • Container-first Workflow - Build and run services in containers before cluster rollout
  • AI-assisted Local Prep - Use AI agents to help generate and review manifest changes before remote application

Staging Environment

  • GitOps Validation - Use Flux-managed structure to validate deployment flow
  • Operator-based Services - Exercise managed database and secret integrations
  • Traffic Verification - Validate Ingress, Istio Gateway, and VirtualService behavior
  • Release Safety - Check image automation before wider rollout

Production Environment

  • Hetzner-based Kubernetes - Use the defined Hetzner cluster layout as the production target
  • Managed Platform Components - Run cert-manager, Istio, Vault, External Secrets, and observability components as shared infrastructure
  • GitOps-first Operations - Prefer repository-driven changes over manual cluster edits
  • AI-assisted Remote Operations - Let AI agents help manage manifests and releases remotely while humans approve the final action

Dependencies

Infrastructure Components

  • Kubernetes / K3s Cluster on Hetzner
  • CloudNativePG for PostgreSQL lifecycle management
  • MariaDB Operator for user-service database management
  • MongoDB for order-service data persistence
  • RabbitMQ for event-driven communication between services

Monitoring Stack

  • kube-prometheus-stack
  • Grafana
  • Tempo
  • Kiali
  • Istio telemetry and platform-level monitoring components

Development Tools

  • Docker for container builds
  • GitHub Actions for build automation
  • Flux for GitOps reconciliation
  • Vault + External Secrets for secret management workflows
  • AI Agent Tooling for remote implementation, manifest updates, and release assistance

Out of Scope (for this workshop)

  • Full platform completion for every workload beyond the implemented repository scope
  • Perfect production parity between legacy k8s/ and GitOps-managed paths
  • Complete RabbitMQ operator rollout if not already represented in repo manifests
  • Advanced policy enforcement such as a full admission-control program
  • Comprehensive test/security automation beyond the current implementation level
  • Replacing human approval with full autonomous operations

Definition of Done

  • product-service, user-service, and order-service are containerized and buildable
  • The repository clearly shows both raw Kubernetes manifests and GitOps-managed deployment assets
  • Product, user, and order workflows are mapped to PostgreSQL, MariaDB/MySQL, and MongoDB respectively
  • Traffic routing works through Ingress and/or Istio resources as defined in the repo
  • Flux image automation is in place for the GitOps-managed services currently supported
  • Shared platform services such as cert-manager, Vault, External Secrets, Istio, and observability components are represented in the infrastructure layer
  • The remaining delivery gaps are explicitly identified, especially around order-service and RabbitMQ management
  • The issue narrative reflects the finished repository as if the scenario has been solved end-to-end
  • The AI-driven angle is clear: agents help manage manifests, releases, and remote implementation work under engineer supervision
