Building production-grade distributed systems with automated AWS deployments, achieving sub-5ms response times at 1K+ concurrent users
Actively seeking: Remote backend engineering positions
Specialization: Python backend + DevOps automation + Distributed systems
Location: Dhaka, Bangladesh (Open to worldwide remote)
Ask me about: FastAPI, System Design, AWS Infrastructure
I don't just write backend code; I architect complete production systems with full automation from infrastructure to deployment:
- Infrastructure as Code Expert - Automated AWS deployments managing 11+ EC2 instances with Pulumi & Ansible
- Performance Engineering - Optimized systems achieving sub-5ms response times with 1K+ concurrent users
- DevOps Automation - Zero-touch deployments with CI/CD, containerization, and orchestration
- Distributed Systems - Built fault-tolerant architectures with auto-scaling, load balancing, and high availability
- Technical Writing - Published articles explaining complex architectures in simple words
- 6+ Production-Ready Applications Built
- Sub-5ms API Response Times Achieved
- 11+ AWS EC2 Instances Automated
- 1K+ Concurrent Users Supported
- Container Orchestration Systems Designed
- 500+ DSA Problems Solved
- 200K+ Technical Blog Readers
- 40+ Educational Videos Created
Scalable URL Shortener Microservice
High Complexity - Decoupled Microservices Architecture
High-performance URL shortener with three independent services, dual database strategy, and production K3s deployment
- Architected decoupled microservices separating `create_service` (write-heavy), `redirect_service` (95% read traffic), and `worker_service` (Celery tasks) with independent scaling via `docker-compose-decoupled.yml`.
- Implemented Redis-first caching with MongoDB fallback and Nginx proxy routing, targeting sub-5ms redirect latency.
- Built a repository pattern with abstract base classes for PostgreSQL and MongoDB, centralized error handling, and a shared `common/` library for clean data access across services.
- Optimized PostgreSQL operations using atomic key acquisition with `SELECT ... FOR UPDATE SKIP LOCKED` for race-free distributed key allocation and parameterized bulk inserts for 100K+ keys.
- Implemented comprehensive observability using OpenTelemetry with B3 propagation, automatic FastAPI/DB instrumentation, and OTLP export to Tempo for distributed tracing with Grafana visualization.
- Engineered production-grade resilience with PgBouncer connection pooling (53% reduction in overhead), a circuit breaker pattern preventing cascade failures, exponential backoff retries, and database timeout protection.
- Deployed a production K3s cluster on AWS using Pulumi IaC and Ansible with path-based Nginx routing, per-service rate limiting, and a CI/CD pipeline via GitHub Actions.
- Implemented intelligent key pre-population using Celery workers that maintain a pool of unused keys for instant URL creation without database latency.
- Built comprehensive testing infrastructure with multi-database mocking (SQLite, mongomock, fakeredis), async pytest framework, httpx API client testing, and isolated test environments.
- Automated AWS infrastructure with VPC setup, security groups, bastion host access, and monitoring via the Celery Flower dashboard.
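
The Redis-first lookup with a MongoDB fallback can be sketched roughly as below. This is a minimal illustration, not the project's actual code: `InMemoryStore` is a hypothetical dict-backed stand-in for both Redis and MongoDB so the example runs without either service, and `resolve_short_key` is an assumed function name.

```python
from typing import Dict, Optional, Protocol


class KeyValueStore(Protocol):
    """Minimal interface shared by the cache and the persistent store."""

    def get(self, key: str) -> Optional[str]: ...
    def set(self, key: str, value: str) -> None: ...


class InMemoryStore:
    """Hypothetical dict-backed stand-in for Redis or MongoDB."""

    def __init__(self) -> None:
        self.data: Dict[str, str] = {}
        self.reads = 0  # counts lookups so the fallback behaviour is observable

    def get(self, key: str) -> Optional[str]:
        self.reads += 1
        return self.data.get(key)

    def set(self, key: str, value: str) -> None:
        self.data[key] = value


def resolve_short_key(
    short_key: str, cache: KeyValueStore, store: KeyValueStore
) -> Optional[str]:
    """Return the long URL for short_key, checking the cache first."""
    url = cache.get(short_key)
    if url is not None:
        return url  # cache hit: the persistent store is never touched
    url = store.get(short_key)
    if url is not None:
        cache.set(short_key, url)  # warm the cache for the next redirect
    return url


cache, store = InMemoryStore(), InMemoryStore()
store.set("abc123", "https://example.com/long")
first = resolve_short_key("abc123", cache, store)   # falls back to the store
second = resolve_short_key("abc123", cache, store)  # served from the cache
```

Because redirects are described as roughly 95% of traffic, most requests should take the cache-hit path and never reach the database.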
Technical Deep Dive: Read my Medium articles
Tech Stack: FastAPI, Redis, PostgreSQL, MongoDB, Celery, Nginx, Docker, K3s, Pulumi, Ansible, AWS, OpenTelemetry, Tempo, Grafana, PgBouncer, pytest, httpx, GitHub Actions
Key Learnings:
- Microservices decoupling for independent scaling
- Repository pattern for clean data access
- Circuit breaker pattern for fault tolerance
- Multi-database testing strategies
- Infrastructure as Code best practices
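
To make the circuit breaker idea concrete, here is a minimal sketch under stated assumptions. It is not the implementation used in the project: the class name, the default threshold of 3 failures, and the 30-second reset timeout are all hypothetical values chosen for illustration.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(RuntimeError):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    """Opens after repeated failures and allows a trial call after a timeout."""

    def __init__(
        self,
        failure_threshold: int = 3,      # assumed value, tune per dependency
        reset_timeout: float = 30.0,     # assumed value, in seconds
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    def call(self, func: Callable[[], T]) -> T:
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                raise CircuitOpenError("circuit open; call rejected")
            # Timeout elapsed: half-open, so one trial call is let through.
        try:
            result = func()
        except Exception:
            self.failures += 1
            # A failed trial call, or too many failures, (re)opens the circuit.
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            raise
        # Success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

Wrapping a database call as `breaker.call(lambda: run_query())` (where `run_query` is a placeholder) means that once the dependency is failing, callers get a fast `CircuitOpenError` instead of piling up on timeouts, which is how the pattern limits cascading failures.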
ElastiKube: Production K3s Autoscaler
Most Complex Infrastructure Project - ML-Enhanced Event-Driven Architecture
Production-grade autoscaling system for K3s clusters with 4-layer intelligent scaling architecture, ML-based predictive scaling, and multi-AZ high availability
- Architected 4-layer autoscaling system: (1) Data Collection for ML training, (2) Time-Aware Scaling with peak/off-peak thresholds (85%/60% vs 60%/40%), (3) Flash Sale Detection with emergency response to CPU spikes >30% in 2 minutes, (4) Predictive Scaling using Prophet models forecasting CPU 15 minutes ahead.
- Implemented ML training pipeline with Kubernetes CronJob for automated weekly model retraining, feature engineering (temporal cyclical encoding, lag features, rolling statistics), time-series cross-validation, and backtesting with MAE/RMSE metrics tracking.
- Built event-driven Lambda architecture with four specialized functions (Decision, Scale-Up, Scale-Down, Cleanup) orchestrated through EventBridge for fault tolerance, crash recovery via Write-Ahead Log (WAL), and distributed locking with 200s timeout.
- Designed multi-AZ high availability with round-robin worker distribution across 3 availability zones (ap-southeast-1a/b/c), single NAT Gateway optimization, and LIFO scale-down maintaining natural distribution balance.
- Implemented multi-layer idempotency including bootstrap verification, cooldown checks (scale-up: 300s, scale-down: 900s), pending instance detection, and automatic stale flag cleanup to prevent duplicate scaling operations.
- Integrated comprehensive observability with 17 CloudWatch alarms (CRITICAL/WARNING severity), graceful degradation to conservative defaults when Prometheus is unavailable, and fixed LogGroups for stable dashboard references.
- Engineered spot instance support with automatic On-Demand fallback when spot capacity unavailable (InsufficientInstanceCapacity, SpotInstanceCapacityNotAvailable, MaxSpotInstanceCountExceeded), graceful 2-minute interruption handling, and proper node cleanup.
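
The reactive layers (time-aware thresholds, flash sale detection, cooldowns) can be sketched as a single decision function. This is an illustrative sketch only. Several things here are assumptions: that the quoted pairs mean scale-up/scale-down thresholds with 85%/60% applying at peak and 60%/40% off-peak, that the ">30%" spike means a rise of more than 30 percentage points, and that the flash sale path still respects the scale-up cooldown. The function and constant names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

SCALE_UP_COOLDOWN_S = 300    # from the cooldown figures above
SCALE_DOWN_COOLDOWN_S = 900
FLASH_SPIKE_POINTS = 30.0    # assumed: CPU rise in percentage points over 2 minutes


@dataclass(frozen=True)
class Thresholds:
    scale_up: float
    scale_down: float


# Assumed mapping of the quoted pairs; swap them if the real system differs.
PEAK = Thresholds(scale_up=85.0, scale_down=60.0)
OFF_PEAK = Thresholds(scale_up=60.0, scale_down=40.0)


def decide(
    cpu_now: float,
    cpu_two_min_ago: float,
    is_peak: bool,
    seconds_since_last_scaling: Optional[float],
) -> str:
    """Return 'scale_up', 'scale_down', 'wait' (cooldown active), or 'hold'."""
    limits = PEAK if is_peak else OFF_PEAK
    # No previous scaling event means no cooldown applies.
    elapsed = (
        float("inf") if seconds_since_last_scaling is None
        else seconds_since_last_scaling
    )
    flash_sale = cpu_now - cpu_two_min_ago > FLASH_SPIKE_POINTS
    if flash_sale or cpu_now > limits.scale_up:
        return "scale_up" if elapsed >= SCALE_UP_COOLDOWN_S else "wait"
    if cpu_now < limits.scale_down:
        return "scale_down" if elapsed >= SCALE_DOWN_COOLDOWN_S else "wait"
    return "hold"
```

The asymmetric cooldowns make scale-down deliberately slower than scale-up, which is a common way to avoid flapping when load hovers near a threshold.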
Tech Stack: AWS Lambda, EventBridge, DynamoDB, EC2, K3s, Prometheus, CloudWatch, Prophet, Kubernetes CronJob, SSM, Secrets Manager, S3, Python 3.11, Pulumi, Ansible, kubectl, Node Exporter
Key Learnings:
- Layered autoscaling architecture combining reactive (time-aware, flash sale) and proactive (ML predictive) scaling
- Event-driven architecture patterns with Lambda chaining via EventBridge
- Distributed systems state management with DynamoDB and WAL patterns
- ML pipeline deployment with automated retraining and model versioning
- Multi-AZ infrastructure design with cost optimization (single NAT, spot instances)
- Kubernetes cluster operations including node lifecycle, pod draining, and CronJob scheduling
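
Why round-robin placement combined with LIFO scale-down keeps zones balanced is easiest to see in a small sketch. The `WorkerPool` class below is a hypothetical model, not the project's code; it only tracks which availability zone each worker was launched in.

```python
from collections import Counter
from typing import List

AZS = ["ap-southeast-1a", "ap-southeast-1b", "ap-southeast-1c"]


class WorkerPool:
    """Tracks the AZ of each worker in launch order (hypothetical model)."""

    def __init__(self) -> None:
        self.workers: List[str] = []

    def scale_up(self) -> str:
        # Round-robin: the next worker goes to the AZ after the last one used.
        az = AZS[len(self.workers) % len(AZS)]
        self.workers.append(az)
        return az

    def scale_down(self) -> str:
        if not self.workers:
            raise ValueError("no workers to remove")
        # LIFO: remove the most recently launched worker.
        return self.workers.pop()

    def distribution(self) -> Counter:
        return Counter(self.workers)


pool = WorkerPool()
for _ in range(5):
    pool.scale_up()
removed = pool.scale_down()
```

Because removal undoes placement in exact reverse order, the per-zone counts never differ by more than one, with no rebalancing step needed.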
High Complexity - Media Processing Pipeline
Advanced full-stack video streaming solution with adaptive bitrate technology
- Engineered a secure and scalable video platform with a Django REST API and a React/TypeScript frontend, architected for high-performance adaptive streaming.
- Implemented a robust security model, using dj-rest-auth for token-based authentication and a protected media workflow (via Nginx `X-Accel-Redirect`) to ensure only authorized users can access streaming content.
- Built an asynchronous video processing pipeline using Celery, Redis, and FFmpeg to transcode videos for DASH playback, ensuring a smooth, low-latency user experience.
- Automated the entire cloud workflow, from provisioning AWS S3 infrastructure with Pulumi and configuring servers with Ansible, to deploying the Docker-containerized application via GitHub Actions.
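
The protected media workflow relies on the application authorizing the request and then handing file delivery back to Nginx through the `X-Accel-Redirect` response header. The sketch below is framework-agnostic and only illustrates that handoff; it is not the project's Django view. The `/protected/` prefix is a hypothetical Nginx location (one that would be marked `internal`), and the function name is an assumption.

```python
import posixpath
from typing import Dict, Tuple

INTERNAL_PREFIX = "/protected/"  # hypothetical Nginx location marked `internal`


def protected_media_response(
    user_is_authorized: bool, media_path: str
) -> Tuple[int, Dict[str, str]]:
    """Return (status, headers) for a request to a protected media file."""
    if not user_is_authorized:
        return 403, {}
    normalized = posixpath.normpath(media_path)
    # Reject absolute paths and anything that climbs out of the media root.
    if normalized in (".", "..") or normalized.startswith(("/", "../")):
        return 403, {}
    # The body stays empty: Nginx sees the header and serves the file itself.
    return 200, {"X-Accel-Redirect": INTERNAL_PREFIX + normalized}


status, headers = protected_media_response(True, "videos/42/manifest.mpd")
```

The application never streams video bytes itself, so authorization stays in Python while Nginx handles the heavy I/O.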
Tech Stack: Django, React, Celery, Redis, PostgreSQL, FFmpeg, DASH, AWS S3, Nginx, Docker, Pulumi, Ansible
Distributed Job Queue System
Medium-High Complexity - Worker Orchestration
Scalable job processing system with advanced features
- Developed a distributed job queue system using FastAPI and Redis to manage asynchronous tasks with priority-based queuing and automatic worker scaling.
- Implemented a real-time monitoring dashboard with Jinja2 templates to provide visibility into job status, queue metrics, and worker activity.
- Engineered an automatic worker scaling mechanism based on job load and worker availability, using Docker Swarm to dynamically adjust resources.
- Created a comprehensive error handling and fault tolerance system, including automatic retries for failed jobs and a dead-letter queue for unrecoverable tasks.
- Designed a job dependency feature to ensure complex workflows are executed in the correct order, improving system reliability.
- Containerized all services (API, Worker, Monitor) using Docker for consistent deployment and simplified management.
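
Priority queuing, automatic retries, and the dead-letter queue fit together roughly as in the in-process sketch below. It uses Python's `heapq` instead of Redis so it runs standalone, and it is not the project's implementation. The convention that a lower number means higher priority, the `max_attempts` default, and all class names are assumptions for illustration.

```python
import heapq
import itertools
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Job:
    name: str
    run: Callable[[], None]
    priority: int = 10  # assumed convention: lower number runs first
    attempts: int = 0


class JobQueue:
    def __init__(self, max_attempts: int = 3) -> None:
        self.max_attempts = max_attempts
        self._heap: List[Tuple[int, int, Job]] = []
        # Tie-breaker keeps FIFO order within a priority and avoids comparing Jobs.
        self._counter = itertools.count()
        self.dead_letter: List[Job] = []
        self.completed: List[str] = []

    def enqueue(self, job: Job) -> None:
        heapq.heappush(self._heap, (job.priority, next(self._counter), job))

    def process_next(self) -> bool:
        """Run the highest-priority job; return False when the queue is empty."""
        if not self._heap:
            return False
        _, _, job = heapq.heappop(self._heap)
        job.attempts += 1
        try:
            job.run()
        except Exception:
            if job.attempts >= self.max_attempts:
                self.dead_letter.append(job)  # unrecoverable: park for inspection
            else:
                self.enqueue(job)             # retry later at the same priority
        else:
            self.completed.append(job.name)
        return True
```

A real worker would add a delay between retries; this sketch re-enqueues immediately to keep the control flow visible.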
Tech Stack: FastAPI, Redis, Docker Swarm, Jinja2
Medium Complexity - Full-Stack Application
Full-stack financial management application for tracking installments and payments
- Backend: High-performance API built with FastAPI, using SQLAlchemy for ORM with a PostgreSQL database.
- Frontend: Modern and responsive UI built with React, TypeScript, and Vite, styled with Tailwind CSS and Shadcn UI.
- Asynchronous Tasks: Celery and Redis manage background jobs like sending OTP and due date notification emails.
- Authentication: Secure JWT-based authentication with role-based access for customers and admins.
- Data Management: Alembic handles database schema migrations, and TanStack Query manages server state on the frontend.
- DevOps: Fully containerized with Docker and Docker Compose for reproducible development and deployment environments.
Tech Stack: FastAPI, React, TypeScript, PostgreSQL, SQLAlchemy, Redis, Celery, Docker, Tailwind CSS, Shadcn UI, Alembic
Medium Complexity - Async Communication
Real-time notification system for multiple channels
- Modern Backend: Built with Python and FastAPI for high-performance, asynchronous API endpoints.
- Multi-Channel Delivery: Supports sending notifications through various channels like Email, SMS, and Push Notifications.
- Asynchronous & Scalable: Leverages Celery and RabbitMQ for background task processing, ensuring the system can handle high-volume loads without blocking.
- Robust Data Storage: Uses PostgreSQL for reliable data persistence, managed with Alembic for smooth database migrations.
- Containerized Environment: Fully containerized with Docker and Docker Compose for consistent development, testing, and deployment.
- Comprehensive Testing: Includes a full suite of tests using pytest to ensure code quality and reliability.
Tech Stack: FastAPI, Celery, PostgreSQL, RabbitMQ, Redis, Alembic, SQLAlchemy, Docker, pytest
Medium Complexity - HA Architecture
Enterprise-grade Todo application with AWS infrastructure
- Engineered full-stack application with FastAPI backend and React frontend
- Implemented Infrastructure as Code using Pulumi for AWS resource management
- Designed fault-tolerant architecture with load balancing across multiple AZs
- Built PostgreSQL replication system with automated backup/recovery
- Integrated Redis Sentinel for high availability caching
Tech Stack: FastAPI, React, AWS EC2, PostgreSQL, Redis Sentinel, Nginx, Docker
June 2024 - August 2024 | Dhaka, Bangladesh
Delivered measurable business impact:
- Designed RBAC dashboard for 200+ users with real-time analytics
- Automated 40% of manual processes through intelligent workflows
- Built production-ready meal scheduling system with cron jobs
Bachelor of Science in Computer Science & Engineering
Daffodil International University | September 2017 - December 2022
- Building a Scalable URL Shortener: System Design to Production
- Complete architectural breakdown with Infrastructure as Code
- 100+ views, featured in system design discussions
- 200,000+ readers on Quora with tech insights in Bengali
- Nearly 200 followers engaging with technology content
- 40+ instructional videos on YouTube bridging the Bengali tech education gap
- 500+ Problems Solved across multiple platforms
- Active on: BeeCrowd, LightOJ, HackerRank, LeetCode
- Contest Achievements:
- DIU Take-Off Programming Contest (Ranked 6th out of 300 participants)
- Multiple university-level programming contest participations
- Kubernetes - Container orchestration at scale
I'm actively seeking opportunities to work on:
- Distributed systems requiring high availability and fault tolerance
- Cloud-native applications with automated infrastructure
- Microservices architectures with proper observability
- Open-source projects where I can contribute infrastructure expertise
Looking for a backend engineer who can:
- Design scalable distributed systems
- Automate infrastructure from scratch
- Write clean, testable, maintainable code
- Document complex architectures clearly
Let's build something amazing together!
- Email: kaziiriad@gmail.com
- Phone: +880 1683152495
- LinkedIn: Sultan Mahmud
- Medium: @kazisultanmahmud
- YouTube: I.T. Darshonik

