Backend systems engineer specializing in distributed data platforms, cloud-native architectures, and production ML systems. I build scalable services that process millions of events, optimize infrastructure costs, and ship reliable software. Published researcher in AI security and collaborative machine learning.
class Engineer:
    def __init__(self):
        self.name = "Pratham Jain"
        self.role = "Backend Engineer | Data Platform Architect"
        self.location = "Gurugram, India"
        self.focus = ["Distributed Systems", "Data Engineering", "MLOps"]

    def current_work(self):
        return {
            "company": "Luminous Power Technologies (Schneider Electric)",
            "building": ["Real-time IoT data pipelines", "ML-powered analytics", "Cloud-native microservices"],
            "impact": ["14x query performance", "98% cost reduction", "100K+ users scaled"]
        }
Indian Institute of Management Visakhapatnam
Indian Institute of Information Technology, Raichur
AWS: S3 • EC2 • Lambda • Bedrock • DynamoDB
GCP: BigQuery • Dataflow • Vertex AI
Azure: Data Factory • Databricks • Event Hub • AKS • DPS • Notification Hub
Luminous Power Technologies (Schneider Electric Group) | Feb 2025 – Present
Infrastructure Optimization:
- Migration: Java 8 cron → Azure Data Factory + Azure Batch ETL
- Performance: 14× faster queries, 98% cost reduction
- Architecture: 12-hour micro-batches, repartitioning, point reads
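The 12-hour micro-batching idea can be illustrated with a small windowing sketch. This is not the production pipeline: `window_start` and `group_into_batches` are hypothetical helper names, and aligning windows to 00:00 and 12:00 UTC is an assumption made for the example.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone

WINDOW = timedelta(hours=12)  # micro-batch size from the bullet above
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)  # assumed alignment point


def window_start(ts: datetime) -> datetime:
    """Return the start of the 12-hour window containing a timezone-aware timestamp."""
    return EPOCH + ((ts - EPOCH) // WINDOW) * WINDOW


def group_into_batches(records):
    """Group (timestamp, payload) pairs by the window they fall into."""
    batches = defaultdict(list)
    for ts, payload in records:
        batches[window_start(ts)].append(payload)
    return dict(batches)


records = [
    (datetime(2025, 3, 1, 1, 15, tzinfo=timezone.utc), "reading-a"),
    (datetime(2025, 3, 1, 11, 59, tzinfo=timezone.utc), "reading-b"),
    (datetime(2025, 3, 1, 13, 30, tzinfo=timezone.utc), "reading-c"),
]
batches = group_into_batches(records)
```

Here the first two readings land in the window starting at 00:00 UTC and the third in the window starting at 12:00 UTC, so each batch can be written and queried as one partition.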
Scalable Data Pipelines:
- Built: Deduplication & validation pipelines with tracking IDs
- Scaled: 100K users (~5.1M notifications/month) via Azure Notification Hub
- Implemented: Idempotent event handling for Event Hub
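A minimal sketch of idempotent event handling keyed on tracking IDs, assuming each event carries a `tracking_id` field. The `IdempotentProcessor` class is hypothetical, and the in-memory set stands in for whatever durable store a real consumer would use.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class IdempotentProcessor:
    """Applies each event at most once, keyed by its tracking ID."""
    handler: Callable[[dict], None]
    seen_ids: set = field(default_factory=set)  # in-memory stand-in for a durable store

    def process(self, event: dict) -> bool:
        """Return True if the event was handled, False if it was a duplicate."""
        tracking_id = event["tracking_id"]
        if tracking_id in self.seen_ids:
            return False
        self.handler(event)
        # Record the ID only after the handler succeeds, so a failed event can be retried.
        self.seen_ids.add(tracking_id)
        return True


delivered = []
processor = IdempotentProcessor(handler=delivered.append)
events = [
    {"tracking_id": "a1", "payload": "hello"},
    {"tracking_id": "a1", "payload": "hello"},  # redelivered duplicate
    {"tracking_id": "b2", "payload": "world"},
]
results = [processor.process(e) for e in events]
```

With at-least-once delivery, the duplicate `a1` event is skipped, so the downstream handler sees two events instead of three.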
Distributed Systems:
- Problem: Thundering herd on Azure DPS
- Solution: Exponential backoff with jitter, idempotent retries
- Result: 11.3 hours → 32 minutes provisioning, 99.8% fewer retries, 100% success
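The backoff approach can be sketched as follows. This uses the "full jitter" variant, where each delay is drawn uniformly between zero and an exponentially growing cap. The function names, default values, and the choice of `ConnectionError` as the retryable failure are assumptions for illustration, not the deployed configuration.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Full-jitter delay: uniform in [0, min(cap, base * 2**attempt)]."""
    return random.uniform(0.0, min(cap, base * (2 ** attempt)))


def retry_with_backoff(operation: Callable[[], T], max_attempts: int = 5,
                       base: float = 1.0, cap: float = 60.0,
                       sleep: Callable[[float], None] = time.sleep) -> T:
    """Call operation until it succeeds or max_attempts is exhausted.

    The operation must be idempotent, since it may run more than once.
    """
    for attempt in range(max_attempts):
        try:
            return operation()
        except ConnectionError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last failure
            sleep(backoff_delay(attempt, base, cap))
    raise RuntimeError("max_attempts must be at least 1")
```

Randomizing the delay spreads clients out in time, so devices that fail together do not all retry at the same instant and overload the service again.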
Machine Learning:
- EL image segmentation: 24 defect classes, 67% avg IoU (AMP + distributed training)
- Physics-informed network: KPI extraction from IV curves, 95% accuracy
- Quality grader: Clustering-driven approach
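For readers unfamiliar with the IoU metric quoted above, here is a small sketch of one common way to average it across classes. The averaging convention behind the 67% figure is not stated here, so skipping classes absent from both prediction and ground truth is an assumption, and `mean_iou` is a hypothetical helper operating on flattened label maps.

```python
from typing import Sequence


def mean_iou(pred: Sequence[int], target: Sequence[int], num_classes: int) -> float:
    """Average per-class IoU over flattened label maps, skipping classes absent from both."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    ious = []
    for cls in range(num_classes):
        # Pixels where both maps agree on this class.
        intersection = sum(1 for p, t in zip(pred, target) if p == cls and t == cls)
        # Pixels where either map assigns this class.
        union = sum(1 for p, t in zip(pred, target) if p == cls or t == cls)
        if union > 0:
            ious.append(intersection / union)
    return sum(ious) / len(ious) if ious else 0.0


pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
score = mean_iou(pred, target, num_classes=3)  # per-class IoUs: 1/3, 2/3, 1/2
```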
Cloud Services:
- Deployed: Adaptive-agent & LLM-backed services on AKS
- Features: Redis caching, idempotent workflows
- Impact: 78% lower latency, 88% faster responses, 30% cost savings, 7% higher satisfaction (10K+ users)
Neocfo.io | Feb 2025 – Apr 2025
- Built multi-agent backend using LangGraph and Amazon Lex for natural-language legal analysis
- Handled 2,000+ daily requests for legal clients
- Migrated AWS Lambda → EC2, achieving 40% cost reduction while maintaining availability
- Implemented ML pipelines for legal document analysis and natural-language querying
Bosch Global Software Technologies | Mar 2024 – Feb 2025
- Trained quantized & pruned TinyML models on 300+ GB IoT sensor data (HVAC systems)
- Delivered statistically significant energy savings of 15.5%
- Built containerized distributed backend for multimodal diagnostic platform
- Achieved 96.8% diagnostic accuracy on 500+ FNAC images
- Engineered fault-tolerant systems for STM32 microcontrollers with real-time guarantees
- Deployed services on AWS (S3, Lambda, EC2) with containerized workloads and autoscaling
Dorky - Open-source Artifact Storage Utility
Open-source npm and PyPI package for storing and sharing non-code artifacts outside version control. Replaces ad-hoc sharing across chat tools and personal drives.
Key Features:
- Simple, auditable storage layer with stable identifiers
- Metadata support and idempotent operations
- Streaming-safe APIs
- Integrates with cloud object storage and existing IAM
Flipkart Grid 6.0 - Level 2 Finalist
Hybrid edge-cloud system for automated product detection using PyTorch and Qwen2-VL-2B.
Achievements:
- >95% detection accuracy
- <1s inference latency
- Docker-based microservices architecture
Luminous TechnoX - First Runner-Up
IoT-driven energy optimization platform with LSTM-based tariff prediction and MILP-based scheduling.
Results:
- 30% energy savings
- Automated pipelines and monitoring
- Built on AWS IoT, Lambda, S3
2025 | Multimodal imaging and FNAC
Multimodal system combining mammography and FNAC to improve diagnostic yield.
2025 | MLOps & Edge Computing
Pipeline design for scalable model training, edge packaging, and automated deployment.
2025 | AI Security
Architected a tamper-resistant protocol for collaborative model updates using blockchain primitives.


