
Pratham Jain

Data Engineer • Backend Engineer • AI/ML Engineer

Portfolio • Email • LinkedIn • GitHub



About Me

Backend systems engineer specializing in distributed data platforms, cloud-native architectures, and production ML systems. I build scalable services that process millions of events, optimize infrastructure costs, and ship reliable software. Published researcher in AI security and collaborative machine learning.

```python
class Engineer:
    def __init__(self):
        self.name = "Pratham Jain"
        self.role = "Backend Engineer | Data Platform Architect"
        self.location = "Gurugram, India"
        self.focus = ["Distributed Systems", "Data Engineering", "MLOps"]

    def current_work(self):
        return {
            "company": "Luminous Power Technologies (Schneider Electric)",
            "building": ["Real-time IoT data pipelines", "ML-powered analytics", "Cloud-native microservices"],
            "impact": ["14x query performance", "98% cost reduction", "100K+ users scaled"]
        }
```

Education

Indian Institute of Management Visakhapatnam

  • Master of Business Administration (PGPMCI)
  • SGPA: 4.0/4.0
  • 2025 – 2027

Indian Institute of Information Technology, Raichur

  • B.Tech in Computer Science
  • CGPA: 8.31/10
  • 2021 – 2025

Tech Arsenal

Languages & Core

Python • SQL • Java • Go

Big Data & Analytics

Apache Spark • Apache Kafka • Apache Airflow • DuckDB • Snowflake • PyArrow • Parquet

Cloud Platforms

AWS • Google Cloud • Azure

AWS: S3 • EC2 • Lambda • Bedrock • DynamoDB
GCP: BigQuery • Dataflow • Vertex AI
Azure: Data Factory • Databricks • Event Hub • AKS • DPS • Notification Hub

Databases

PostgreSQL • MongoDB • DynamoDB • Redis

DevOps & Orchestration

Docker • Kubernetes • Terraform • GitHub Actions • CI/CD

ML/AI Frameworks

PyTorch • TensorFlow • LangGraph • FastAPI • Amazon Lex

Systems & Tools

Nginx • Git • Linux • REST API


Professional Experience

Data Engineering Intern — R&D

Luminous Power Technologies (Schneider Electric Group) | Feb 2025 – Present

Infrastructure Optimization:

  • Migration: Java 8 cron → Azure Data Factory + Azure Batch ETL
  • Performance: 14× faster queries, 98% cost reduction
  • Architecture: 12-hour micro-batches, repartitioning, point reads
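A minimal sketch of how 12-hour micro-batches could bucket incoming events, assuming timezone-aware UTC timestamps. `batch_start` and `group_into_batches` are hypothetical helpers for illustration, not the production Data Factory / Batch pipeline.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone

# Assumed batch width, taken from the "12-hour micro-batches" bullet.
BATCH_WIDTH = timedelta(hours=12)
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def batch_start(ts: datetime) -> datetime:
    """Floor a timezone-aware timestamp to the start of its 12-hour window."""
    windows_since_epoch = (ts - EPOCH) // BATCH_WIDTH  # timedelta // timedelta -> int
    return EPOCH + windows_since_epoch * BATCH_WIDTH


def group_into_batches(events):
    """Group (timestamp, payload) pairs by 12-hour window, ordered by window start."""
    batches = defaultdict(list)
    for ts, payload in events:
        batches[batch_start(ts)].append(payload)
    return dict(sorted(batches.items()))
```

Because windows are aligned to the Unix epoch, every batch starts at 00:00 or 12:00 UTC, so rerunning a window always selects the same events.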

Scalable Data Pipelines:

  • Built: Deduplication & validation pipelines with tracking IDs
  • Scaled: 100K users (~5.1M notifications/month) via Azure Notification Hub
  • Implemented: Idempotent event handling for Event Hub
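Event Hubs consumers can see the same event more than once (for example, when processing resumes from the last checkpoint after a restart), so validation and deduplication by tracking ID make the handler idempotent. The sketch below assumes a dict-shaped event with hypothetical field names and keeps seen IDs in memory; a real pipeline would persist them in a durable store.

```python
from dataclasses import dataclass, field


@dataclass
class IdempotentConsumer:
    """Hypothetical handler that validates events and ignores repeated tracking IDs."""

    required_fields: tuple = ("tracking_id", "device_id", "payload")
    seen_ids: set = field(default_factory=set)
    processed: list = field(default_factory=list)
    rejected: list = field(default_factory=list)

    def handle(self, event: dict) -> bool:
        """Process an event at most once; return True only on first successful processing."""
        # Validation: park events missing any required field instead of processing them.
        if any(key not in self.required_fields or key not in event for key in self.required_fields):
            self.rejected.append(event)
            return False
        # Deduplication: a redelivered event with a known tracking ID is a no-op.
        if event["tracking_id"] in self.seen_ids:
            return False
        self.processed.append(event["payload"])
        self.seen_ids.add(event["tracking_id"])
        return True
```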

Distributed Systems:

  • Problem: Thundering herd on Azure DPS
  • Solution: Exponential backoff with jitter, idempotent retries
  • Result: 11.3 hours → 32 minutes provisioning, 99.8% fewer retries, 100% success
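When many devices retry on the same fixed schedule, their requests arrive in synchronized waves (the thundering herd); randomizing each delay spreads them out. The sketch below uses the "full jitter" variant, which is an assumption since the section does not say which variant was used. `TransientError`, `retry_with_backoff`, and the default `base`/`cap` values are hypothetical. Retrying is only safe because the provisioning call is idempotent.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical error type for retryable failures such as throttling."""


def full_jitter_delay(attempt: int, base: float = 1.0, cap: float = 60.0, rng=random) -> float:
    """Full jitter: a uniform random delay in [0, min(cap, base * 2**attempt)] seconds."""
    return rng.uniform(0, min(cap, base * (2 ** attempt)))


def retry_with_backoff(operation, max_attempts: int = 6, base: float = 1.0,
                       cap: float = 60.0, sleep=time.sleep, rng=random):
    """Call operation() until it succeeds, sleeping a jittered, growing delay between tries."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error to the caller
            sleep(full_jitter_delay(attempt, base, cap, rng))
```

Injecting `sleep` and `rng` keeps the policy testable without real waits.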

Machine Learning:

  • EL (electroluminescence) image segmentation: 24 defect classes, 67% mean IoU (AMP + distributed training)
  • Physics-informed network: KPI extraction from IV curves, 95% accuracy
  • Quality grader: Clustering-driven approach
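For reference, mean IoU averages the per-class intersection-over-union between predicted and ground-truth masks. This is a plain-Python sketch over flattened label lists; skipping classes absent from both masks is one common convention and an assumption here, since averaging conventions vary.

```python
def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union across classes present in pred or target.

    pred, target: equal-length sequences of integer class labels (flattened masks).
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    ious = []
    for c in range(num_classes):
        intersection = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union:  # skip classes that appear in neither mask
            ious.append(intersection / union)
    return sum(ious) / len(ious) if ious else 0.0
```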

Cloud Services:

  • Deployed: Adaptive-agent & LLM-backed services on AKS
  • Features: Redis caching, idempotent workflows
  • Impact: 78% lower latency, 88% faster responses, 30% cost savings, 7% higher satisfaction (10K+ users)
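The sketch below assumes the Redis caching follows the common cache-aside pattern: check the cache, and on a miss call the slow LLM-backed service and store the result with a TTL. To run without a server, `TTLCache` is an in-memory stand-in for Redis's `SET key value EX seconds`; it and `cached_call` are hypothetical names.

```python
import time


class TTLCache:
    """In-memory stand-in for Redis GET / SET ... EX (key expiry)."""

    def __init__(self, clock=time.monotonic):
        self._store = {}
        self._clock = clock

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._store[key]  # lazily evict the expired entry
            return None
        return value

    def set(self, key, value, ttl_seconds):
        self._store[key] = (value, self._clock() + ttl_seconds)


def cached_call(cache, key, compute, ttl_seconds=300):
    """Cache-aside: return the cached value, or compute, store, and return it.

    None is treated as a miss, so compute() should not return None.
    """
    value = cache.get(key)
    if value is None:
        value = compute()
        cache.set(key, value, ttl_seconds)
    return value
```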

Freelance Software Development Engineer

Neocfo.io | Feb 2025 – Apr 2025

  • Built multi-agent backend using LangGraph and Amazon Lex for natural-language legal analysis
  • Handled 2,000+ daily requests for legal clients
  • Migrated AWS Lambda → EC2, achieving 40% cost reduction while maintaining availability
  • Implemented ML pipelines for legal document analysis and natural-language querying

Data Engineer & Applied Sciences Trainee

Bosch Global Software Technologies | Mar 2024 – Feb 2025

  • Trained quantized & pruned TinyML models on 300+ GB IoT sensor data (HVAC systems)
  • Delivered statistically significant energy savings of 15.5%
  • Built containerized distributed backend for multimodal diagnostic platform
  • Achieved 96.8% diagnostic accuracy on 500+ FNAC images
  • Engineered fault-tolerant systems for STM32 microcontrollers with real-time guarantees
  • Deployed services on AWS (S3, Lambda, EC2) with containerized workloads and autoscaling
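TinyML models on microcontrollers usually run in 8-bit integer arithmetic. As an illustration, here is affine (asymmetric) per-tensor int8 quantization, one common scheme; the section does not say which scheme was used, and pruning is not shown. The function names are hypothetical.

```python
def quantize_int8(values):
    """Affine per-tensor int8 quantization of a list of floats.

    Returns (q, scale, zero_point) such that real ≈ (q - zero_point) * scale.
    """
    lo, hi = min(min(values), 0.0), max(max(values), 0.0)  # range must include 0.0
    scale = (hi - lo) / 255 or 1.0  # guard against an all-zero tensor
    zero_point = round(-128 - lo / scale)  # maps lo to about -128, so hi lands near 127
    q = [max(-128, min(127, round(v / scale) + zero_point)) for v in values]
    return q, scale, zero_point


def dequantize_int8(q, scale, zero_point):
    """Map int8 codes back to approximate real values."""
    return [(x - zero_point) * scale for x in q]
```

Including 0.0 in the range guarantees that zero is represented exactly (it maps to `zero_point`), which matters for padding and pruned weights.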

Featured Projects

Dorky - Open-source Artifact Storage Utility


Open-source npm and PyPI package for storing and sharing non-code artifacts outside version control. Replaces ad-hoc sharing across chat tools and personal drives.

Key Features:

  • Simple, auditable storage layer with stable identifiers
  • Metadata support and idempotent operations
  • Streaming-safe APIs
  • Integrates with cloud object storage and existing IAM
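One way to get stable identifiers and idempotent uploads is content addressing: the ID is a hash of the bytes, so re-uploading the same artifact is a no-op. This is a hypothetical sketch of that idea, not Dorky's actual API; the dict stands in for cloud object storage, and the in-memory buffer stands in for a streamed upload.

```python
import hashlib
import io


class ArtifactStore:
    """Hypothetical content-addressed artifact store (not Dorky's API)."""

    def __init__(self):
        self._blobs = {}  # artifact id -> bytes (stand-in for object storage)
        self._meta = {}   # artifact id -> metadata dict

    def put(self, stream, metadata=None, chunk_size=64 * 1024):
        """Hash the stream chunk by chunk; the SHA-256 hex digest is the stable ID."""
        digest = hashlib.sha256()
        buf = io.BytesIO()
        for chunk in iter(lambda: stream.read(chunk_size), b""):
            digest.update(chunk)
            buf.write(chunk)
        artifact_id = digest.hexdigest()
        # Idempotent: identical bytes yield the same ID; the first upload's metadata is kept.
        if artifact_id not in self._blobs:
            self._blobs[artifact_id] = buf.getvalue()
            self._meta[artifact_id] = dict(metadata or {})
        return artifact_id

    def get(self, artifact_id):
        """Return (bytes, metadata) for a stored artifact."""
        return self._blobs[artifact_id], self._meta[artifact_id]
```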

Retail Automation System

Flipkart Grid 6.0 - Level 2 Finalist

Hybrid edge-cloud system for automated product detection using PyTorch and Qwen2-VL-2B.

Achievements:

  • >95% detection accuracy
  • <1s inference latency
  • Docker-based microservices architecture

SolarWise AI Energy Optimization Platform

Luminous TechnoX - First Runner-Up

IoT-driven energy optimization platform with LSTM-based tariff prediction and MILP-based scheduling.

Results:

  • 30% energy savings
  • Automated pipelines and monitoring
  • Built on AWS IoT, Lambda, S3
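As an illustration of MILP-based scheduling, a generic textbook formulation of tariff-aware load scheduling is shown below; it is not the SolarWise model, which the section does not describe. Assumed symbols: x_{i,t} = 1 if shiftable load i runs in time slot t; P_i is its power (kW); d_i is the number of slots it must run within its allowed window [a_i, b_i]; s_t is forecast solar generation (kW); g_t is grid import (kW); c_t is the predicted tariff per kWh (for example, from the LSTM forecaster); Δt is the slot length (h).

```latex
\begin{aligned}
\min_{x,\,g}\quad & \sum_{t \in T} c_t \, g_t \, \Delta t \\
\text{s.t.}\quad & g_t \ge \sum_{i \in I} P_i \, x_{i,t} - s_t && \forall t \in T \\
& \sum_{t = a_i}^{b_i} x_{i,t} = d_i && \forall i \in I \\
& x_{i,t} = 0 && \forall i \in I,\ t \notin [a_i, b_i] \\
& g_t \ge 0,\quad x_{i,t} \in \{0, 1\} && \forall i \in I,\ t \in T
\end{aligned}
```

The binary x_{i,t} makes this a mixed-integer program: the solver shifts loads into slots where the predicted tariff is low or solar output covers demand, since only grid import g_t is charged.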

Publications

2025 | Multimodal imaging and FNAC

Multimodal system combining mammography and FNAC to improve diagnostic yield.


2025 | MLOps & Edge Computing

Pipeline design for scalable model training, edge packaging, and automated deployment.


2025 | AI Security

Architected a tamper-resistant protocol for collaborative model updates using blockchain primitives.
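Hash chaining is one blockchain primitive that makes a log of model updates tamper-evident: each entry commits to the hash of the previous one, so editing any past entry breaks verification. This sketch assumes updates are referenced by their own SHA-256 digests; `UpdateLedger` is hypothetical, omits consensus and signatures, and is not the published protocol.

```python
import hashlib
import json


def _digest(record: dict) -> str:
    # Canonical JSON (sorted keys) so the same record always hashes identically.
    return hashlib.sha256(json.dumps(record, sort_keys=True).encode()).hexdigest()


class UpdateLedger:
    """Hypothetical append-only, hash-chained log of collaborative model updates."""

    GENESIS = "0" * 64

    def __init__(self):
        self.blocks = []

    def append(self, client_id: str, update_hash: str) -> str:
        prev = self.blocks[-1]["hash"] if self.blocks else self.GENESIS
        record = {"index": len(self.blocks), "client": client_id,
                  "update": update_hash, "prev": prev}
        block = dict(record, hash=_digest(record))
        self.blocks.append(block)
        return block["hash"]

    def verify(self) -> bool:
        """Recompute every hash; an edited block breaks its own hash or the next link."""
        prev = self.GENESIS
        for block in self.blocks:
            record = {k: v for k, v in block.items() if k != "hash"}
            if block["prev"] != prev or _digest(record) != block["hash"]:
                return False
            prev = block["hash"]
        return True
```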


Achievements & Certifications

Competition Achievements

Meta Pragati Amazon ML Flipkart Grid Numerai Luminous TechnoX

Professional Certifications

AWS • Google • Lean Six Sigma • Postman




Connect With Me

Portfolio • LinkedIn • GitHub • Email • HackerRank

Gurugram, India | +91 93019 90411


"Building scalable systems that move data from noise to insights"


Pinned Repositories

  1. CRM_Mini_Project (Public)

    GenAI Credit Platform

    JavaScript 1

  2. AmazonMLChallenge24 (Public)

    Our team ranked 84th out of 74,830 teams in the Amazon ML Challenge 2024. The challenge involved building a scalable machine learning solution without using external APIs or gateways, leveraging on…

    Python

  3. BGSW-CAD-BreastCancerPrediction (Public)

    This repository hosts the open-source implementation of a Breast Cancer Prediction System developed under the collaboration between Bosch Global Software Technologies (BGSW) and the Indian Institut…

    Jupyter Notebook

  4. Flipkart_Grid_6.0 (Public)

    Python 1

  5. Luminous-TechnoX-Hackathon-Submission-2024 (Public)

    Luminous TechnoX Hackathon Submission 2024

    Jupyter Notebook 2

  6. BGSW_no_code_tinyml_tool (Public)

    Python