Professional Summary
Data Systems and Machine Learning Engineer with experience designing high-throughput batch and real-time data pipelines, lakehouse architectures, and production ML platforms on AWS and Azure. Skilled in Spark, Kafka, Databricks, and vector search systems, with a strong focus on building scalable, reliable data and ML infrastructure for real-world applications.
📍 India, open to relocation
✉️ [email protected]
🤝 Open to Data Engineer | Data Scientist | ML Engineer roles
- High-throughput batch & real-time data pipelines (Spark, Kafka, Kinesis, Flink)
- Lakehouse architectures using Delta Lake, Iceberg, Hudi, and Unity Catalog
- Streaming analytics & security detection systems
- ML pipelines on Spark with GPU acceleration
- Vector search & semantic retrieval systems using FAISS & embeddings
- Multimodal RAG systems (text + image retrieval)
- Production ML with monitoring, CI/CD, and drift detection
PySpark • Kafka • Kinesis • Flink • Databricks • Delta Lake • Iceberg • Hudi • Unity Catalog
Spark ML • XGBoost4J-Spark • RAPIDS • Evidently AI • SageMaker Pipelines
FAISS • Sentence-BERT • Embeddings • Multimodal RAG • LangChain
AWS (Glue, Lambda, Athena, S3, SageMaker)
Azure (Databricks, Data Factory, Azure ML, DevOps)
.NET • PostgreSQL • Docker • Flask API
- Processed 67M+ events
- Built batch + real-time analytics pipelines
- Apache Hudi → 50% faster queries, 40% less storage
- Kinesis + Flink + DynamoDB for DDoS and bot detection
- XGBoost4J-Spark + RAPIDS Accelerator
- Production pipeline with SageMaker + Evidently AI
- Drift monitoring, CI/CD, orchestration
- Sentence-BERT embeddings
- FAISS vector retrieval
- Iceberg storage + Dockerized Flask API
- Text + Image embeddings
- FAISS vector indexing
- Streamlit app deployed on AWS
- Kafka + PySpark streaming joins
- Unity Catalog governance
- Azure DevOps CI/CD
Microsoft Certified: Azure Data Scientist Associate (DP-100)
https://learn.microsoft.com/en-us/users/prathy-0029/credentials/certification/azure-data-scientist
Python • PySpark • SQL • Spark ML • Kafka • Databricks • AWS • Azure • FAISS • Docker • PostgreSQL • Power BI
I love working on:
- Distributed data systems
- ML at scale
- Vector search & RAG systems
- Streaming analytics
I enjoy translating complex data problems into scalable engineering systems.
