I build data infrastructure that doesn't fall apart when your traffic spikes.
While many focus solely on writing SQL, I approach data from a System Design perspective. My goal is to bridge the gap between messy raw data and the high-performance infrastructure required to process it at scale.
Core Strengths
High-Throughput Streaming: Implementing Apache Kafka and Flink for real-time data needs.
Distributed Processing: Optimizing Apache Spark (PySpark/Scala) to slash compute costs and execution windows.
Infrastructure: Managing on-prem VMs and scaling cloud platforms (AWS/GCP).
Engineering Rigor: I prioritize CI/CD, data validation, and building modular, self-healing pipelines over "quick fixes."
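The "self-healing" point above can be sketched in plain Python. This is a minimal, illustrative pattern only: the names `run_stage` and `validate` are hypothetical, and a real pipeline would hang retries and validation off an orchestrator rather than bare functions.

```python
import time
import random

def run_stage(stage_fn, records, retries=3, base_delay=0.1):
    """Run one pipeline stage with retry and exponential backoff,
    so transient failures heal without a human paging in (hypothetical helper)."""
    for attempt in range(retries):
        try:
            return stage_fn(records)
        except Exception:
            if attempt == retries - 1:
                raise  # exhausted retries: surface the failure
            # Backoff with a little jitter before the next attempt.
            time.sleep(base_delay * 2 ** attempt + random.random() * 0.05)

def validate(records):
    """Drop rows missing required fields instead of failing the whole batch."""
    return [r for r in records if r.get("user_id") and r.get("event_ts")]

batch = [
    {"user_id": "u1", "event_ts": 1700000000},
    {"user_id": None, "event_ts": 1700000001},  # invalid row, quarantined
]
clean = run_stage(validate, batch)
print(len(clean))  # 1
```

The point of the pattern is that bad rows and transient faults degrade a batch gracefully instead of killing the run.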
Key Projects & Accomplishments
Real-Time Analytics Engine: Architected a pipeline using Kafka and Spark Streaming that reduced data latency from hours to sub-10 seconds.
Cloud Migration: Migrated a legacy on-premises data environment to a hybrid cloud setup, improving uptime by 30%.
Pipeline Optimization: Refactored a bloated ETL process in a junior role, cutting Snowflake/BigQuery compute spend by 25% through better partitioning and query logic.
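The partitioning win behind that optimization can be illustrated with a toy, stdlib-only sketch. The event data and `dt=YYYY-MM-DD` key scheme here are illustrative assumptions, not details from the actual project.

```python
from collections import defaultdict
from datetime import datetime, timezone

# Toy event stream: (epoch_seconds, payload).
events = [
    (1700000000, "a"), (1700086400, "b"), (1700172800, "c"),
]

# Write side: bucket rows by a date partition key (dt=YYYY-MM-DD),
# mirroring how a warehouse table would be partitioned.
partitions = defaultdict(list)
for ts, payload in events:
    dt = datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y-%m-%d")
    partitions[f"dt={dt}"].append(payload)

# Read side: a query filtered on the partition key scans one bucket,
# not the whole table -- that pruning is where the cost savings come from.
target = "dt=2023-11-15"
scanned = partitions.get(target, [])
print(len(partitions), len(scanned))  # 3 partitions written, 1 scanned
```

The same idea scales up: in Snowflake or BigQuery, aligning partition (or cluster) keys with the columns queries actually filter on shrinks the bytes scanned, which is what the billing meter charges for.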
Education & Continuous Learning
B.S. in Computer Science / Information Technology.
Deeply committed to the Modern Data Stack—constantly testing new tools in orchestration (Dagster/Airflow) and storage (Iceberg/Delta Lake).
I’m brutally honest about what works and what doesn't. If a project doesn't need a complex Kafka setup, I’ll tell you. I’m here to build the right system, not the most expensive one.
Have a bottleneck in your data flow? Send me a message and let’s solve it.