Open to opportunities

Sagar R

@sagarr


Senior Data Engineer building scalable pipelines and ML systems on cloud.

India


What I'm looking for

I’m looking for a Senior/Lead data engineering role where I can own end-to-end data pipelines, build real-time architectures, and ship ML-enabled analytics, while driving measurable cost and performance wins through strong monitoring, governance, and collaboration.

I’m a Senior Data Engineer with 3.5+ years of experience building scalable data pipelines and ML systems at leading Indian fintech companies. I’ve driven over $300K in annual infrastructure savings through cost optimization, performance tuning, and operational discipline.

I specialize in PySpark/Apache Spark and real-time streaming architectures. My hands-on work with Apache Kafka, Debezium CDC, and ClickHouse includes a pipeline that cut time-series query latency from 10 minutes to 500 ms. I’ve also built ML solutions such as a CNN+LSTM churn prediction model with SHAP explainability, which improved user retention by 17% for a wealth management platform.

Cloud-native execution is a constant theme in my work: Snowflake, AWS services, and lakehouse patterns with Delta Lake and Parquet for governance, reliability, and faster analytics. I bring a strong engineering-to-outcomes mindset. Examples include optimizing Snowflake spend by 49%, improving query response times by 30% through schema redesign and materialized views, and designing migrations and monitoring that keep SLAs steady as teams scale.

Experience

Work history, roles, and key accomplishments

Current

Senior Software Engineer — Data & ML


Dezerv

Mar 2025 - Present (1 year 5 months)

Owned end-to-end design and implementation of a CNN+LSTM churn prediction model with SHAP explainability, driving a 17% increase in user retention on the wealth management platform. Architected and deployed a real-time Kafka/Debezium/ClickHouse pipeline that cut time-series query latency from 10 minutes to 500 ms, and reduced monthly Snowflake spend by 49% ($5,500 to $2,800).

PySpark Apache Spark Debezium ClickHouse Snowflake SHAP Data Modeling Materialized Views Kafka

Software Engineer — Data Engineering

CoinDCX

Mar 2024 - Mar 2025 (1 year)

Led the decommissioning of Confluent Kafka and the migration to an AWS MSK cluster, saving approximately $110K/year in licensing and operational costs. Re-architected the LakeTrade data model into microservices within one month, improving deployment speed. Integrated PagerDuty with automated recovery workflows, reducing MTTR by 40%.

Databricks Microservices AWS Lambda EventBridge PagerDuty Schema Design Kafka

Software Engineer — Data Engineering

Jumbotail

Nov 2022 - Mar 2024 (1 year 4 months)

Designed, implemented, and productionized a Spark Structured Streaming CDC system using PySpark and Debezium. The system processes 100M+ events/day with exactly-once guarantees and is backed by alerting and failure-recovery mechanisms. Reduced annual query infrastructure cost by 31% ($36K to $25K) with Trino autoscaling, and cut compute time by 98% through PySpark optimizations (partition pruning and predicate pushdown).

PySpark Spark Structured Streaming Debezium Trino Exactly-Once Processing Delta Lake Parquet Autoscaling Predicate Pushdown

Co-Founder & Product Lead

Alephs360

Jul 2021 - Oct 2022 (1 year 3 months)

Developed and maintained Python/SQL ETL pipelines for financial data, ensuring data accuracy and timely delivery. Implemented data quality checks and monitoring with AWS CloudWatch, reducing data reconciliation errors by 25%. Optimized AWS S3 storage and query performance with partitioning and lifecycle policies, cutting storage costs by 15%.