Open to opportunities

rahul lohia

@rahullohia

Senior Data Engineer optimizing real-time and batch pipelines for scalable, cost-efficient platforms.

India

What I'm looking for

I want to build and harden scalable data platforms, both real-time and batch, with strong observability, data quality, and cost/performance optimization. I’m excited by roles where I can own pipelines end-to-end and deliver measurable impact.

I’m a Senior Data Engineer with 5+ years designing and optimizing production-scale data pipelines and big data platforms across travel and telecom domains. I focus on ETL/ELT engineering that improves reliability, performance, and operational clarity.

In my current role, I reduced a data platform’s monthly infrastructure cost from $85K to $4K (~95%, ~$972K/year saved) by rewriting Spark execution plans, right-sizing EMR clusters, and automating S3 storage lifecycle management. I’ve also owned production RCA and delivered permanent fixes, cutting Spark job compute and runtime by up to 80% through broadcast join tuning, partition optimization, and cluster-level configuration.

I build end-to-end pipelines with Apache Spark (Scala/PySpark) for both batch and real-time workloads. I orchestrate workflows via Apache Airflow and connect observability and alerting through Datadog, including SLA breach detection and anomaly monitoring to keep pipelines dependable.

My technical depth spans Kafka, Spark Structured Streaming, Hive, HBase, and AWS, plus data governance and quality through Collibra Data Quality (CDQ). I’ve delivered outcomes like sub-minute end-to-end latency for high-volume event streams and automated Kafka-vs-Hive reconciliation with HTML audit reporting to proactively detect data loss incidents.

Experience

Work history, roles, and key accomplishments

Current

Senior Data Engineer

Affine Analytics

Apr 2024 - Present (2 years 2 months)

Reduced a travel data platform’s monthly infrastructure cost from $85K to $4K (~95%, ~$972K/year saved) by rewriting Spark execution plans, right-sizing EMR clusters, and automating S3 storage lifecycle policies. Owned end-to-end Spark-Scala ETL to deliver curated datasets and implemented a Collibra CDQ framework (50+ validations) with Airflow orchestration and Datadog observability to improve rel

Data Engineer

Cognizant

Feb 2021 - Apr 2024 (3 years 2 months)

Built production Kafka and Spark Structured Streaming pipelines for telecom event ingestion, achieving sub-minute end-to-end latency on high-volume streams. Reduced Hive batch/streaming runtimes by 30–40% using dynamic partitioning, bucketing, and query optimization, and delivered secure access-management and Kafka-vs-Hive reconciliation workflows with automated HTML audit reporting.

Education

Degrees, certifications, and relevant coursework

Maulana Abul Kalam Azad University of Technology (MAKAUT)

Bachelor of Technology (B.Tech), Computer Science & Engineering

2017 - 2021

Grade: DGPA: 8.91 / 10

Earned a B.Tech in Computer Science & Engineering at MAKAUT, Kolkata (2017–2021) with a DGPA of 8.91/10.
