Rahul Lohia
@rahullohia
Senior Data Engineer optimizing real-time and batch pipelines for scalable, cost-efficient platforms.
What I'm looking for
I’m a Senior Data Engineer with 5+ years of experience designing and optimizing production-scale data pipelines and big data platforms in the travel and telecom domains. I focus on ETL/ELT engineering that improves reliability, performance, and operational clarity.
In my current role, I reduced a data platform’s monthly infrastructure cost from $85K to $4K (~95%, ~$972K/year saved) by rewriting Spark execution plans, right-sizing EMR clusters, and automating S3 storage lifecycle management. I’ve also owned production RCA and delivered permanent fixes, cutting Spark job compute and runtime by up to 80% through broadcast join tuning, partition optimization, and cluster-level configuration.
I build end-to-end pipelines with Apache Spark (Scala/PySpark) for both batch and real-time workloads. I orchestrate workflows via Apache Airflow and connect observability and alerting through Datadog, including SLA breach detection and anomaly monitoring to keep pipelines dependable.
My technical depth spans Kafka, Spark Structured Streaming, Hive, HBase, and AWS, plus data governance and quality through Collibra Data Quality (CDQ). I’ve delivered outcomes like sub-minute end-to-end latency for high-volume event streams and automated Kafka-vs-Hive reconciliation with HTML audit reporting to proactively detect data loss incidents.
Experience
Work history, roles, and key accomplishments
Senior Data Engineer
Affine Analytics
Apr 2024 - Present (2 years 2 months)
Reduced a travel data platform’s monthly infrastructure cost from $85K to $4K (~95%, ~$972K/year saved) by rewriting Spark execution plans, right-sizing EMR clusters, and automating S3 storage lifecycle policies. Owned end-to-end Spark-Scala ETL to deliver curated datasets and implemented a Collibra CDQ framework (50+ validations) with Airflow orchestration and Datadog observability to improve rel
Data Engineer
Cognizant
Feb 2021 - Apr 2024 (3 years 2 months)
Built production Kafka and Spark Structured Streaming pipelines for telecom event ingestion, achieving sub-minute end-to-end latency on high-volume streams. Reduced Hive batch/streaming runtimes by 30–40% using dynamic partitioning, bucketing, and query optimization, and delivered secure access-management and Kafka-vs-Hive reconciliation workflows with automated HTML audit reporting.
Education
Degrees, certifications, and relevant coursework
Maulana Abul Kalam Azad University of Technology (MAKAUT)
Bachelor of Technology (B.Tech), Computer Science & Engineering
2017 - 2021
Grade: DGPA: 8.91 / 10
Earned a B.Tech in Computer Science & Engineering at MAKAUT, Kolkata (2017–2021) with a DGPA of 8.91/10.
