Rahul Sahu
@rahulsahu2
Senior Data Engineer focused on cloud data pipelines, Snowflake, and real-time analytics optimization.
What I'm looking for
I’m a Data Engineer with 6+ years of experience building and optimizing large-scale data pipelines across retail, finance, and education. I bring deep expertise in cloud data engineering and data warehousing to help teams turn complex datasets into reliable business insights.
At TripAdvisor, I architected and optimized Snowflake pipelines that ingest and transform diverse travel datasets, processing over 500GB of daily incremental updates. I engineered a unified 360-degree traveler view by merging on-platform interactions and off-platform bookings, improving personalized travel recommendations by 15%.
I also focused on real-time impact: I used Snowflake Streams and Tasks for CDC and Snowpark for complex Python-based transformations, reducing end-to-end data latency from 4 hours to under 30 minutes. To keep analytics trustworthy, I implemented data governance and quality frameworks that reached 99.9% data accuracy, and I optimized warehouse performance and cost, cutting monthly Snowflake credits by 20%.
Earlier, at Credit Saison, I reduced Athena query scan costs by 97% through S3 partitioning and Glue metadata management, and built PySpark transformation jobs for financial datasets. At Embibe, I developed batch and streaming pipelines using Spark and Kafka and built real-time ranking with Kafka, Spark, and Redis. These roles grounded my engineering style in performance, correctness, and practical delivery.
Experience
Work history, roles, and key accomplishments
Architected and optimized Snowflake pipelines ingesting and transforming diverse travel datasets with 500GB/day of incremental updates, enabling a 360-degree traveler view that improved personalized travel recommendations by 15%. Reduced end-to-end data latency from 4 hours to under 30 minutes with CDC, and cut monthly Snowflake credits by 20% through warehouse cost and performance tuning.
Built and managed Whampipe-orchestrated ETL/ELT workflows using advanced SQL to automate travel data movement across multi-cloud environments. Tuned query execution and partitioning to cut peak-season ETL processing time by 25% and added automated SQL validations to detect schema drift and data anomalies.
Data Engineer 2
Blackbuck Insights
Feb 2022 - May 2024 (2 years 3 months)
Developed ingestion pipelines to load SFTP and GCS files into BigQuery for centralized processing, supporting batch analytics and CDP data structures. Built Airflow (Composer) batch ETL for CSV/JSON/Parquet into BigQuery and used SparkSQL/PySpark with Dataproc to explore datasets for ad performance and segmentation.
Data Engineer 2
Credit Saison
Nov 2020 - Jan 2022 (1 year 2 months)
Managed an AWS data lake on S3 with metadata cataloging in AWS Glue and crawling via Lambda to support NBFC financial analytics. Implemented PySpark transformation jobs and reduced Athena query scan costs by 97% using S3 partitioning strategies.
Data Engineer
Embibe
Jul 2019 - Nov 2020 (1 year 4 months)
Built batch and streaming pipelines using Spark (Scala) to process examination data and integrated real-time ingestion with Kafka. Developed student ranking pipelines using Kafka, Spark, and Redis and delivered exam trend insights to support business decision-making.
Education
Degrees, certifications, and relevant coursework
ABES Engineering College
Bachelor of Technology, Computer Science
2019 -
Completed a B.Tech in Computer Science at ABES Engineering College in 2019.
