Raj Gandhi
@rajgandhi
Data Engineer focused on scalable batch and low-latency real-time pipelines.
What I'm looking for
I’m a Data Engineer with 4 years of experience designing and operating scalable batch and real-time data pipelines. I build solutions end to end, covering modeling, enrichment, and reliable data delivery, so analytics teams can trust their numbers.
In my current role, I designed and implemented a low-latency real-time sessionization system using Redis, Lua scripts, and Java, achieving ~300 ms P99 latency. I also provisioned and operated infrastructure with CloudFormation, ensured consistency through atomic Redis updates, and validated correctness using data parity checks and monitoring/alarms.
I’ve led data privacy and compliance work, removing sensitive data from datasets owned by multiple teams. I collaborated with privacy partners to analyze sources and deliver a centralized sanitized dataset for downstream consumption, ensuring policy compliance at scale.
Earlier, I built large-scale Spark ETL pipelines and optimized terabyte-scale processing by tuning joins and shuffle/memory behavior to improve performance and reliability. I also migrated data from GCP to AWS using PySpark (200–300 GB/day), automated workflows with Python, AWS Lambda, and AWS Glue, and delivered stakeholder-facing QuickSight KPI dashboards and a Streamlit reporting web app.
Experience
Work history, roles, and key accomplishments
Designed a low-latency (~300 ms P99) real-time sessionization system using Redis, Lua, and Java. Led terabyte-scale Spark ETL for data enrichment and a cross-team data privacy initiative, building a centralized sanitized dataset for compliant downstream analytics.
Data Engineer I
BookMyShow
Jun 2022 - Dec 2024 (2 years 6 months)
Built a PySpark pipeline to migrate 200–300 GB/day from GCP to AWS, replacing the existing workflow and saving ~$7,300/year while improving analytics availability. Migrated SQL/NoSQL data to Amazon Redshift, using a deduplication hash to ensure 100% data integrity. Created QuickSight dashboards and automated workflows using Python, AWS Lambda, and AWS Glue, and built a Streamlit reporting app.
Education
Degrees, certifications, and relevant coursework
University of Mumbai
Bachelor of Engineering, Computer Science
2018 - 2022
Activities and societies: Award: Spot Award for the OND 2023 Quarter. Certificates: dbt Fundamentals, Databricks Lakehouse Fundamentals, Snowflake Data Engineering, and Data Warehousing workshops.
Completed a Bachelor of Engineering in Computer Science at the University of Mumbai from 2018 to 2022.
