Raghav Jaju
@raghavjaju
Data engineer specializing in cloud ETL, streaming pipelines, and data platform optimization.
What I'm looking for
I'm interested in roles where I can design scalable, cost-efficient data platforms and work on both streaming and batch ETL. I enjoy collaborative teams that focus on real business impact and strong technical ownership.
I am a data engineer with hands-on experience designing and operating scalable data pipelines across the GCP and Azure ecosystems. I focus on building reliable streaming architectures, implementing medallion-style transformations, and optimizing storage and query costs for analytics.
In my current role, I redesigned an event ingestion pipeline as a Pub/Sub → Dataflow streaming architecture, eliminating data loss and enabling real-time ingestion into BigQuery. I reduced BigQuery costs by roughly 80% through partitioning, clustering, and better table design.
I build dbt models and validation checks, enforce schema governance, and collaborate with product and analytics teams to deliver curated gold-layer datasets and automated dashboards. I also have experience with Dataplex concepts, Airflow, PySpark on Databricks, Dataproc, and Vertex AI for model training and deployment.
I automate deployments with Bash, optimize ETL performance, and have solved practical problems such as compressing large video assets and migrating data across cloud services. I aim to build reliable, cost-effective data platforms that drive product and business insights.
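As an illustration of the medallion-style cleaning described above, here is a minimal, self-contained sketch of a bronze → silver step that enforces a schema and deduplicates on event ID, keeping the latest record. This is plain Python with hypothetical field names (`event_id`, `ts`, `payload`), not the production dbt/PySpark code:

```python
# Toy bronze -> silver transformation: schema enforcement + deduplication.
# Field names are illustrative, not taken from the real pipeline.

REQUIRED_FIELDS = {"event_id", "ts"}

def to_silver(bronze_rows):
    """Drop malformed rows, then keep the latest record per event_id."""
    latest = {}
    for row in bronze_rows:
        if not REQUIRED_FIELDS <= row.keys():
            continue  # reject rows missing required fields
        key = row["event_id"]
        if key not in latest or row["ts"] > latest[key]["ts"]:
            latest[key] = row  # later timestamp wins
    # Deterministic output order for downstream aggregates.
    return sorted(latest.values(), key=lambda r: r["event_id"])

bronze = [
    {"event_id": "a", "ts": 1, "payload": "old"},
    {"event_id": "a", "ts": 2, "payload": "new"},  # later duplicate wins
    {"event_id": "b", "ts": 1},
    {"ts": 3},                                     # malformed: no event_id
]
silver = to_silver(bronze)
```

In a real Spark job the same keep-latest-per-key logic would typically be a window over `event_id` ordered by timestamp, but the invariant is the one shown here.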
Experience
Work history, roles, and key accomplishments
Data Engineer
Super Gaming
Feb 2025 - Present (10 months)
Redesigned event ingestion as a Pub/Sub → Dataflow streaming architecture, eliminating a 20% data-loss rate to reach 100% data reliability and enabling real-time BigQuery ingestion; cut BigQuery costs ~80% via partitioning and clustering, and implemented dbt medallion models for unified analytics.
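The cost reduction from partitioning comes from partition pruning: a query filtered on the partition column scans only the partitions it needs. A toy sketch of that effect (pure Python model; the table layout and sizes are made up, not actual production numbers):

```python
# Model a date-partitioned table as {partition_date: megabytes_in_partition}.
# Sizes here are illustrative: 30 daily partitions of 100 MB each.

table = {f"2025-01-{d:02d}": 100 for d in range(1, 31)}

def bytes_scanned(table, date_filter=None):
    """MB a query touches: full scan without a filter, pruned scan with one."""
    if date_filter is None:
        return sum(table.values())  # unpartitioned behavior: scan everything
    return sum(mb for day, mb in table.items() if day in date_filter)

full = bytes_scanned(table)                                # scans all 30 days
pruned = bytes_scanned(table, date_filter={"2025-01-15"})  # scans one day
```

Clustering adds a further, finer-grained reduction within each partition by co-locating rows that share common filter keys, which this sketch does not model.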
Data Engineer
Sigmoid Analytics
Dec 2023 - Jan 2025 (1 year 1 month)
Migrated customer engagement data using Airflow and BigQuery to GCS and developed PySpark jobs on Azure Databricks across Bronze/Silver/Gold layers to enforce schema, clean/deduplicate data, and produce business-ready aggregates.
Built ETL and streaming solutions, including PySpark jobs on Dataproc handling Elasticsearch indices up to 10 GB, Beam/Dataflow pipelines converting .npy files to TFRecords, and services compressing 4K drone videos to ~5–10% of their original size while improving log-write performance 10x.
Education
Degrees, certifications, and relevant coursework
The LNM Institute of Information Technology
Bachelor of Technology, Computer Science (completed 2021)
Tech stack
Software and tools used professionally
Apache Spark
Looker
Google Cloud Platform
Google Cloud Storage
Kubernetes
PySpark
Google BigQuery Data Transf...
dbt
Hadoop
Gmail
Databricks
FFmpeg
Python
TensorFlow
Google Cloud Dataflow
Firebase
Google Cloud Pub/Sub
Elasticsearch
Google Cloud Functions
Google Cloud SQL
TypeScript
Docker
Airflow
Apache Beam
Google BigQuery
SQL
Google Kubernetes Engine
Google Cloud Run
Google Cloud Dataproc
Bash
Transform