Raghav Jaju
@raghavjaju
Data engineer specializing in cloud ETL, streaming pipelines, and data platform optimization.
What I'm looking for
I'm interested in roles where I can design scalable, cost-efficient data platforms and work on both streaming and batch ETL. I enjoy collaborative teams that focus on real business impact and strong technical ownership.
I am a data engineer with hands-on experience designing and operating scalable data pipelines across the GCP and Azure ecosystems. I focus on building reliable streaming architectures, implementing medallion-style transformations, and optimizing storage and query costs for analytics.
In my current role, I redesigned an event ingestion pipeline as a Pub/Sub → Dataflow streaming architecture, eliminating data loss and enabling real-time ingestion into BigQuery. I reduced BigQuery costs by roughly 80% through partitioning, clustering, and better table design.
I build dbt models and validation checks, enforce schema governance, and collaborate with product and analytics teams to deliver curated gold-layer datasets and automated dashboards. I also have experience with Dataplex concepts, Airflow, PySpark on Databricks, Dataproc, and Vertex AI for model training and deployment.
I automate deployments with Bash, optimize ETL performance, and have solved practical problems such as compressing large video assets and migrating data across cloud services. I aim to build reliable, cost-effective data platforms that drive product and business insights.
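As an illustration of the medallion-style cleaning described above, here is a minimal, self-contained sketch of a bronze → silver step that enforces a schema and deduplicates on event ID, keeping the latest record. This is plain Python with hypothetical field names (`event_id`, `ts`, `payload`), not the production dbt/PySpark code:

```python
# Toy bronze -> silver transformation: schema enforcement + deduplication.
# Field names are illustrative, not taken from the real pipeline.

REQUIRED_FIELDS = {"event_id", "ts"}

def to_silver(bronze_rows):
    """Drop malformed rows, then keep the latest record per event_id."""
    latest = {}
    for row in bronze_rows:
        if not REQUIRED_FIELDS <= row.keys():
            continue  # reject rows missing required fields
        key = row["event_id"]
        if key not in latest or row["ts"] > latest[key]["ts"]:
            latest[key] = row  # later timestamp wins
    # Deterministic output order for downstream aggregates.
    return sorted(latest.values(), key=lambda r: r["event_id"])

bronze = [
    {"event_id": "a", "ts": 1, "payload": "old"},
    {"event_id": "a", "ts": 2, "payload": "new"},  # later duplicate wins
    {"event_id": "b", "ts": 1},
    {"ts": 3},                                     # malformed: no event_id
]
silver = to_silver(bronze)
```

In a real Spark job the same keep-latest-per-key logic would typically be a window over `event_id` ordered by timestamp, but the invariant is the one shown here.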
Experience
Work history, roles, and key accomplishments
Data Engineer
Super Gaming
Feb 2025 - Present (10 months)
Redesigned event ingestion as a Pub/Sub → Dataflow streaming architecture, eliminating a 20% data-loss rate to reach 100% data reliability and enabling real-time BigQuery ingestion; cut BigQuery costs ~80% via partitioning and clustering, and implemented dbt medallion models for unified analytics.
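The cost reduction from partitioning comes from partition pruning: a query filtered on the partition column scans only the partitions it needs. A toy sketch of that effect (pure Python model; the table layout and sizes are made up, not actual production numbers):

```python
# Model a date-partitioned table as {partition_date: megabytes_in_partition}.
# Sizes here are illustrative: 30 daily partitions of 100 MB each.

table = {f"2025-01-{d:02d}": 100 for d in range(1, 31)}

def bytes_scanned(table, date_filter=None):
    """MB a query touches: full scan without a filter, pruned scan with one."""
    if date_filter is None:
        return sum(table.values())  # unpartitioned behavior: scan everything
    return sum(mb for day, mb in table.items() if day in date_filter)

full = bytes_scanned(table)                                # scans all 30 days
pruned = bytes_scanned(table, date_filter={"2025-01-15"})  # scans one day
```

Clustering adds a further, finer-grained reduction within each partition by co-locating rows that share common filter keys, which this sketch does not model.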
Data Engineer
Sigmoid Analytics
Dec 2023 - Jan 2025 (1 year 1 month)
Migrated customer engagement data using Airflow and BigQuery to GCS and developed PySpark jobs on Azure Databricks across Bronze/Silver/Gold layers to enforce schema, clean/deduplicate data, and produce business-ready aggregates.
Built ETL and streaming solutions, including PySpark jobs on Dataproc handling Elasticsearch indices up to 10 GB, Beam/Dataflow pipelines converting .npy files to TFRecords, and services compressing 4K drone videos to ~5–10% of their original size while improving log-write performance 10x.
Education
Degrees, certifications, and relevant coursework
The LNM Institute of Information Technology
Bachelor of Technology, Computer Science (completed 2021)
Tech stack
Software and tools used professionally
Apache Spark
Looker
Google Cloud Platform
Google Cloud Storage
Kubernetes
PySpark
Google BigQuery Data Transf...
dbt
Hadoop
Gmail
Databricks
FFmpeg
Python
TensorFlow
Google Cloud Dataflow
Firebase
Google Cloud Pub/Sub
Elasticsearch
Google Cloud Functions
Google Cloud SQL
TypeScript
Docker
Airflow
Apache Beam
Google BigQuery
SQL
Google Kubernetes Engine
Google Cloud Run
Google Cloud Dataproc
Bash
Transform