Rio Deb

Open to opportunities

@riodeb

Data Engineer specializing in scalable Databricks analytics pipelines and performance optimization.

What I'm looking for

I’m looking for an enterprise-focused role where I can build Databricks/Spark pipelines, automate Databricks job deployment with Terraform, and deliver measurable performance and data-quality improvements on large-scale analytics products.

I’m a Data Engineer focused on designing, building, and optimizing large-scale analytics platforms using Databricks, Apache Spark, and Delta Lake. I deliver high-performance, reliable, and scalable data solutions, particularly in enterprise financial analytics environments.

In my current role, I manage enterprise analytics pipelines at large scale: ~1,500 tables and 60+ TB of structured data, including individual tables of 500+ GB with 300B+ rows. I build standardized Bronze-to-Silver ETL frameworks with incremental ingestion, overwrite/merge workflows, and SCD Type 1/2 implementations, and I use Delta Live Tables for the Silver and Gold layers.

I also prioritize automation and maintainability: I created a reusable metadata-driven transformation framework that dynamically generates Databricks SQL and PySpark logic, reducing manual query writing and accelerating onboarding. I enabled automated schema evolution across pipelines and standardized modular job templates so large teams can build consistently.

Performance and data quality are core to how I work. I optimize Spark workloads through shuffle reduction, join/aggregation re-architecture, partition pruning, predicate pushdown, and deep analysis with Spark UI and execution plans, which improves stability, runtime, and cost efficiency. I back this with automated validation frameworks across Bronze, Silver, and Gold, plus Terraform-based Databricks job automation and parameterized reusable workflows.

Experience

Work history, roles, and key accomplishments

Current

Data Engineer

Eucloid Data Solutions

Jun 2025 - Present (1 year 1 month)

Designed and operated enterprise Databricks analytics pipelines at massive scale, managing ~1,500 tables and processing 60+ TB of structured data, including 500+ GB tables with 300B+ rows. Built metadata-driven Bronze-to-Silver ETL frameworks with SCD Type 1/2 and production Delta Live Tables (DLT) pipelines, enabling automated orchestration, schema evolution, data validation, and Spark performanc

Databricks, Apache Spark, PySpark, Delta Lake, Delta Live Tables, ETL and Schema Evolution, Terraform, AWS S3

Education

Degrees, certifications, and relevant coursework

G.B. Pant DSEU

Bachelor of Technology, Computer Science and Engineering

CGPA: 8.3/10

Pursuing a Bachelor of Technology in Computer Science and Engineering at G.B. Pant DSEU, expected to graduate June 2026.

IIT Madras

Bachelor of Science, Programming and Data Science

Pursuing a Bachelor of Science in Programming and Data Science at IIT Madras, expected to graduate June 2026.

Tech stack

Software and tools used professionally

Apache Spark

GitHub

GitHub Actions

PySpark

Gmail

Databricks

Terraform

SQL

Delta Lake

Groq

Trino

Bash

Dynamic
