Open to opportunities

Vivek Jain

@vivekjain3

Data Engineer specializing in scalable ETL/ELT pipelines with PySpark, Azure Databricks and SQL.

What I'm looking for

I’m looking to build scalable lakehouse ETL/ELT pipelines using PySpark on Azure, orchestrated with Airflow or ADF. I want to improve performance and cut cloud costs through Delta Lake and Spark optimization, and to deliver trusted, analytics-ready data.

I’m a Data Engineer with 5+ years of experience building scalable, high-performance data pipelines using PySpark, Azure Databricks, and SQL. I specialize in processing large-scale datasets (10M+ records/hour) and tuning distributed systems for both performance and cost efficiency. My work has improved pipeline performance by 30% and reduced cloud costs by up to 25%.

I build end-to-end ETL/ELT workflows on lakehouse architectures, using Delta Lake and a Medallion (Bronze-Silver-Gold) design. I focus on reliable ingestion, strong data modeling, and production-ready data warehousing so teams can trust analytics-ready datasets. For orchestration, I’ve used Apache Airflow and Azure Data Factory (ADF) to automate workflows and make scheduling and execution more reliable.

In my recent role, I designed and deployed ETL pipelines on Azure Databricks, implemented Delta Lake for ACID transactions and optimized storage, and orchestrated end-to-end workflows via ADF and Airflow. I also engineered ingestion from REST APIs, web scraping, and Azure Blob Storage, while optimizing Spark jobs through partitioning and caching to cut execution time by 30%. I’ve added validation and quality checks and monitored cloud resource utilization to reduce infrastructure costs by 15–25%.

Beyond core pipeline engineering, I’ve delivered developer-friendly integrations and analytics support, building REST APIs with Flask and creating interactive dashboards using Power BI and Tableau. I bring a practical, engineering-first mindset that balances correctness, scalability, and measurable business impact.

Experience

Work history, roles, and key accomplishments

Software Engineer

Energy Exemplar

Aug 2025 - Mar 2026 (7 months)

Designed and deployed scalable PySpark ETL pipelines on Azure Databricks processing 10M+ records/hour. Implemented Delta Lake and orchestrated workflows with Azure Data Factory and Apache Airflow, reducing execution time by 30% and cloud costs by 15–25%.

PySpark, Databricks, Azure Data Factory, Apache Airflow, Delta Lake, Spark Optimization

CA

Senior Software Engineer

Apr 2022 - Aug 2025 (3 years 4 months)

Built large-scale ETL pipelines using PySpark and SQL in Azure Databricks for high-volume data processing. Optimized SQL and Spark transformations to improve performance by 30% and used Airflow to orchestrate reliable production workflows.

PySpark, SQL, Databricks, Apache Airflow, Distributed Data Processing, Lakehouse Architecture, Data Warehousing, Spark Optimization

Python Developer

Rapid Staffing & Training Solutions

Nov 2020 - Apr 2022 (1 year 5 months)

Developed Python/SQL ETL workflows using Pandas for data extraction, transformation, and loading. Built Flask-based REST APIs and automated reporting workflows, reducing manual effort by 30%.

Python, SQL, Pandas, ETL, MongoDB, Flask, REST APIs, Data Integration

Education

Degrees, certifications, and relevant coursework

PDM University

Master of Computer Applications, Computer Science

Grade: 85%

Completed an MCA (Computer Science) at PDM University in 2020, scoring 85%.

Tech stack

Software and tools used professionally

Apache Spark

Microsoft Azure

GitHub

Pandas

PySpark

MySQL

PostgreSQL

MongoDB

Gmail

Databricks

Airflow

SQL

Azure Blob Storage

Delta Lake

Azure Data Factory
