Open to opportunities

Vivek Jain

@vivekjain3

Data Engineer specializing in scalable ETL/ELT pipelines with PySpark, Azure Databricks and SQL.

Zimbabwe

What I'm looking for

I’m looking to build scalable lakehouse ETL/ELT pipelines using PySpark on Azure, with Airflow/ADF orchestration. I want to improve performance and cut cloud cost through Delta Lake and Spark optimization, delivering trusted analytics-ready data.

I’m a Data Engineer with 5+ years of experience building scalable, high-performance data pipelines using PySpark, Azure Databricks, and SQL. I specialize in processing large-scale datasets (10M+ records/hour) and tuning distributed systems for both performance and cost efficiency. My work has improved pipeline performance by 30% and reduced cloud costs by up to 25%.

I build end-to-end ETL/ELT workflows on modern lakehouse patterns, including Delta Lake and a Medallion (Bronze-Silver-Gold) architecture. I focus on reliable ingestion, strong data modeling, and production-ready data warehousing so teams can trust analytics-ready datasets. For orchestration, I’ve used Apache Airflow and Azure Data Factory (ADF) to automate workflows and make scheduling and execution more reliable.

In my most recent role, I designed and deployed ETL pipelines on Azure Databricks, implemented Delta Lake for ACID transactions and storage optimization, and orchestrated end-to-end workflows via ADF and Airflow. I also built ingestion from REST APIs, scraped web sources, and Azure Blob Storage, and optimized Spark jobs through partitioning and caching to cut execution time by 30%. I added validation and quality checks, and monitored cloud resource utilization to reduce infrastructure costs by 15–25%.

Beyond core pipeline engineering, I’ve delivered developer-friendly integrations and analytics support: building REST APIs with Flask and creating interactive dashboards in Power BI and Tableau. I bring a practical, engineering-first mindset that balances correctness, scalability, and measurable business impact.

Experience

Work history, roles, and key accomplishments

Software Engineer

Energy Exemplar

Aug 2025 - Mar 2026 (7 months)

Designed and deployed scalable PySpark ETL pipelines on Azure Databricks processing 10M+ records/hour. Implemented Delta Lake and orchestrated workflows with Azure Data Factory and Apache Airflow, reducing execution time by 30% and cloud costs by 15–25%.

Python Developer

Rapid Staffing & Training Solutions

Nov 2020 - Apr 2022 (1 year 5 months)

Developed Python/SQL ETL workflows using Pandas for data extraction, transformation, and loading. Built Flask-based REST APIs and automated reporting workflows, reducing manual effort by 30%.

Education

Degrees, certifications, and relevant coursework

PDM University

Master of Computer Applications, Computer Science

Grade: 85%

Completed an MCA (Computer Science) at PDM University in 2020, scoring 85%.
