Vivek Jain
@vivekjain3
Data Engineer specializing in scalable ETL/ELT pipelines with PySpark, Azure Databricks and SQL.
What I'm looking for
I’m a Data Engineer with 5+ years of experience building scalable, high-performance data pipelines using PySpark, Azure Databricks, and SQL. I specialize in processing large-scale datasets (10M+ records/hour) and tuning distributed systems for both performance and cost efficiency. My work has improved pipeline performance by 30% and reduced cloud costs by up to 25%.
I build end-to-end ETL/ELT workflows on modern lakehouse architectures, using Delta Lake with a Medallion (Bronze-Silver-Gold) design. I focus on reliable ingestion, strong data modeling, and production-ready data warehousing so teams can trust analytics-ready datasets. For orchestration, I’ve used Apache Airflow and Azure Data Factory (ADF) to automate workflows and make scheduling and execution more reliable.
In my recent role, I designed and deployed ETL pipelines on Azure Databricks, implemented Delta Lake for ACID transactions and optimized storage, and orchestrated end-to-end workflows via ADF and Airflow. I also engineered ingestion from REST APIs, web scraping, and Azure Blob Storage, while optimizing Spark jobs through partitioning and caching to cut execution time by 30%. I’ve added validation and quality checks and monitored cloud resource utilization to reduce infrastructure costs by 15–25%.
Beyond core pipeline engineering, I’ve delivered developer-friendly integrations and analytics support, building REST APIs with Flask and creating interactive dashboards in Power BI and Tableau. I bring a practical, engineering-first mindset that balances correctness, scalability, and measurable business impact.
Experience
Work history, roles, and key accomplishments
Software Engineer
Energy Exemplar
Aug 2025 - Mar 2026 (7 months)
Designed and deployed scalable PySpark ETL pipelines on Azure Databricks processing 10M+ records/hour. Implemented Delta Lake and orchestrated workflows with Azure Data Factory and Apache Airflow, reducing execution time by 30% and cloud costs by 15–25%.
Built large-scale ETL pipelines using PySpark and SQL in Azure Databricks for high-volume data processing. Optimized SQL and Spark transformations to improve performance by 30% and used Airflow to orchestrate reliable production workflows.
Python Developer
Rapid Staffing & Training Solutions
Nov 2020 - Apr 2022 (1 year 5 months)
Developed Python/SQL ETL workflows using Pandas for data extraction, transformation, and loading. Built Flask-based REST APIs and automated reporting workflows, reducing manual effort by 30%.
Education
Degrees, certifications, and relevant coursework
PDM University
Master of Computer Applications, Computer Science
Grade: 85%
Completed an MCA (Computer Science) at PDM University in 2020, scoring 85%.
Portfolio
github.com/jainvivek8456
