Roro Zoro

Open to opportunities

@rorozoro

I’m a Data Engineer optimizing PySpark/Databricks pipelines for reliable, scalable banking data.

What I'm looking for

I’m looking to build and modernize end-to-end data pipelines with PySpark/Databricks and AWS. I want to own Spark performance optimization, reliable orchestration, and strong monitoring so data platforms stay dependable at scale.

I’m a Data Engineer with nearly 4 years of experience designing and optimizing scalable data pipelines using PySpark, Databricks, AWS, and the Hadoop ecosystem. I focus on ETL modernization, distributed data processing, and building cloud-native data engineering solutions that are dependable in production.

I’ve automated migration of legacy SQL workloads into reusable PySpark frameworks and built Delta Lake-based architectures. At Barclays, I developed and optimized PySpark pipelines for enterprise banking control frameworks, implementing validation controls, reusable ETL components, and efficient Spark partition strategies.

I also prioritize operational excellence: I’ve implemented centralized logging and monitoring with AWS CloudWatch, orchestrated end-to-end ETL with AWS Lambda/Glue/Step Functions, and strengthened reliability through resilience testing (AWS Fault Injection Simulator). From Capgemini to Barclays, I’ve supported production deployment and cutover activities with zero critical production incidents during release cycles.

Experience

Work history, roles, and key accomplishments

Data Engineer

Current

Barclays

Sep 2025 - Present (10 months)

Developed scalable PySpark pipelines for enterprise banking control frameworks, including automated migration of legacy SQL workflows into standardized PySpark frameworks. Built reusable ETL components and AWS-native orchestration with Lambda/Glue/Step Functions, and improved monitoring and reliability using CloudWatch and AWS Fault Injection Simulator.

PySpark, Databricks, ETL, AWS Lambda, AWS Glue, AWS Step Functions, AWS Fault Injection Simulator, Workflow Orchestration, Logging, CloudWatch, Audit

Data Engineer

Capgemini

Dec 2022 - Sep 2025 (2 years 9 months)

Developed and optimized PySpark-based ETL pipelines on AWS EMR and the Hadoop ecosystem for large-scale distributed data processing. Implemented Airflow scheduling and an AWS Glue/S3 data lake architecture, improved performance through Spark and Hive tuning, and automated Databricks deployments with CI/CD pipelines using GitHub Actions and Jenkins.

PySpark, Apache Spark, Hadoop, AWS Glue, Hive, Hive Bucketing, Terraform, Databricks, CI/CD, GitHub Actions, Jenkins, Shell Scripting, Redshift, Snowflake, S3, Airflow

Education

Degrees, certifications, and relevant coursework

Savitribai Phule Pune University

Bachelor of Engineering, Electronics and Telecommunication

2019 - 2022

Tech stack

Software and tools used professionally

Apache Spark

AWS Glue

AWS Step Functions

GitHub

GitLab

Jenkins

GitHub Actions

PySpark

PostgreSQL

Hadoop

Databricks

Terraform

AWS Lambda

Airflow

s3-lambda

SQL

Delta Lake

Availability

Open to opportunities

Location

India

Job categories

Data Engineer, ETL, ELT Development, Big Data Engineering, Cloud Data Engineering, Data Engineering, Data Engineering Specialist, Data Engineering Positions, Data
