Looking for a job

Afroj Shaikh

@afrojshaikh

Data Engineer with 4+ years of experience designing scalable data pipelines using PySpark, Databricks, Python, SQL, AWS, and DBT.

India

What I'm looking for

I’m looking for a Data Engineering role where I can own PySpark/Databricks ETL modernization, drive Spark performance tuning, and build resilient, well-monitored AWS workflows for large-scale data platforms.

I’m a Data Engineer with 4 years of experience designing and optimizing scalable data pipelines using PySpark, Databricks, AWS, and the Hadoop ecosystem. I focus on ETL modernization, distributed data processing, and building cloud-native solutions with strong performance, monitoring, and reliability.

In my current role at Barclays, I developed scalable PySpark pipelines for enterprise banking control frameworks and automated migration of legacy SQL workflows into standardized PySpark frameworks, reducing manual conversion effort by 70%. I’ve also built reusable ETL framework components for logging, auditing, exception handling, and control validation.

I bring practical engineering rigor from Capgemini, where I optimized PySpark ETL pipelines on AWS EMR and Hadoop, reducing resource utilization by 50% and improving pipeline performance. I also designed orchestration with Apache Airflow, improved Hive query performance by 35%, and strengthened production resilience using AWS Fault Injection Simulator, supporting releases with zero critical production incidents.

Experience

Work history, roles, and key accomplishments

Current

Data Engineer

Barclays

Sep 2025 - Present (10 months)

Developed scalable PySpark ETL pipelines for enterprise banking control frameworks and optimized Spark workloads and partition strategies. Automated migration of legacy SQL workflows into standardized PySpark frameworks, reducing manual conversion effort by 70%, and implemented AWS-based orchestration and monitoring with centralized CloudWatch logging.

Data Engineer

Capgemini

Dec 2022 - Sep 2025 (2 years 9 months)

Developed and optimized PySpark-based ETL pipelines on AWS EMR and Hadoop for large-scale distributed processing. Implemented Airflow scheduling, migrated datasets to an S3-based data lake using AWS Glue, and tuned Spark and Hive workloads, reducing resource utilization by 50% and improving Hive query performance by 35%.

Education

Degrees, certifications, and relevant coursework

Savitribai Phule Pune University

Bachelor of Engineering, Electronics and Telecommunication Engineering

2019 - 2022

Bachelor of Engineering in Electronics and Telecommunication from Savitribai Phule Pune University (Aug 2019 to May 2022).
