Afroj Shaikh

Looking for a job

@afrojshaikh

Data Engineer with 4+ years of experience designing scalable data pipelines using PySpark, Databricks, Python, SQL, AWS, and dbt.

What I'm looking for

I’m looking for a Data Engineering role where I can own PySpark/Databricks ETL modernization, drive Spark performance tuning, and build resilient, well-monitored AWS workflows for large-scale data platforms.

I’m a Data Engineer with 4 years of experience designing and optimizing scalable data pipelines using PySpark, Databricks, AWS, and the Hadoop ecosystem. I focus on ETL modernization, distributed data processing, and building cloud-native solutions with strong performance, monitoring, and reliability.

In my current role at Barclays, I developed scalable PySpark pipelines for enterprise banking control frameworks and automated migration of legacy SQL workflows into standardized PySpark frameworks, reducing manual conversion effort by 70%. I’ve also built reusable ETL framework components for logging, auditing, exception handling, and control validation.

I bring practical engineering rigor from Capgemini, where I optimized PySpark ETL pipelines on AWS EMR and Hadoop, reducing resource utilization by 50% and improving pipeline performance. I have also designed orchestration with Apache Airflow, improved Hive query performance by 35%, and strengthened production resilience using AWS Fault Injection Simulator, supporting releases with zero critical production incidents.

Experience

Work history, roles, and key accomplishments

Data Engineer

Current

Barclays

Sep 2025 - Present (10 months)

Developed scalable PySpark ETL pipelines for enterprise banking control frameworks and optimized Spark workloads and partition strategies. Automated migration of legacy SQL workflows into standardized PySpark frameworks, reducing manual conversion effort by 70%, and implemented AWS-based orchestration and monitoring with centralized CloudWatch logging.

PySpark, Apache Spark, Databricks, AWS Lambda, AWS Glue, Step Functions, IAM, Delta Lake, SQL, Python, EMR, Kinesis, Fault Injection Simulator (FIS), Logging, Data Validation, Partitioning, S3, CloudWatch, Audit

Data Engineer

Capgemini

Dec 2022 - Sep 2025 (2 years 9 months)

Developed and optimized PySpark-based ETL pipelines on AWS EMR and Hadoop for large-scale distributed processing. Implemented Airflow scheduling, migrated datasets to an S3-based data lake using AWS Glue, and tuned Spark and Hive workloads, reducing resource utilization by 50% and improving Hive query performance by 35%.

Airflow, PySpark, Apache Spark, Hadoop, Hive, AWS Glue, SQL, Python, Terraform, GitHub Actions, Jenkins, CI/CD, Redshift, Snowflake, Partitioning, Bucketing, Shell Scripting, S3

Education

Degrees, certifications, and relevant coursework

Savitribai Phule Pune University

Bachelor of Engineering, Electronics and Telecommunication Engineering

2019 - 2022

Bachelor of Engineering in Electronics and Telecommunication from Savitribai Phule Pune University (Aug 2019 to May 2022).

Tech stack

Software and tools used professionally

Snowflake

Apache Spark

AWS Glue

AWS Step Functions

GitHub

GitLab

Jenkins

GitHub Actions

PySpark

dbt

PostgreSQL

Hadoop

Databricks

Terraform

Python

AWS Lambda

Docker

Airflow

s3-lambda

SQL

Delta Lake

Availability

Looking for a job

Location

India

Job categories

Data Engineer, ETL, Cloud Data Engineering, Spark Optimization, Workflow Orchestration, Data Engineering, Data Engineering Specialist, Data Engineering Positions, Cloud Data Engineer
