Open to opportunities

Yaseen Banu

@yaseenbanu

Senior Data Engineer specializing in scalable cloud data platforms and PySpark.

India

What I'm looking for

I seek senior roles building scalable, cloud-native data platforms where I can lead architecture, optimize costs, and implement reliable, governed data workflows.

I am a Senior Data Engineer with six years of experience designing, building, and optimizing large-scale data pipelines across Azure, GCP, and AWS. I specialize in PySpark, Python, and SQL and focus on delivering scalable, cost-efficient data solutions.

I have architected reusable ingestion frameworks, migrated platforms across clouds, and converted legacy pipelines to modern, standardized components that cut development time and technical debt. My work has driven measurable cost savings, performance improvements, and higher data quality.

I apply generative AI to practical problems, building RAG systems, autonomous agents, and LLM orchestration with tools such as LangChain and LangGraph to enhance analytics and operational workflows. I also lead efforts in orchestration, monitoring, and governance to ensure reliable production platforms.

I hold multiple cloud and data engineering certifications, have been recognized with internal awards for technical excellence, and enjoy collaborating with cross-functional teams to translate business requirements into robust data products.

Experience

Work history, roles, and key accomplishments

Current

Senior Data Engineer

EPAM Systems India Pvt Ltd

Aug 2024 - Present (1 year 11 months)

Architected a reusable Dataflow ingestion framework that processes 500M+ records daily into Snowflake, reducing extraction time by 55%; the framework has been adopted by 6+ teams. Converted Java pipelines to Python, reducing new pipeline development time by 40%, and implemented config-driven validation to eliminate manual checks.

PySpark, Python, BigQuery, Snowflake, Dataflow, Terraform, Data Ingestion, Data Quality

Senior Data Engineer

Publicis Sapient

Feb 2024 - Aug 2024 (6 months)

Built Kafka Streams applications with stateful processing, deduplication, and exactly-once semantics. Implemented real-time enrichment pipelines with dead-letter queue (DLQ) handling and fault recovery patterns for transaction data processing.

Kafka Streams, Kafka, Java, Fault Tolerance, Stream Enrichment

Senior Data Engineer

Deloitte Touche Consulting Private Limited

Dec 2021 - Feb 2024 (2 years 2 months)

Architected an ML data platform integrating 20+ sources into Snowflake using a medallion architecture. Led an Azure-to-GCP migration that adapted 50+ PySpark jobs, cutting idle compute costs by 40% and reducing processing time by 30% via ingestion framework optimizations.

BigQuery, PySpark, Airflow, Snowflake, Data Migration, Orchestration

Data Engineer

Cognizant Technology Limited

Jul 2019 - Dec 2021 (2 years 5 months)

Migrated 30+ tables (2 TB+) from HDFS to Kudu using PySpark, reducing query latency by 60%. Developed Airflow monitoring with Slack alerts to cut incident detection time, and built a PySpark validation framework processing 5M+ records daily.

PySpark, Airflow, HDFS, Data Validation, Monitoring

Education

Degrees, certifications, and relevant coursework

Sree Vidyanikethan Engineering College

Bachelor of Technology, Computer Science and Engineering

2015 - 2019

Completed a Bachelor of Technology in Computer Science and Engineering with coursework and projects focused on software engineering and data processing.