Yaseen Banu
@yaseenbanu
Senior Data Engineer specializing in scalable cloud data platforms and PySpark.
What I'm looking for
I am a Senior Data Engineer with six years of experience designing, building, and optimizing large-scale data pipelines across Azure, GCP, and AWS. I specialize in PySpark, Python, and SQL and focus on delivering scalable, cost-efficient data solutions.
I have architected reusable ingestion frameworks, migrated platforms across clouds, and converted legacy pipelines to modern, standardized components that cut development time and technical debt. My work has driven measurable cost savings, performance improvements, and higher data quality.
I apply generative AI to practical solutions, including RAG systems, autonomous agents, and LLM orchestration built with tools such as LangChain and LangGraph, to enhance analytics and operational workflows. I also lead efforts in pipeline orchestration, monitoring, and governance to ensure reliable production platforms.
I hold multiple cloud and data engineering certifications, have been recognized with internal awards for technical excellence, and enjoy collaborating with cross-functional teams to translate business requirements into robust data products.
Experience
Work history, roles, and key accomplishments
Architected a reusable Dataflow ingestion framework processing 500M+ daily records into Snowflake, reducing extraction time by 55%; the framework was adopted by 6+ teams. Converted Java pipelines to Python, reducing new pipeline development time by 40%, and implemented config-driven validation to eliminate manual checks.
Built Kafka Streams applications with stateful processing, deduplication, and exactly-once semantics, and implemented real-time enrichment pipelines with dead-letter queue (DLQ) handling and fault-recovery patterns for transaction data processing.
Senior Data Engineer
Dec 2021 - Feb 2024 (2 years 2 months)
Architected an ML data platform integrating 20+ sources into Snowflake using a medallion architecture, and led an Azure-to-GCP migration adapting 50+ PySpark jobs, cutting idle compute costs by 40% and reducing processing time by 30% via ingestion framework optimizations.
Migrated 30+ tables (2TB+) from HDFS to Kudu using PySpark, reducing query latency by 60%; developed Airflow monitoring with Slack alerts to cut incident detection time; and built a PySpark validation framework processing 5M+ records daily.
Education
Degrees, certifications, and relevant coursework
Sree Vidyanikethan Engineering College
Bachelor of Technology, Computer Science and Engineering
2015 - 2019
Completed a Bachelor of Technology in Computer Science and Engineering with coursework and projects focused on software engineering and data processing.
