Open to opportunities

Sabina Gurung

@sabinagurung

Senior Data Engineer building cloud-native lakehouse platforms and AI-enabled data pipelines that drive scalable, reliable analytics.

United States

What I'm looking for

I’m looking for a senior data engineering role where I can build cloud-native lakehouse platforms, deliver secure batch/streaming pipelines, and apply Data Quality/Data Observability plus GenAI (RAG, semantic/vector search) to improve reliability and accelerate analytics.

I’m a Senior Data Engineer with 7+ years of experience designing, developing, and optimizing enterprise-scale data platforms across insurance, healthcare, retail, and financial services. I build cloud-native solutions across Microsoft Azure, Google Cloud Platform (GCP), and AWS, with deep hands-on expertise in Databricks, Apache Spark, PySpark, Snowflake, BigQuery, Microsoft Fabric, and modern Lakehouse architectures.

In my current role, I architected a GCP-native enterprise data platform, migrated 50+ legacy AWS ETL pipelines (reducing infrastructure costs by 25%), and delivered scalable ELT pipelines processing 8 TB+ daily (reducing processing times by 35%). I also implement data quality frameworks and data observability to improve accuracy and cut incident detection and resolution time by 35%, and I integrate generative AI workflows using Azure OpenAI, Vertex AI, and RAG to automate tasks, reducing manual effort by 50%+.

Experience

Work history, roles, and key accomplishments

Current

Senior Data Engineer

MetLife

Aug 2024 - Present (1 year 11 months)

Architected a GCP-native enterprise data platform with BigQuery, Dataflow, Dataproc, Pub/Sub, and Cloud Storage, supporting insurance policy, claims, underwriting, and actuarial analytics. Migrated 50+ legacy AWS ETL pipelines to GCP, cutting infrastructure costs by 25%, reducing daily processing time by 35%, and improving data accuracy by 40% through data quality and observability initiatives.

Google Cloud Dataflow, Google Cloud Dataproc, Pub/Sub, Lakehouse Architecture, Apache Spark, Data Quality, Data Observability, Vertex AI, BigQuery, RAG

Data Engineer

Johnson & Johnson

Sep 2022 - Jul 2024 (1 year 10 months)

Designed and built Azure-based batch and near real-time data pipelines for clinical research, patient safety, manufacturing, and commercial datasets. Reduced duplicate processing with a Medallion architecture, improved performance by 40%, and cut manual orchestration work by 50%; automated data quality and observability checks reduced production incidents by 30%.

Azure Databricks, Azure Data Factory, PySpark, Azure Event Hubs, Medallion Architecture, Snowflake, Kafka

ETL Developer

Walmart

Nov 2021 - Aug 2022 (9 months)

Developed and maintained enterprise ETL pipelines with Informatica PowerCenter to integrate high-volume retail transaction, inventory, merchandising, and supply chain data. Cut batch runtimes by 40% through SQL performance tuning, reduced manual intervention by 30% with Unix/Linux shell automation, and improved reliability through workflow recovery and production support.

Informatica, SQL Tuning, Batch Processing, Production Support, Data Warehouse, PL/SQL

Backend Developer

Bank of America

Jul 2019 - Oct 2021 (2 years 3 months)

Built and maintained backend RESTful APIs using Python and Flask for customer account management, transaction processing, and internal banking applications. Improved data accuracy by 25% through validation and reconciliation modules and reduced API response times by 35% via database query optimization, while supporting reliable deployments through CI/CD with Jenkins and Git.

Python, Flask, Data Validation, REST APIs