Open to opportunities

Samuel Shrestha

@samuelshrestha

Senior Data Engineer building scalable batch and streaming platforms for AI, analytics, and cloud ecosystems.

United States

What I'm looking for

I’m looking for a senior data engineering role where I can build resilient batch and streaming pipelines for AI and analytics, especially privacy-safe clean rooms, vector search/RAG, and self-serve data products, while mentoring teams.

I’m a Senior Data Engineer with 8+ years building scalable batch and streaming data platforms across AI, analytics, and cloud ecosystems. I bring deep expertise in distributed systems, data modeling, and modern ETL architectures, delivering large-scale solutions using Python, SQL, Spark, Kafka, Airflow, dbt, Snowflake, and AWS.

At Salesforce, I delivered Data 360 capabilities spanning AI retrieval, zero-copy clean rooms, vector search, and personalization, building JSON intent contracts, embeddings, model-serving APIs, and Spark Streaming pipelines. At Instacart and Komodo Health, I modernized platforms with dbt and Airflow orchestration, tuned Snowflake and Spark for reliability and cost observability, and built healthcare ETL workflows processing 150+ datasets for 300M+ patients. I’m motivated by privacy-safe, observable pipelines and clear collaboration across product, ML, security, and infrastructure teams.

Experience

Work history, roles, and key accomplishments

Current

Senior Data Engineer

Salesforce

Mar 2023 - Present (3 years 4 months)

Delivered Salesforce Data 360 capabilities for AI retrieval, zero-copy clean rooms, vector search, and personalization using Python, SQL, Spark, Kafka, and AWS, enabling sub-second personalization at enterprise scale. Built SQL-validation and lineage-based collaboration components, expanded connector coverage to 100+ connectors with 4x throughput, and reduced dialect delivery time from 40 days to 10 days.

Python, SQL, PySpark, Apache Spark, Apache Kafka, Airflow, dbt, Snowflake, AWS, Data Lineage

Data Engineer

Instacart

Jan 2020 - Feb 2023 (3 years 1 month)

Modernized Instacart’s data platform by building self-serve batch and streaming pipelines with Snowflake, dbt, Airflow, Kafka, Flink, and Kubernetes, improving reliability across 10+ PB and 5M+ tables. Migrated legacy transformations to modular dbt models, scaled orchestration toward 400 DAGs and 5,000 tasks, and reduced cold-start waste costs by 20–40%.

Python, SQL, Snowflake, dbt, Apache Airflow, Apache Kafka, Apache Flink, Kubernetes, AWS, Terraform

Data Engineer

Komodo Health

Jun 2018 - Dec 2019 (1 year 6 months)

Built Healthcare Map data pipelines processing 150+ datasets for 300M+ patients and 50B+ encounters, supporting analytics across batch ETL and incremental backfills. Developed data quality checks and improved Spark pipeline performance (joins, partitions, pruning, retries), delivering 15M+ daily clinical encounters to downstream analytics.