Samuel Shrestha
@samuelshrestha
Senior Data Engineer building scalable batch and streaming platforms for AI, analytics, and cloud ecosystems.
What I'm looking for
I’m a Senior Data Engineer with 8+ years building scalable batch and streaming data platforms across AI, analytics, and cloud ecosystems. I bring deep expertise in distributed systems, data modeling, and modern ETL architectures, and I deliver large-scale solutions using Python, SQL, Spark, Kafka, Airflow, dbt, Snowflake, and AWS.
At Salesforce, I delivered Data 360 capabilities spanning AI retrieval, zero-copy clean rooms, vector search, and personalization, building JSON intent contracts, embeddings, model-serving APIs, and Spark Streaming pipelines. At Instacart and Komodo Health, I modernized platforms with dbt and Airflow orchestration, tuned Snowflake and Spark for reliability and cost observability, and built healthcare ETL workflows processing 150+ datasets covering 300M+ patients. I’m motivated by privacy-safe, observable pipelines and clear collaboration across product, ML, security, and infrastructure teams.
Experience
Delivered Salesforce Data 360 capabilities for AI retrieval, zero-copy clean rooms, vector search, and personalization using Python, SQL, Spark, Kafka, and AWS, enabling sub-second personalization at enterprise scale. Built SQL-validation and lineage-based collaboration components, expanded connector coverage beyond 100 connectors with 4x throughput, and reduced dialect delivery time from 40 days to 10.
Modernized Instacart’s data platform by building self-serve batch and streaming pipelines with Snowflake, dbt, Airflow, Kafka, Flink, and Kubernetes, improving reliability across 10+ PB of data and 5M+ tables. Migrated legacy transformations to modular dbt models, scaled orchestration toward 400 DAGs and 5,000 tasks, and reduced costs from cold-start waste by 20–40%.
Built Healthcare Map data pipelines processing 150+ datasets for 300M+ patients and 50B+ encounters, supporting analytics through batch ETL and incremental backfills. Developed data quality checks and tuned Spark pipeline performance (joins, partitioning, pruning, retries), delivering 15M+ daily clinical encounters to downstream analytics.
Education
University of California, Berkeley
Bachelor of Science, Electrical Engineering and Computer Sciences
2014 - 2018
