Nathan Leung
@nathanleung
Staff Full-Stack ML Engineer specializing in GenAI platforms, MLOps, and scalable production AI systems.
What I'm looking for
I am a Staff Full-Stack ML Engineer with 14+ years building production systems at scale, from ML training and deployment infrastructure to GenAI platforms. I design and deliver end-to-end solutions including RAG pipelines, LLM fine-tuning, multi-tenant AI SaaS, and event-driven services on AWS.
At AllCloud I architected a multi-tenant GenAI platform using Bedrock, OpenSearch Serverless, EKS, and Terraform, and led fine-tuning and deployment of custom models. Previously at Twitch and Amazon I built large-scale ML training pipelines, model registries, feature pipelines, low-latency model serving, and production ad and payment systems.
I bring strong hands-on expertise across cloud infrastructure, Kubernetes, IaC, SageMaker, PyTorch/TensorFlow, and distributed systems, paired with leadership in architecture, GitOps-driven delivery, and enabling data scientists to self-serve ML workflows.
Experience
Work history, roles, and key accomplishments
Staff Full-Stack ML Engineer
AllCloud
Sep 2023 - Present (2 years 5 months)
Architected a multi-tenant GenAI SaaS platform and end-to-end RAG pipelines using Amazon Bedrock and OpenSearch Serverless, enabling contextual recommendations and tenant-isolated, scalable inference. Delivered fine-tuned LLMs, Terraform IaC, and GitOps CI/CD to production, reducing operational complexity and standardizing LLM integrations.
Built end-to-end SageMaker training and deployment pipelines, model registry, and feature pipelines ingesting billions of events, reducing model deployment time from days to hours and enabling reproducible, scalable ad-serving ML systems. Implemented real-time serving, canary rollouts, and A/B testing for production models.
Designed and launched interactive ad formats and the Bounty Board marketplace using Go microservices and DynamoDB, increasing ad engagement and enabling thousands of brand-streamer partnerships; led migrations from monolith to microservices and built A/B experimentation infrastructure.
Architected client-side and backend flows for Amazon Appstore IAP, built resilient transaction handling and server-side receipt verification, and developed search/ranking and promotional systems that supported large-scale app discovery and high-volume downloads.
Built automated testing and load-testing frameworks for the Amazon Appstore backend, enabling CI/regression testing and validating system capacity prior to public launch. Produced realistic test data generation utilities to support integration testing.
Education
Degrees, certifications, and relevant coursework
University of Waterloo
Bachelor of Computer Science, Computer Science
2006 - 2011
Completed a Bachelor of Computer Science program between 2006 and 2011, focusing on software engineering and systems.
Tech stack
Software and tools used professionally
Availability
Location
Authorized to work in
Website
nleung.me
Job categories
Skills
