Open to opportunities

Suki Xiao

@suqixiao

Staff AI software engineer building scalable LLM/RAG cloud platforms and improving cost, latency, and reliability.

United States

What I'm looking for

I’m looking to lead the engineering of scalable LLM/RAG platforms on cloud-native infrastructure, owning latency, reliability, and cost, while partnering cross-functionally and mentoring teams to ship production AI faster and more safely.

I’m a Staff Software Engineer with 10+ years of experience delivering production-grade AI systems and cloud-native platforms. I build scalable distributed systems and AI-powered products that translate complex requirements into reliable, measurable outcomes—performance, cost, and speed to ship.

At AWS, I developed a multi-tenant LLM orchestration platform powering enterprise AI copilots for 1K+ concurrent users with sub-300ms response latency. I designed and owned end-to-end RAG (OpenSearch + FAISS), improving answer accuracy by 35% and reducing hallucinations, while optimizing the LLM request lifecycle to cut token usage costs by 28% and increase throughput under peak load.

I also led platform observability and reliability improvements using OpenTelemetry and CloudWatch, reducing MTTR by 40%, and established secure multi-tenant isolation patterns (IAM, KMS, role-based access) for enterprise governance. Earlier at Google, I engineered high-throughput pipelines for large-scale ML training and personalization, modernized frontend architecture with React and TypeScript, and applied ML-based anomaly detection to reduce production incidents by 25%—all while mentoring teams on distributed systems and AI system design.

Experience

Work history, roles, and key accomplishments

AWS
Current

Staff Software Engineer

Jan 2023 - Present (3 years 4 months)

Developed a multi-tenant LLM orchestration platform for enterprise AI copilots, supporting 1K+ concurrent users with sub-300ms response latency. Designed and operated end-to-end RAG (OpenSearch + FAISS) to improve answer accuracy by 35%, while reducing token costs by 28% through caching, prompt deduplication, and async batching.

Google

Senior Software Engineer

May 2016 - Jan 2023 (6 years 8 months)

Designed and built the backend for a large-scale intelligent search platform, improving search relevance and user engagement across multi-billion record datasets. Engineered data pipelines (Kafka + Apache Beam) and low-latency distributed services (Java/Go) for real-time recommendations, and reduced production incidents by 25% with ML-based anomaly detection and proactive monitoring.

UC Berkeley

Research Assistant

Aug 2012 - Apr 2016 (3 years 8 months)

Designed and implemented machine learning and optimization models, improving computational efficiency by ~20% for large-scale forecasting and decision systems. Built reusable ETL pipelines and applied statistical modeling and simulation to evaluate performance under uncertainty and varying constraints.

Education

Degrees, certifications, and relevant coursework

UC Berkeley

Master of Science, Industrial Engineering

2012 - 2013

Completed an M.S. in Industrial Engineering at UC Berkeley from 2012 to 2013.

Beijing University of Posts and Telecommunications

Bachelor of Science, Telecommunication Engineering and Management

2008 - 2012

Earned a B.S. in telecommunication engineering and management from 2008 to 2012.

Queen Mary University of London

Bachelor of Science, Telecommunication Engineering and Management

2008 - 2012

Earned a B.S. in telecommunication engineering and management from 2008 to 2012.
