Open to opportunities

Kevin Wang

@kevinwang8

Senior software engineer delivering scalable AI inference platforms, distributed systems, and RAG-powered experiences at global scale.

United States

What I'm looking for

I want to build high-availability AI infrastructure and production LLM platforms: designing distributed systems, optimizing inference latency, and mentoring teams to ship secure, measurable features across global cloud environments.

I’m a Senior Software Engineer with 10+ years of experience building scalable backend systems, AI infrastructure, and production LLM platforms. I focus on distributed systems, Kubernetes-based ML serving, and generative AI integration to deliver real product impact.

At Google Workspace AI Core Platform, I architected and built an AI Inference Orchestration Service in Go that routes requests between Workspace surfaces and ML models. I improved inference throughput by optimizing request batching and adding in-memory caching, which reduced P95 latency by 30%. I also designed routing and caching that sustained 99.99% availability across 5 global regions.

I’ve built the end-to-end experience layer as well: creating a Node.js gateway on Cloud Run to aggregate responses for AI prompt construction, and delivering UI capabilities with React.js and Next.js, including a control plane for model configuration, A/B testing, and real-time performance monitoring. I enabled low-latency interactions using SSR, streaming responses, and Next.js API routes.

I also lead work on secure, observable ML infrastructure: scaling distributed ML inference on GKE with Cloud Spanner and Pub/Sub, implementing mTLS, network policies, and VPC Service Controls, and integrating RAG components to improve contextual grounding and reduce hallucinations. I mentor 5 engineers, and have driven 3 promotions and supported 2 successful transitions into ML engineering teams.

Experience

Work history, roles, and key accomplishments

Current

Senior Software Engineer

Google

Oct 2017 - Present (8 years 9 months)

Architected and built an AI inference orchestration service for Google Workspace (Gmail/Chat), improving inference throughput and reducing P95 latency by 30% via Go batching and in-memory caching. Designed ML inference routing on GKE across 5 global regions with 99.99% availability, implemented RAG with semantic embeddings to reduce hallucinations, and integrated Vertex AI/Gemini for AI-assisted d