Open to opportunities

Michael Kim

@michaelkim1

Senior software engineer building scalable AI/ML and LLM backends with low-latency, cloud-native MLOps.

United States

What I'm looking for

I’m looking to build production-grade AI/ML and LLM backend systems (real-time inference, LLM observability, and multi-tenant MLOps) that deliver low latency and high reliability while reducing operational burden for engineering and SRE teams.

I’m a Senior Software Engineer with 12+ years of experience building and scaling AI/ML systems at Datadog, Salesforce, and Amazon. I specialize in end-to-end ML platform engineering, from real-time inference and LLM observability to MLOps pipelines and multi-tenant AIOps.

I’m known for translating cutting-edge AI capabilities into reliable, low-latency production infrastructure that directly reduces operational burden for engineering and SRE teams. Across enterprise-scale, cloud-native environments, I’ve delivered systems that process trillions of data points daily.

At Datadog, I led backend architecture for the AIOps and LLM Observability platforms that power real-time anomaly detection, AI-driven root cause analysis, and generative AI monitoring across thousands of customers. I architected the AIOps backend for Watchdog, which processes trillions of data points daily with sub-second latency. I engineered event-driven pipelines for Bits AI that reduced customer MTTR by 20% through LLM-driven remediation. I also built telemetry to monitor token usage and cost for OpenAI and Anthropic models.

Before that, at Salesforce, I built the core MLOps and multi-tenant infrastructure behind Einstein AI, supporting the training of over 900,000 customer-specific ML models per hour. I also engineered scalable workflows for the ML Lake on AWS S3 and Apache Iceberg. I designed a distributed scheduler on AWS Lambda to orchestrate hundreds of thousands of parallel training jobs. I also optimized real-time prediction serving infrastructure to deliver low-latency predictions inside Salesforce applications.

Experience

Senior Software Engineer

Datadog

Sep 2020 - Present (5 years 7 months)

Led backend architecture and development of Datadog’s AIOps and LLM Observability platforms, enabling sub-second unsupervised anomaly detection and generative AI monitoring across thousands of enterprise customers. Built event-driven LLM remediation pipelines that reduced customer MTTR by 20% and delivered zero-downtime A/B testing for new detection algorithms.

Senior Member of Technical Staff

Salesforce

Apr 2015 - Aug 2020 (5 years 4 months)

Built core MLOps and multi-tenant infrastructure for Salesforce Einstein AI, supporting the training of 900,000+ customer-specific ML models per hour at enterprise scale. Developed secure LLM grounding in CRM data and scalable ML workflows using AWS S3, Apache Iceberg, and a Lambda-based distributed scheduler.

Software Development Engineer

Amazon Web Services

Sep 2012 - Jan 2015 (2 years 4 months)

Developed backend components for Amazon Fraud Detector, delivering fraud risk scoring in milliseconds for high-volume enterprise e-commerce and financial platforms. Built scalable microservices and ingestion pipelines with AWS Kinesis and S3, optimizing low-latency inference APIs and end-to-end ML lifecycle workflows.

Education

University of Virginia

Bachelor of Science, Computer Science

Earned a Bachelor of Science in Computer Science from the University of Virginia in 2012.
