Evan Thomas
@evanthomas1
Senior AI/ML engineer specializing in production LLM systems, observability, and reliability.
What I'm looking for
I am a Senior AI/ML Engineer with 11+ years of experience, including 6+ years building and operating production ML and LLM systems. I have owned end-to-end architecture, launch, scaling, and on-call operations, and have iterated on these systems over multiple years based on real-world usage.
I specialize in LLM observability, evaluation, agent reliability, and inference infrastructure. My hands-on experience includes designing tracing and evaluation pipelines, human-in-the-loop labeling workflows, and drift detection and alerting; autoscaling GPU-backed inference; handling incident response; and partnering with product and enterprise customers to translate quality issues into measurable system improvements.
Experience
Work history, roles, and key accomplishments
Senior AI Engineer
Arize AI
Jan 2024 - Jan 2026 (2 years)
Owned core components of a production LLM observability platform ingesting millions of traces/day; designed tracing, offline and online evaluation pipelines, and human-in-the-loop workflows that reduced undetected regressions and improved issue resolution. Served on-call for ingestion, cost, and evaluation incidents and partnered with customers to translate quality complaints into measurable syste
Senior AI Engineer
Galileo AI
May 2021 - Jan 2024 (2 years 8 months)
Built observability tooling for LLM-powered agents capturing multi-step traces and implemented failure taxonomies, adaptive sampling, and guardrails that produced double-digit improvements in task completion and reduced high-severity failures. Created dashboards and analysis workflows to surface recurring production failure patterns.
Senior Machine Learning Engineer
Baseten
Mar 2018 - May 2021 (3 years 2 months)
Designed and optimized GPU-backed real-time and batch inference pipelines to meet strict p95 latency targets, implemented autoscaling strategies and observability for latency and resource utilization, and participated in incident response for inference outages and performance regressions.
Built and maintained large-scale backend services and user-facing features in high-traffic production systems, participated in on-call rotations and postmortems, and contributed to long-term reliability and distributed systems practices used by millions of users.
Education
Degrees, certifications, and relevant coursework
Massachusetts Institute of Technology
Bachelor of Science, Computer Science
2010 - 2014
Completed a Bachelor of Science in Computer Science with coursework and projects focused on systems and algorithms.
