Ian Juch
@ianjuch
Staff AI Systems Engineer building production LLM platforms that balance reliability, cost, and latency.
What I'm looking for
I’m a Staff AI Systems Engineer with 10+ years building production LLM platforms and distributed systems at Adobe, Cohere, and Netflix. I focus on systems that work under real constraints (latency, cost, correctness, and reliability) across retrieval, agent orchestration, and evaluation pipelines.
At Adobe, I build the backend orchestration that powers Firefly generative workflows, designing multi-stage execution pipelines with explicit state tracking and context grounding from user assets and project state. I also implement tool-execution layers for deterministic services, plus rollout and gating strategies that balance latency, cost, and content safety.
Earlier, at Cohere and Netflix, I built enterprise LLM and retrieval infrastructure, as well as high-scale experimentation and traffic control systems. I take ownership end-to-end, making systems observable, debuggable, and reliable through request-level tracing, targeted failure-mode evaluation, and staged rollouts with kill switches and fallback paths.
Experience
Work history, roles, and key accomplishments
Built backend orchestration systems powering Firefly generative workflows, including multi-stage execution with explicit state tracking across retrieval, inference, and post-processing. Implemented tool-augmented execution, rollout gating, evaluation pipelines for failure modes, and request-level observability for end-to-end debugging of multi-step behavior.
Built a developer-facing platform for generation and embedding APIs, enabling enterprises to integrate LLM capabilities into production systems. Designed retrieval and orchestration pipelines (retrieve → rank → prompt → generate → validate) using hybrid retrieval and evaluation tooling to debug retrieval and output correctness.
Built backend services within the Netflix Experimentation Platform to enable safe product evaluations across global traffic. Developed traffic routing, staged rollout mechanisms (progressive ramp-up, kill switches, fallbacks), dynamic experiment configuration APIs, and observability for detecting anomalies and regressions.
Built distributed services for Traffic Routing and Experimentation APIs, supporting high-throughput request routing across global regions. Implemented low-latency routing, production observability (metrics/logging/tracing), resilience mechanisms (retries, circuit breakers, graceful degradation), and on-call incident diagnostics for critical infrastructure.
Contributed backend code supporting Streaming Platform infrastructure with a focus on reliability and service coordination. Built internal tooling to improve deployment workflows and service monitoring, and improved logging/instrumentation to enhance debugging of distributed interactions.
Education
Degrees, certifications, and relevant coursework
University of California, Berkeley
Master of Science, Computer Science
2013 - 2015
University of California, Berkeley
Bachelor of Science, Computer Science
2010 - 2013
