
Senior AI Platform Engineer – HexCore & Eval Systems

Nir-Yu offers tailored nearshore staffing solutions that enable SMEs to hire skilled professionals from Latin America affordably and efficiently.

NirYu

Employee count: 201-500

United States only

The Role:

We are seeking a Senior AI Platform Engineer to own the core platform layer that powers every AI agent in production — from multi-tenant agent configuration and schema architecture, to data pipeline contracts, evaluation harnesses, and customer onboarding automation.

This role sits at the intersection of backend platform engineering, LangGraph-based orchestration, and AI evaluation systems. You won't just build features — you'll own the infrastructure that makes all features possible: the agent orchestration graph, the customer configuration schema, end-to-end conversation logging, automated eval pipelines, and the scripts that deploy new customers in under 30 minutes.

If you love owning systems that other engineers depend on, ship at high velocity across a wide surface area, and take pride in leaving codebases cleaner than you found them — we want to hear from you.

Responsibilities:

  • Core Platform & Schema Architecture

    • Own and evolve the core platform repository — the central Python package implementing our modular agent architecture across orchestration, tools, state, retrieval, configuration, and extensibility layers.

    • Design and maintain customer configuration schemas including versioning metadata, lineage tracking, and component provenance fields aligned with our IP strategy.

    • Implement backward-compatible schema extensions and ensure all active customer deployments upgrade without breaking changes.

    • Enforce schema validation at all node inputs/outputs to prevent data drift across multi-tenant environments.

  • Multi-Tenancy Architecture

    • Build and maintain cross-client isolation across customer configuration, persistent state, and RAG pipelines.

    • Implement multi-tenant tagging so conversation logs, eval datasets, and agent behaviors remain cleanly separated per customer.

    • Design config-driven deployment parameterization so new customers can be onboarded without code changes (a configuration-only deployment model).

    • Ensure all platform changes are backward compatible — no code forking per customer.

  • Data Pipelines & Conversation Logging

    • Own the end-to-end conversation logging system — unified schema, row format, conversation capture, and metadata persistence to PostgreSQL and S3.

    • Maintain and extend knowledge base ingestion pipelines: scraping, embedding, vector DB indexing, and retrieval validation for each customer deployment.

    • Define and freeze data contracts between capture specifications and implementation — so downstream analytics, fine-tuning, and eval all receive consistent, well-structured inputs.

    • Implement multi-tenant data tagging so every logged conversation is attributed to the correct customer, facility, and session.

  • Eval Systems & Quality Gates

    • Own the eval suite end-to-end: scenario design, ground-truth dataset curation, automated scoring (F1, precision, recall), and regression CI gates.

    • Build and maintain LLM simulation test flows — parameterized test scenarios that exercise the agent across reservations, pricing, sizing, escalation, and context retention.

    • Instrument distributed tracing at the LangGraph node level — capturing token usage, latency per node, and score drift across deployments.

    • Implement eval suite parameterization so the same harness works across all customers with minimal configuration.

    • Define and enforce production-ready gates — eval score thresholds that must be met before any agent goes live.

  • Onboarding Automation & Deployment

    • Build and maintain onboarding automation scripts that deploy a new customer in under 30 minutes: configuration templates, KB ingestion, eval suite setup, and run scripts.

    • Own deploy parameterization — all customer-specific values injected via config, never hardcoded.

    • Maintain platform sync across customer repositories — keeping shared platform code consistent without breaking customer-specific deployments.

    • Document and enforce the deployment SOP so any engineer can execute a new deployment without escalation.

  • Reliability & Observability

    • Ensure all platform APIs meet latency targets (P95 < 1.5s for voice path) through profiling, caching, and async optimization.

    • Maintain structured logging at every critical path node — conversation start/end, intent classification, retrieval hits, booking outcomes.

    • Implement CI/CD gates that run eval and schema validation automatically before any merge to the production branch.

    • Contribute to incident diagnosis by maintaining observable, well-logged systems with clear error paths.

Requirements:

Experience:

  • Must have: proficient or advanced use of agentic coding workflows in tools like Cursor AI or Claude Code.

  • 4+ years building and owning production-grade backend systems in Python.

  • Proven experience owning a core platform or shared infrastructure layer used by multiple teams or customers.

  • Hands-on track record with multi-tenant system design — schema isolation, config-driven parameterization, and deployment automation.

  • Experience building evaluation harnesses for LLM-based systems with quantitative metrics.

Tools / Technologies:

  • Python (advanced): async I/O, FastAPI, Pydantic, pytest, type hints, dataclasses.

  • LangGraph: state machines, conditional edges, node composition, shared state management across modular agent layers.

  • PostgreSQL + pgvector: relational schema design, state persistence, multi-tenant data isolation.

  • RAG pipelines: vector DB (Pinecone or equivalent), embedding pipelines, retrieval evaluation.

  • Eval & tracing frameworks: LLM simulation testing, distributed tracing, automated scoring pipelines.

  • GitHub Actions / CI/CD: automated eval gates, schema validation hooks, environment promotion.

  • AWS: EC2, S3, RDS, IAM — production deployment and infrastructure operations.

  • YAML / config-driven deployment: customer configuration templating, parameterized onboarding scripts.

Skills:

  • Strong systems thinking — ability to see how schema decisions in the core platform ripple downstream to eval, logging, onboarding, and customer deployments.

  • Comfort owning wide surface area — this role crosses platform, data, eval, and ops without a narrow specialization.

  • High individual shipping velocity — ability to close multiple GitHub issues per day with clean PRs and minimal back-and-forth.

  • Strong schema discipline — treats data contracts as first-class artifacts, not afterthoughts.

  • Ability to work autonomously with minimal supervision in a fast-moving startup environment.

  • Strong written communication for PR descriptions, Notion documentation, and deployment SOPs.

Preferred Qualifications:

  • Experience with IP-aware architecture decisions or contributing to software patent documentation.

  • Familiarity with voice agent systems (Twilio, PSTN, LiveKit) and latency-constrained deployments.

  • Experience with multi-model evaluation (comparing models from OpenAI, Anthropic, Mistral) using quantitative benchmarks.

  • Prior work in self-storage, property management, or regulated verticals where data privacy and auditability matter.

  • Experience contributing to a modular / clean architecture codebase across multiple bounded contexts.

  • Prior experience in fast-growing startups where you owned infrastructure other engineers depended on daily.

About the job

Job type: Full Time

Experience: 4 years minimum

Hiring timezones: United States +/- 0 hours

About NirYu

Nir-Yu is dedicated to elevating your business through effective nearshore staffing solutions. With a focus on providing high-quality talent from Latin America, we empower small and medium enterprises (SMEs) to tap into global skills at affordable rates. Our strategic approach enables companies to overcome the challenges of finding qualified professionals within budget constraints. As larger organizations leverage nearshoring to secure superior talent while optimizing costs, Nir-Yu democratizes access to these resources, allowing SMEs to thrive in a competitive landscape.

Our services cover a range of staffing needs, including international PEO, staff augmentation, talent acquisition, and tailored staffing solutions. No matter your requirements, our team is well-equipped to find candidates that fit seamlessly into your business model, ensuring compliance and maximizing operational efficiency. Our clients have experienced transformative growth and have praised our personalized approach in managing their staffing needs. With Nir-Yu, hiring becomes straightforward, allowing you to concentrate on what truly matters—growing your business and achieving your objectives.

Employee benefits

Flexible Schedule

Provides a flexible work schedule.

Remote Full Time Work

Offers full-time remote work opportunities.

Personal Development

Supports personal development of employees.

Autonomy of time and schedule

Employees have autonomy over their time and schedule.
