We are looking for: Principal AI Platform Engineer
Key Responsibilities
- Define and evolve the target architecture and roadmap for enterprise‑scale Data and AI platforms, covering experimentation, training, feature management, model registry, CI/CD, serving, and observability.
- Design and build multi‑tenant, multi‑region, highly available AI platforms with clear governance and guardrails.
- Partner with product management to define platform vision, backlogs, OKRs, and golden paths that enable self‑service from ideation to production.
- Lead capacity planning and cost optimization strategies for GPU and CPU workloads, driving performance and scalability for distributed training and inference.
- Integrate AI platforms with enterprise data ecosystems to enable governed, reproducible, and scalable ML pipelines.
- Act as a technical leader, translating complex platform concepts into clear value propositions for senior stakeholders across R&D, Commercial, and Operations.
Requirements
Mandatory
- Bachelor’s, Master’s, or PhD in Computer Science, Engineering, or a related quantitative field.
- Proven experience as a platform or infrastructure engineer supporting ML/AI at scale.
- Hands‑on experience with Domino Data Lab.
- Strong experience with AWS (or equivalent cloud providers), including compute, storage, networking, IAM, and cost management.
- Production experience administering EKS clusters, including GPU workloads, operators, storage classes, and service mesh.
- Strong Python development experience, especially for platform automation and tooling.
- Solid background in Infrastructure as Code (Terraform, CloudFormation, or similar).
- Experience with MLOps practices: model pipelines, lifecycle management, CI/CD, and monitoring.
- Experience with LLM serving, RAG architectures, vector databases, prompt safety, and token‑aware scaling.
- Experience designing and operating agentic systems, including multi‑agent orchestration, tool/action frameworks, safety guardrails, and evaluation of reliability and cost.
Nice to Have
- Experience with Apache Spark and large‑scale data processing platforms.
- Familiarity with GxP or regulated environments.
- Experience working closely with cybersecurity and data privacy teams.
- Exposure to AIOps practices and advanced observability tooling.
- Cloud or Kubernetes‑related certifications.
What We Offer
- Permanent contract with a competitive salary.
- Flexible working model with remote work options.
- Personalized career path and continuous learning (certifications, English training, etc.).
- Participation in stable, long‑term projects with high technical complexity.
- Flexible working hours and a strong focus on work‑life balance.
- Social benefits package tailored to your needs.
Follow us on Facebook, LinkedIn, Twitter, or Instagram: @NEORIS
