Location: Dubai / Worldwide Remote
Role Overview
We are seeking a Platform Operations Engineer to build, operate, and secure the infrastructure powering our AI systems. This role combines DevOps, SRE, and platform engineering to support high-performance AI workloads.
Key Responsibilities
- Design and maintain scalable cloud infrastructure on AWS and GCP
- Manage containerized environments using Docker and orchestration tools
- Operate and optimize AI inference serving systems (e.g., vLLM)
- Implement SRE best practices for reliability, monitoring, and incident response
- Ensure strong security and compliance (SecOps) across systems
- Support internal engineering teams with platform tooling and automation
Requirements
- 5+ years of experience in DevOps, SRE, or Platform Engineering
- Strong experience with AWS, GCP, Docker, and cloud-native systems
- Background in Google-style SRE practices (e.g., SLOs, error budgets, and blameless postmortems)
- Experience supporting high-throughput systems or ML/AI infrastructure
- Strong understanding of security, observability, and system performance
