This role is critical to ensuring the reliability, scalability, and efficiency of large-scale, self-managed Kubernetes environments across customer data centers. You’ll be working directly with customer platform teams to operate and improve their Kubernetes estate, enhance observability and automation, and extend platform capabilities via Portainer and complementary tools.
Responsibilities
- Operate and manage self-hosted Kubernetes clusters at scale (5,000+ nodes per region) across multiple sites.
- Serve as a subject-matter expert on Kubernetes internals, delivering proactive support, performance tuning, and architectural recommendations.
- Enable and extend platform tooling using Portainer, integrating it with identity, observability, and lifecycle management systems.
- Design and automate Day-2 operational workflows including node lifecycle, network overlays, and storage provisioning.
- Lead technical engagements such as architecture reviews, operational readiness assessments, and incident postmortems.
- Build and maintain IaC pipelines and GitOps patterns using tools like Terraform, Argo CD, and Flux.
- Troubleshoot and resolve advanced infrastructure issues related to scheduling, networking, DNS, ingress, and runtime isolation.
- Contribute to reusable internal tooling, engineering standards, and automation frameworks.
- Collaborate with customer stakeholders and internal technical teams across time zones as part of a 24/7 high-availability model.
Benefits
- Highly competitive salary
- Ability to work from anywhere in the world