Dev.Pro is seeking a skilled Kubernetes Developer to join a fully remote international team focused on optimizing GPU infrastructure and developing custom operators for AI/HPC workloads. The role involves designing, building, and managing Kubernetes platforms, enhancing performance and reliability, and ensuring security and compliance for multi-tenant HPC environments.
Requirements
- 3+ years of hands-on Kubernetes experience in production
- Experience with HPC schedulers (Slurm, PBS, LSF, Volcano)
- Strong background in GPU resource management and distributed systems
- Experience with cloud/hybrid cloud architectures (AWS, GCP, Azure, on-prem GPU clusters)
- Knowledge of Kubernetes operators, CRDs, scheduling, networking, and storage
- Deep knowledge of HPC job scheduling and workload orchestration
- Expertise in IaC and GitOps tooling (Terraform, Helm, ArgoCD/Flux)
- Experience in storage management and optimization for large datasets
- Programming skills in Go, Python, Bash/Shell
- Familiarity with PyTorch, TensorFlow, distributed training, and model serving
- Skills in Linux administration, performance tuning, and advanced networking