About the Role:
Responsibilities:
- Lead the team through all scrum ceremonies, promoting active participation and collaborating with project managers to assign resources.
- Drive the team to meet development goals and adhere to roadmap deadlines.
- Undertake hands-on development tasks regularly, with a focus on Kubernetes, multi-cloud infrastructure management, and automating everything as code.
- Collaborate with product management, research and update backlog items, gather requirements, and help drive backlog refinement sessions.
- Lead technical discussions and work with the team to design robust, secure, monitored, and well-maintained systems. Mentor team members on best design practices.
- Encourage and facilitate continuous learning within the team, ensuring they stay updated with the latest DevOps technologies and multi-cloud best practices.
- Help define team goals by establishing Objectives and Key Results (OKRs) and Key Performance Indicators (KPIs).
- Support a 24x7 on-call rotation for production incidents.
- Collaborate with senior engineers to enhance software scalability, maintainability, and security across multi-cloud environments.
- Engage with product management and other Cyderes teams, ensuring seamless communication and cooperation.
- Foster a positive team environment, guide members toward professional milestones, and partner with HR and engineering leadership on recruitment and performance reviews.
Requirements:
- Minimum of 8 years of experience in DevOps or a related discipline.
- Proficient in managing multi-cloud environments, specifically AWS, Azure, and GCP.
- Extensive experience with the Kubernetes ecosystem (GKE, EKS, RKE, k3s, and Rancher) and large-scale container orchestration.
- Deep expertise in Infrastructure as Code (IaC) using tools like Terraform, CloudFormation, or Azure Resource Manager.
- Extensive experience with Git and CI/CD tools (Jenkins, GitHub, GitHub Actions, Concourse CI, Spinnaker, etc.).
- Practical knowledge of virtualization hypervisors (VMware, Microsoft Hyper-V, XenServer).
- Strong understanding of best practices throughout the DevOps lifecycle, including code standards, reviews, build processes, testing, and operations.
- Experience with observability solutions (Grafana, Loki, Thanos, Prometheus, Elasticsearch, etc.).
- Excellent written, verbal, and interpersonal communication skills.
- Proficient in agile methodologies, particularly Scrum, and project management tools like Jira.
- Ability to collaborate and coordinate across multiple teams on cross-functional projects.
- Bachelor’s degree in a relevant field.
- Experience leading a DevOps or Site Reliability Engineering (SRE) team.
- Experience with configuration management tools like Ansible, Puppet, or Chef.
- Ability to articulate complex technical concepts to non-technical audiences to meet business goals.
- Experience with the Rancher Kubernetes management platform.
- Ability to communicate professionally with customers.