We are looking for a highly skilled Senior DevOps Engineer with experience building and maintaining complex cloud environments.
Candidates should have strong experience with Terraform, Kubernetes, CI/CD, Python, Java, Go, AWS, Azure, GCP, Docker, and Linux.
Responsibilities:
- Apply DevOps principles to provide operational expertise across all cloud infrastructure operations for internal and external customers.
- Troubleshoot and resolve complex problems across multiple layers of the stack, including CI/CD, container-based systems, networking, operating systems, cloud resources, and databases.
- Instrument monitoring and alerting infrastructure for critical services.
- Create, revise, and test operational runbooks and automation for maintaining infrastructure.
- Design and build tools that support our internal platforms and systems.
- Participate in our on-call rotation.
- Proactively pursue operational improvements that increase the stability, reliability, and availability of services.
Requirements:
- Embody a quality-first and security-first mindset in everything you do.
- 5+ years of experience with Terraform and Kubernetes/Docker on at least one major cloud provider (AWS, Azure, GCP, etc.).
- 5+ years of experience coding in languages such as Python, Java, or Go.
- 5+ years of experience with IaC and CI/CD tools such as FluxCD (or similar), Jenkins, Terraform, and GitHub.
- Strong experience with Linux.
- Strong working knowledge of networking (TCP/IP and application-layer protocols).
- Willingness to write technical documentation covering designs, workflows, processes, and best practices.
- Willing to mentor other team members and engineers.
- A strong bias for action over endless planning: you're hands-on, you've made mistakes and learned from them, and you can balance risk against customer impact.
- You value clear communication and you're empathetic and respectful of others.
- Operational experience with monitoring and alerting systems such as Sentry, Opsgenie, and Prometheus.
- Deep understanding of cloud performance, including how to diagnose and resolve bottlenecks and keep systems performing optimally.
