Description
Work model: USA - 100% remote
Akeyless Security delivers a cloud-native SaaS platform that integrates Vaultless Secrets Management with Certificate Lifecycle Management, Next Gen Privileged Access Management (Secure Remote Access), and Encryption Key Management to manage the lifecycle of all machine identities and secrets across all environments.
Trusted by Fortune 100 companies and industry leaders, Akeyless is redefining identity security for the modern enterprise, delivering the world’s first unified Secrets & Machine Identity platform designed to prevent the #1 cause of breaches: compromised identities and secrets. Akeyless is backed by the world’s leading cybersecurity investors and global financial institutions, including JVP, Team8, NGP Capital, and Deutsche Bank.
We are seeking an experienced and talented Site Reliability Engineer (SRE) to play a crucial role in the development of our highly robust, multi-cloud, and multi-region SaaS platform.
As an SRE at Akeyless, you will join a high-performing team, responsible for leading and maintaining a complex multi-cloud platform and promptly addressing any issues that arise. This role operates within a dynamic and agile environment, utilizing cutting-edge technologies.
Responsibilities:
- Running a complex hybrid cloud solution and troubleshooting problems as they arise, using automation whenever possible.
- Building monitoring tools and alerting capabilities.
- Building a multi-cloud infrastructure and platform components.
- Maintaining site security in collaboration with our Security Engineering team.
- Ensuring site performance and capabilities by participating in performance, load, and stress testing.
- Promoting SRE principles and operational readiness within Akeyless engineering, emphasizing cloud engineering best practices.
- Performing root cause analysis of problems and turning the findings into opportunities to improve performance, reliability, functionality, and security.
- Advocating for the end customer and delivering a customer experience that exceeds expectations.
Requirements:
- 4+ years of hands-on SRE/DevOps experience.
- Monitoring scalable production systems for rapidly growing global infrastructure.
- Architect and implement automation for cloud infrastructure.
- Integrating new tools into our systems, such as monitoring, configuration, and alerting tools.
- Experience in Cloud environments (AWS, GCP, Azure).
- Resolve NOC escalations and help prevent recurrence of incidents.
- Leading the NOC processes, procedures and automations.
- Diagnose and troubleshoot complicated technical cases.
- Develop, augment and maintain Ops documentation.
- Excellent scripting skills and experience (Shell, Python, Go).
- Ability to plan and execute independently while also being a strong team player.
- Experience with Kubernetes and Docker in production - MUST.
- Experience with Linux - MUST.
Advantages:
- Responsibility for high-performance SaaS platform operation - huge advantage.
- Root cause analysis skills and big-picture thinking.
- Ability to document technical information.
- Networking knowledge: protocols (e.g., TCP/IP, HTTP) and network operations.
Base salary: $130K-$160K
In addition: Company Stock Options + Benefits
The compensation package depends on experience.