As a Site Reliability Engineer, you will play a critical role in ensuring the availability and performance of our customer-facing platform. You will work closely with DevOps, DBA, and Development teams to provision and maintain infrastructure, deploy and monitor our applications, and automate workflows.
Requirements
- Manage, monitor, and maintain highly available systems (Windows and Linux)
- Analyze metrics and trends to ensure rapid scalability
- Address routine service requests while identifying ways to automate and simplify
- Create infrastructure as code using Terraform, ARM templates, and CloudFormation
- Maintain data backups and disaster recovery plans
- Design and deploy CI/CD pipelines using GitHub Actions, Octopus Deploy, Ansible, Jenkins, and Azure DevOps
- Adhere to security best practices through all stages of the software development lifecycle
- Follow and champion ITIL best practices and standards
- Become a resource for emerging and existing cloud technologies with a focus on AWS
- 5+ years of experience in an SRE or System Administration role
- Demonstrated ability to build and support high-availability Windows/Linux servers, with emphasis on the WISA stack (Windows/IIS/SQL Server/ASP.NET)
- 3+ years of experience with CI/CD tools
- 3+ years of experience working with cloud technologies, including AWS and Azure
- 1+ years of experience working with container technologies, including Docker and Kubernetes
- Comfortable using Scrum, Kanban, or Lean methodologies
- Bachelor’s Degree or College Diploma in Computer Science, Information Systems, or equivalent experience
Benefits
- Bonus
