Location:
We are looking for a DevSecOps/Site Reliability Engineer (SRE) to join our team. As a DevSecOps/SRE, you will be responsible for the automation, deployment, maintenance, and monitoring of our web, mobile, and API applications. You will ensure the reliability, availability, and performance of our production systems while continuously improving the build/test/deploy process: identifying bottlenecks, upgrading components, solving issues, and making the process faster and more cost-effective.
Responsibilities:
- Develop and maintain automation tools for building, testing, and deploying software applications and services.
- Deploy all web, mobile, and API applications in production, plan their releases, ensure consistency, and follow up on testing.
- Work closely with developers, QA, and product teams to ensure timely and high-quality releases.
- Develop and maintain monitoring and alerting systems to ensure high availability and performance of applications and services.
- Monitor metrics and logs from all infrastructure and app components, writing integrations if necessary, and creating dashboards to observe the production systems.
- Create alert triggers and monitor performance for all components to identify bottlenecks and modify auto-scaling rules if necessary.
- Upgrade infrastructure resources and act on cloud vendor recommendations, such as rotating secrets and upgrading databases and machine clusters.
- Continuously evaluate the cost of cloud services and ensure we are not incurring unnecessary expenses.
- Troubleshoot and resolve issues related to infrastructure, deployment, and application performance.
- Work with third-party vendors to integrate with their services for observability, security, monitoring, and error reporting.
- Oversee the implementation, operation, and monitoring of information security tools and processes in customer production environments.
- Conduct IT risk assessments, documenting identified threats and maintaining a risk register.
- Communicate information security risks to executive leadership.
- Report information security risks annually to company leadership and gain approval to bring risks to acceptable levels.
- Develop disaster recovery plans and participate in their execution during disaster recovery events.
Requirements
- Bachelor's degree in Computer Science or related field.
- At least 5 years of experience in a DevSecOps/SRE or related role.
- Strong experience in deploying web, mobile, and API applications in production.
- Strong experience in monitoring and observability tools, such as New Relic, Datadog, or Prometheus/Grafana.
- Strong experience with CI/CD pipelines and associated tools such as Azure Pipelines, Jenkins, or CircleCI.
- Strong experience with containerization technologies such as Docker, Kubernetes, and Helm.
- Experience with cloud infrastructure such as AWS, Azure, or GCP.
- Experience with scripting languages such as Bash.
- Experience with incident response and disaster recovery planning.
- Excellent communication and collaboration skills.
Details
