Caseware is seeking an experienced Site Reliability Engineer to help build and scale our AI platform by keeping our AWS, Kubernetes, and GitOps-based systems reliable, observable, and automated.
Requirements
- Maintain reliable, high-performing AWS production systems
- Manage EKS clusters for configuration, scaling, and workload stability
- Set up and support Istio service mesh for traffic control and security
- Oversee GitOps workflows to ensure secure, consistent infrastructure changes
- Create automation tools and platform enhancements
- Design, implement, and manage monitoring, logging, and tracing for AI workloads, microservices, and data pipelines to ensure visibility, reliability, and rapid issue resolution
- Respond to incidents, analyze root causes, and recommend lasting solutions
- Work with developers and platform teams to enhance deployments and system operations
- Support Nx-based monorepos for scalable, effective developer workflows
Benefits
- Indefinite-term employment contract with all legally required benefits
- Prepaid medical plan
- Life insurance and funeral assistance
- Internet allowance
- Home office stipend
- Competitive compensation, above the market average
- 100% remote work environment and an excellent work-life balance
- Opportunity to work for a growing global SaaS leader
- A culture that promotes independence, innovation, trust, and accountability
- Open space to be creative, innovative, and strategize for the future
- Mentorship by a highly experienced professional
- Training budget to support your growth
- 5 personal time off days per year
- Sick leave top-up to 100% of salary, paid by the employer from day 3 to day 90
- Recognition award: additional paid time off in recognition of years of service
- Upgraded vacation entitlement starting at 5 years of service
