We are looking for a Site Reliability Engineer to join our Platform Engineering team, which manages critical infrastructure across AWS and GCP. The role involves building automation, managing Kubernetes clusters, contributing to self-service tooling, and ensuring the reliability of that infrastructure.
Requirements
- Solid hands-on experience with at least one major cloud provider (AWS or GCP), and familiarity with another.
- Demonstrated experience with Infrastructure as Code, particularly Terraform; familiarity with Crossplane is a plus.
- Proven experience managing Kubernetes clusters, including workload configuration, optimization, and troubleshooting.
- Understanding of GitOps practices and CI/CD pipelines, plus experience with automation tools such as Spacelift.
- Strong automation and scripting capabilities (e.g., Python, Bash, Go).
- Experience with monitoring and observability tools such as Prometheus and Grafana.
- Excellent problem-solving abilities, including expertise in root cause analysis.
- Clear written and verbal communication skills in English.
Benefits
- Competitive Compensation Package
- Workation: Work up to 60 days per year from a country other than your home country, with up to 20 working days per trip
- Learning & Development Budget
- Academy: Regular training sessions, access to Coursera and Babbel training courses
- Flexibility: Morning person or night owl? We believe in outcomes and motivated employees