Flock is seeking an experienced Site Reliability Engineer III to join its Aviation organization. The successful candidate will be responsible for designing and building systems, tooling, and processes to provide an extensible, scalable, and observable platform. They will empower development teams to own and manage their full application stack, thus minimizing bottlenecks and optimizing development velocity without compromising on reliability.
Requirements
- Experience in an SRE role with an understanding of monitoring, troubleshooting, and disaster recovery
- Extensive experience in writing production-quality code
- Proficiency with infrastructure as code and/or configuration management (we use Terraform)
- Experience managing monitoring dashboards and creating actionable alerts with tools such as Grafana and Prometheus
- Ability to keep systems running and meeting internal SLOs, as measured by SLIs
- Experience refining CI/CD processes so that new code reaches production reliably and efficiently
- Ability to collaborate on building a robust monitoring platform for our services and their underlying infrastructure, with alerts that fire on early symptoms rather than only on full outages
- Familiarity with best practices for creating and managing AWS resources (e.g., security groups, VPCs)
Benefits
- Flexible PTO
- Fully paid health benefits plan for employees
- Family Leave
- Fertility & Family Benefits
- Spring Health
- Caregiver Support
- Carta Tax Advisor
- WFH Stipend
- Productivity Stipend
- Home Office Stipend
