Zefr is seeking a Principal Site Reliability Engineer to lead the technical vision and direction of reliability practices across the organization. The ideal candidate will have 10+ years of experience designing, managing, and deploying cloud infrastructure in a production environment, with a focus on observability, CI/CD, and DevSecOps. The role requires strong technical leadership skills, including mentoring engineers, driving cross-functional projects, and influencing architectural decisions at an organizational level.
Requirements
- 10+ years of experience designing, managing, deploying, and supporting cloud infrastructure in a production environment using major public cloud providers
- Experience in Advertising or AdTech
- Demonstrated technical leadership experience
- Knowledge of GitOps and CI/CD pipelines
- Advanced proficiency with IaC and configuration management tools
- Deep production experience architecting, managing, deploying, and supporting container-based workloads on Kubernetes clusters
- Proven track record of building and scaling reliability practices
- Extensive production experience with observability platforms and practices
- Strong knowledge of cloud networking, cloud security, and cost optimization strategies
- Exceptional written and verbal communication skills
Benefits
- Flexible PTO
- Medical, dental, and vision insurance with FSA options
- Company-paid life insurance
- Paid parental leave
- 401(k) with company match
- Professional development opportunities
- 13 paid holidays
- Summer Fridays
- In-office, hybrid, and fully-remote work options available
- In-office lunches and lots of free food
- Optional in-person and virtual events
