Description
Ledgebrook is a tech-enabled E&S MGA on a mission to modernize Specialty insurance. The industry is burdened with legacy technology and inefficient processes, preventing innovation at scale. We are changing that. Our goal is to become the best-in-class full-stack insurer and reinsurer, leveraging AI and data-driven insights to revolutionize underwriting, pricing, and risk selection.
We believe in talent density: fewer, better people working together as one. We win as a team, and our success is shared through generous equity packages for all employees.
About the Role
We’re seeking a Site Reliability Engineer (SRE) to ensure the reliability, scalability, and performance of Ledgebrook’s cloud-native infrastructure and applications. You’ll be at the forefront of operational excellence, driving automation, observability, and stability across our platforms.
This role combines infrastructure engineering, systems automation, and software development practices to proactively build resilient systems and reduce downtime, directly influencing our ability to scale rapidly and reliably.
What You’ll Work On
- Reliability & Scalability – Architect and maintain highly available, scalable cloud infrastructure to ensure the seamless operation of Ledgebrook's production and internal environments.
- Observability & Monitoring – Implement and optimize monitoring, logging, tracing, and alerting systems to proactively detect and resolve issues before they impact business operations.
- Incident Response & Management – Lead and participate in incident response efforts, performing root cause analysis and implementing corrective actions and improvements to prevent recurrence.
- Infrastructure as Code (IaC) – Automate infrastructure deployments and management using IaC tools such as Terraform, ensuring consistency, repeatability, and security.
- Continuous Integration & Delivery (CI/CD) – Develop and maintain robust CI/CD pipelines, accelerating software delivery while maintaining system stability and security.
- Performance Optimization – Identify performance bottlenecks and work cross-functionally to optimize application and infrastructure performance.
- Security & Compliance – Ensure infrastructure adheres to best practices for security, compliance, and disaster recovery planning, maintaining up-to-date documentation and procedures.
Why Join Ledgebrook?
- High-Impact Role – You’ll directly influence Ledgebrook's technology operations, helping us rapidly scale and drive industry innovation.
- Generous Equity – We play as a team, and our success is shared.
- Cutting-Edge Environment – Work closely with AI-driven solutions and modern infrastructure technologies.
- Fully Remote & Flexible – Work from anywhere in North America or Poland.
At Ledgebrook, we Care, Strive, and Thrive Together—and we’re building the future of insurance!
About You
Here at Ledgebrook we’re passionate about creating a team that thrives on continuous learning and shares our excitement about building a company from the ground up. We’re looking for people to join us with:
- A passion for delivering world-class customer service to our internal and external customers
- Intellectual curiosity and a strong desire for innovation, rather than following the status quo
- A hunger for continuous learning and opportunities to grow
- Agile prioritization skills coupled with a keen sense of urgency: we balance getting it right with getting it done right now
- A strong drive and desire to win together as a high-performing team
- A moral compass to "do the right thing, period." We have zero tolerance for toxic behaviors
- An eagerness to actively participate and connect with the whole team, across remote locations
- An honest, transparent communication style
- A proactive, solution-oriented mindset. We don’t look for blame, we look for the solution
Requirements
Tech Stack
- Infrastructure & Cloud: AWS (ECS, EKS, Lambda, S3, RDS, CloudWatch, IAM)
- IaC & Automation: Terraform, CloudFormation
- CI/CD & Containerization: GitHub Actions, Docker, Amazon Elastic Container Service (ECS)
- Observability & Monitoring: Datadog, Sentry
- Languages & Scripting: Python, Bash, JavaScript/TypeScript
Must Haves
- 5+ years of experience in Site Reliability Engineering, DevOps, or infrastructure-focused engineering roles
- Extensive hands-on experience with AWS infrastructure and services
- Strong knowledge of containers and container orchestration (Docker, Kubernetes, ECS)
- Experience building and maintaining CI/CD pipelines
- Proficiency in infrastructure automation (Terraform or CloudFormation)
- Solid scripting/coding experience (Python, Bash, JavaScript/TypeScript)
Nice to Haves
- Experience in insurance, fintech, or regulated industries
- Familiarity with Socotra or other insurance-specific platforms
- Background in managing production environments
- Experience working in high-growth startups or fast-paced technology environments
Benefits
- Competitive salary and meaningful equity ownership
- Ownership, autonomy, purpose
- Remote work, flexible hours
- Unlimited time-off policy