THE ROLE
We're looking for a Senior Platform Engineer / SRE who can lead complex infrastructure work, drive IaC
and GitOps architecture, and set the standard for how we automate and operate systems at scale. You'll
tackle hard problems — multi-tenant isolation, self-service infrastructure, reliability engineering — and
have the scope to solve them properly.
This is not a ticket-processing role. Seniors here identify problems before being asked to, make
architectural calls, mentor engineers, and raise the ceiling on what the platform can do.
WHAT YOU'LL WORK ON
- Lead IaC architecture — Terraform module design, state management, multi-account patterns, and
setting the standards the rest of the team builds against
- Drive GitOps at scale — ArgoCD configuration, progressive delivery patterns, promotion
workflows, and deployment reliability across multiple environments and tenants
- Architect and operate multi-tenant Kubernetes infrastructure on AWS EKS — tenant isolation,
workload placement, cluster topology, and long-term scalability strategy
- Build self-service infrastructure automation — provisioning pipelines, configuration management,
and platform capabilities that engineering teams can consume without manual intervention
- Lead the use of agentic coding tools for infrastructure work — scaffolding new environments,
generating and reviewing IaC, accelerating automation, and establishing patterns for the team
- Own reliability — SLO definitions, error budgets, incident response quality, and the feedback loop
that turns incidents into platform improvements
- Set observability standards — trace coverage, alert quality, on-call ergonomics, and runbook
culture
- Partner with security on zero-trust architecture, secrets management at scale, and infrastructure
hardening
- Contribute to the technical roadmap and help the team prioritize the right work
- Mentor mid-level engineers — code review, design feedback, on-call shadowing
WHAT WE'RE LOOKING FOR
- 6+ years in platform engineering, SRE, or infrastructure — with meaningful time operating
production systems at scale
- Deep IaC expertise — you design Terraform architectures, not just write modules; you've managed
complex state and multi-account configurations in production
- Strong GitOps background — you understand declarative infrastructure management at depth and
have opinions on how to do it well
- Deep Kubernetes knowledge — you've operated clusters in production, dealt with real failure
modes, and understand the system at the control-plane level
- Strong AWS background — networking, compute, IAM, storage, multi-account design
- Experience with multi-tenant infrastructure — isolation patterns, noisy neighbor mitigation, and
tenant lifecycle management
- Automation-first thinking at a senior level — you design systems that eliminate entire categories of
manual work, not just individual tasks
- Active user of agentic coding tools — you know how to direct them effectively, review their output
critically, and use them to multiply your output
- Reliability engineering track record — SLOs defined and measured, post-mortems run,
measurable improvements driven
- Strong communicator — you can make architectural decisions legible to engineers and leadership
alike
NICE TO HAVE
- Experience with Karpenter and node lifecycle management in production
- Background in FinOps — cost attribution, reserved capacity planning, workload right-sizing
- Familiarity with data infrastructure — object storage, CDC pipelines, or lakehouse patterns
- Experience supporting AI/ML inference workloads or GPU-based compute in production
- Prior experience scaling platform infrastructure at a startup moving toward enterprise-grade
requirements
WHAT YOU WON'T FIND HERE
A platform team that maintains the status quo. We're actively building — new scale requirements, new
architectural domains, new automation capabilities. Senior engineers here shape how the platform
evolves, and the tools available to do it are better than they've ever been.
