We’re growing fast! Spun out of the Stanford AI lab and chaired by Google-X founder Sebastian Thrun, Cresta launched in 2020. Since then, we’ve grown revenue and our team by 300%! We’ve assembled a world-class team of AI and ML experts, go-to-market leaders, and top-tier investors and advisors including Andreessen Horowitz, Greylock Partners, and former AT&T CEO John Donovan. Our valued customers include brands like Intuit, Porsche, Verizon, and Mutual of Omaha, and we have been recognized as a startup to watch by Business Insider, Forbes, and Gartner, to name a few. We have huge ambitions and are looking for stellar candidates who have an entrepreneurial mindset and are excited to use cutting-edge AI to solve real-world business problems.
As a member of the infrastructure team, you are responsible for designing, building, and advancing the core infrastructure that allows the engineering team to execute quickly, productively, and securely. You will join a collaborative but highly autonomous working environment in which each member has a defined role with clear expectations, as well as the freedom to pursue projects they find interesting.
Please note we are hiring in Toronto and Berlin.
What you'll do
- Developer Toolchain. Partner with engineers to build dev tools that empower developer workflows and deployment infrastructure.
- Ensure reliability of multi-cloud Kubernetes clusters and pipelines.
- Metrics, logging, analytics, and alerting for performance and security across all endpoints and applications.
- Infrastructure-as-code deployment tooling and supporting services on multiple cloud providers.
- Automate operations and engineering. Focus on automation so we can spend energy where it matters.
- Build machine learning infrastructure that enables AI teams to train, test, and deploy on large-scale datasets.
What we're looking for
- 5+ years of experience in DevOps, Site Reliability Engineering, Production Engineering, or an equivalent field.
- Deep proficiency with coding languages such as Golang or Python.
- Deep familiarity with container-related security best practices.
- Production experience working with Kubernetes, and a deep understanding of the Kubernetes ecosystem, including popular open-source tooling such as cert-manager or external-dns. Experience with GPU-enabled clusters is a bonus.
- Production experience with Kubernetes templating tools such as Helm or Kustomize.
- Production experience with infrastructure-as-code (IaC) tools such as Terraform or CloudFormation.
- Production experience working with AWS and services such as IAM, S3, EC2, and EKS.
- Production experience with database software such as PostgreSQL.
- Experience with GitOps tooling such as Flux or Argo.
- Experience with CI/CD and feature gating systems.
- Fluency in Linux operations and configurations.