What You'll Do
- Collaborate with a multidisciplinary team to optimize machine learning models for production use cases, ensuring they are highly efficient and scalable
- Design and build efficient serving infrastructure for machine learning models that supports large-scale deployments across different regions
- Optimize machine learning models in PyTorch or other libraries for real-time serving and production applications
- Lead the effort to transition machine learning models from research and development into production, working closely with researchers and machine learning engineers
- Build and maintain scalable Kubernetes clusters to manage and deploy machine learning models, ensuring reliability and performance
- Implement and monitor logging metrics, diagnose infrastructure issues, and contribute to an on-call schedule to maintain production stability
- Influence the technical design, architecture, and infrastructure decisions to support new and diverse machine learning architectures
- Collaborate with stakeholders to drive forward initiatives related to the serving and optimization of machine learning models at scale
Who You Are
- You have a passion for speech, audio, and/or generative machine learning
- You have world-class expertise in optimizing machine learning models for production use cases, and extensive experience with machine learning frameworks like PyTorch
- You are experienced in building efficient, scalable infrastructure to serve machine learning models, and managing Kubernetes clusters in multi-region setups
- You have a strong understanding of how to bring machine learning models from research to production and are comfortable working with innovative, cutting-edge architectures
- You are familiar with writing logging metrics and diagnosing production issues, and are willing to take part in an on-call schedule to maintain uptime and performance
- You have a collaborative mindset and enjoy working closely with research scientists, machine learning engineers, and backend engineers to improve and innovate on model deployment pipelines
- You thrive in environments that require solving complex infrastructure challenges, including scaling and performance optimization
- Experience with low-level machine learning libraries (e.g., Triton, CUDA) and performance optimization for custom components is a bonus
Where You'll Be
- We offer you the flexibility to work where you work best! For this role, you can be located anywhere within the European region, as long as we have a work location there.
- This team operates within the GMT/CET time zones for collaboration.
- This role excludes France due to on-call restrictions.