Beyond is a technology consultancy helping organisations thrive in a rapidly changing world.
We build, modernise, scale, and operationalise technology, creating Cloud and AI solutions to unlock productivity and drive customer growth.
Role Overview
- We're looking for a highly experienced Senior MLOps Engineer to own the automation, scaling, and operational excellence of our machine learning systems. This role is the critical bridge between our data science/ML engineering teams and a high-availability production environment.
- You will evolve existing pipelines to be best-in-class and operationalise new models (such as NBA, ranking, and LLM-based solutions) with agility and efficiency. Your primary goal is to create a seamless, reliable, and highly observable environment on GCP that empowers our Data Scientists and ML Engineers to iterate and deploy models faster. We expect you to have created or significantly evolved MLOps frameworks in the past and to be able to quantify the improvements you delivered (e.g., in deployment frequency, model performance monitoring, or system reliability).
What You'll Do
- Take ownership of and evolve our end-to-end ML lifecycle, from data ingestion and feature engineering pipelines to model training, deployment, and real-time serving.
- Design, build, and manage robust, automated CI/CD/CT (Continuous Integration / Continuous Delivery / Continuous Training) pipelines specifically for ML models, integrating with existing CI/CD patterns.
- Leverage the GCP ecosystem, especially Vertex AI Pipelines, Vertex AI Endpoints, and Vertex AI Model Registry, to create a standardised and efficient path to production.
- Design and own a best-in-class observability framework for ML models in production. This includes implementing granular monitoring for model performance (accuracy, bias), data and concept drift, and operational health (latency, throughput, error rates).
- Collaborate closely with Data Scientists and ML Engineers to understand their needs, building the tools and abstractions that create a seamless environment and accelerate their workflow.
- Optimise ML serving infrastructure for low-latency, real-time personalisation requirements.
- Partner with data engineering to ensure robust integration with feature stores and data sources (like BigQuery and Oracle).
- Define and track key MLOps metrics to quantify and communicate improvements in system performance, model quality, and team velocity.
What We're Looking For
- 7+ years of deep, hands-on experience in a dedicated MLOps or DevOps role with a strong focus on machine learning systems.
- Proven experience building MLOps frameworks from the ground up or significantly evolving existing ones, with clear examples of the improvements you delivered.
- Expert-level knowledge of the GCP cloud stack, particularly Vertex AI (Pipelines, Endpoints, Training), BigQuery, Pub/Sub, and GKE.
- Deep expertise in building and managing observability stacks for real-time ML systems (e.g., using tools like Prometheus, Grafana, ELK stack, or specialised platforms).
- Proven experience operationalising LLM-based systems, including managing embedding generation pipelines, vector databases, and fine-tuning/deployment workflows.
- Strong practical experience with Infrastructure as Code (IaC) tools (e.g., Terraform, Ansible).
- Demonstrable expertise in building and managing complex CI/CD pipelines.
- Proficiency in Python and experience with scripting for automation, infrastructure management, and building tooling for ML teams.
- Strong understanding of containerisation (Docker, Kubernetes) and microservices architecture as it applies to ML model serving.
Nice to Have
- Relevant Google Cloud certifications (e.g., Professional Machine Learning Engineer, Professional Cloud DevOps Engineer).
- A BSc, MSc, or PhD in Computer Science, Engineering, or a related technical field.
- Hands-on experience with Datadog, especially for monitoring ML systems and cloud infrastructure.
- Familiarity with the specific deployment challenges of ranking, recommendation, or NBA models.
- Experience with other ML platforms or tools (e.g., Kubeflow, MLflow).
- Knowledge of networking and security principles within GCP.
Having been named among the Sunday Times Best 100 Companies, we believe culture plays a large role in what we offer as an organisation. We actively promote diversity in all its forms across our Studios, and we proudly, passionately, and proactively strive to create a culture of inclusivity and openness for all our employees.
Beyond is committed to welcoming everyone, regardless of gender identity, orientation, or expression. Our mission is to remove exclusivity and barriers and encourage new thinking and perceptions in a space of belonging. It is not about race, gender, or age; it is about people. And without our people being their most creative and innovative selves, we are nothing.
