We are a leading trading platform that is ambitiously expanding to the four corners of the globe. Our top-rated products have won prestigious industry awards for their cutting-edge technology and seamless client experience. We hold ourselves to the highest standards, so we are always looking for exceptional people to join our talented, ever-growing team.
We're looking for a Senior DevOps/SRE Engineer to join our DevOps team and take end-to-end ownership of our cloud and on-premises environments. You will be a key contributor to building scalable, reliable, and secure systems that power our trading platform at a global scale.
This is a hands-on role: you'll architect and operate cloud infrastructure, drive automation and observability excellence, build robust CI/CD pipelines, and help shape the engineering culture around reliability and operational best practices.
Responsibilities:
- Design, deploy, and maintain scalable cloud infrastructure on AWS, ensuring high availability, performance, and security across all environments.
- Own and evolve Kubernetes cluster management, including bare-metal deployments, and ensure containerised workloads run reliably using Docker and Helm.
- Build and maintain CI/CD pipelines using GitLab CI, incorporating GitOps principles with FluxCD or ArgoCD to streamline and automate delivery workflows.
- Define and manage Infrastructure as Code using Terraform, ensuring all infrastructure changes are version-controlled, repeatable, and reviewed.
- Lead monitoring and observability initiatives: implement and maintain dashboards, alerting, and log pipelines using VictoriaMetrics/Prometheus, Grafana, and the ELK stack.
- Operate and optimise Apache Kafka ecosystems, including Strimzi, Kafka Connect, and MirrorMaker, to support real-time data pipelines.
- Drive incident response, root cause analysis, and post-mortem culture to continuously improve system reliability.
- Collaborate closely with Engineering, Security, and Product teams to embed DevOps best practices across the organisation.
- Mentor and guide junior engineers, raising the overall engineering bar for infrastructure reliability and automation.
Requirements:
- 6+ years of hands-on experience in a DevOps or SRE role.
- Strong knowledge of AWS services, including: VPC, EC2, EKS, S3, ECR, EBS, RDS, ElastiCache, IAM, KMS, Secrets Manager, SSM Parameter Store, CloudWatch, MSK, SNS, SQS, Route 53, Direct Connect, Transit Gateway, and ELB/ALB/NLB.
- Solid Linux administration skills with a deep understanding of system internals.
- Deep expertise in Kubernetes, including bare-metal cluster deployment and day-2 operations. Proficiency with Docker and Helm.
- Hands-on experience with Terraform as a primary Infrastructure as Code tool, including writing, reviewing, and maintaining production-grade modules.
- Proven experience with GitLab CI for building and maintaining CI/CD pipelines; familiarity with GitOps practices using FluxCD or ArgoCD.
- Strong background in monitoring and observability: VictoriaMetrics or Prometheus, Grafana, and the ELK stack; solid understanding of log collection and processing with Fluent Bit, Fluentd, and Logstash.
- Experience operating and managing Apache Kafka ecosystems, including Strimzi, Kafka Connect, and MirrorMaker.
- Experience with Ansible for configuration management; experience with AWX is a plus.
- Proficiency in scripting and automation with Bash, Python, and Go.
- Strong communication skills and the ability to collaborate cross-functionally in a fast-paced, regulated environment.
- English language proficiency.
