Senior HPC Cluster Engineer

Nebius is a cutting-edge AI cloud platform that offers scalable infrastructure for developing and deploying AI solutions.

Nebius

Employee count: 201-500

Czechia only

Why work at Nebius

Nebius is leading a new era in cloud computing to serve the global AI economy. We create the tools and resources our customers need to solve real-world challenges and transform industries, without massive infrastructure costs or the need to build large in-house AI/ML teams. Our employees work at the cutting edge of AI cloud infrastructure alongside some of the most experienced and innovative leaders and engineers in the field.

Where we work

Headquartered in Amsterdam and listed on Nasdaq, Nebius has a global footprint with R&D hubs across Europe, North America, and Israel. The team of over 1400 employees includes more than 400 highly skilled engineers with deep expertise across hardware and software engineering, as well as an in-house AI R&D team.

The role

We’re looking for a Senior HPC Cluster Engineer to join our team and play a key role in the development of our cutting-edge hyperscaler platform. The GPU & InfiniBand team is responsible for enhancing and optimizing the core components of our Cloud platform, with a specific focus on GPU computing, InfiniBand networks, and the KVM/QEMU stack. You’ll work closely with hardware virtualization and device emulation technologies, ensuring high performance and security in multi-GPU, HPC environments. The role involves analyzing, troubleshooting, and improving infrastructure to support new hardware, fine-tuning system performance, and automating fault detection and resolution in a complex system.

In this position, you will be responsible for:

Tuning the performance of GPU clusters and InfiniBand networks to ensure optimal operation in HPC and GPU-based environments. 
Analyzing and troubleshooting the root cause of issues related to GPUs and InfiniBand networks, and proposing corrective actions. 
Integrating new hardware into the existing infrastructure, including support for new GPU hardware through software stacks like Kubernetes, QEMU, and KVM. 
Enhancing automation systems for proactive monitoring, detecting, and resolving issues in GPU and InfiniBand environments. 
Configuring and managing GPU devices and InfiniBand fabrics, ensuring efficient and reliable operation.

We expect you to have:

5+ years of professional experience in system-level software development, with a focus on performance optimization and low-level programming.
3+ years of hands-on experience with Linux systems (administration, troubleshooting, and performance tuning). 
In-depth understanding of server architecture, including PCIe devices, NICs, Linux OS/Kernel, and high-performance computing (HPC) systems. 
Strong proficiency in one or more performance-oriented programming languages (C/C++, Go, Python).

It would be a plus if you have:

Experience with GPU end-to-end testing in a cluster environment using InfiniBand networking. 
Proven track record of analyzing and optimizing the performance of HPC workloads (e.g., simulations, data analysis, AI/ML workloads). 
Familiarity with RDMA, RoCE, and InfiniBand protocols for high-performance communication. 
Background in Software-Defined Networking (SDN) and experience with HPC cluster networking. 
Understanding of QEMU/KVM virtualization and managing virtualized environments. 
Experience with deep learning frameworks such as PyTorch and TensorFlow, and their integration with HPC systems. 
Familiarity with collective communication libraries like MPI and NCCL for distributed computing.

We conduct coding interviews as part of the process.

What we offer

Competitive salary and comprehensive benefits package.
Opportunities for professional growth within Nebius.
Flexible working arrangements.
A dynamic and collaborative work environment that values initiative and innovation.

We’re growing and expanding our products every day. If you’re up to the challenge and are as excited about AI and ML as we are, join us!

About the job

Apply before

Jun 19, 2026

Posted on

Mar 21, 2026

Job type

Full Time

Experience level

Senior

Experience

5 years minimum

Location requirements

Czechia

Hiring timezones

Czechia +/- 0 hours

About Nebius

At Nebius, we offer an advanced AI cloud platform designed for those who wish to develop, tune, and deploy their AI models with the most efficient infrastructure available. Our platform utilizes cutting-edge NVIDIA GPU clusters, including the H100 and H200, optimized for maximum performance with InfiniBand. One of the standout features of Nebius is our comprehensive fine-tuning ecosystem that includes on-demand GPUs and tools necessary for robust dataset processing, ensuring that AI teams can efficiently manage their computational resources according to demand.

We recognize the importance of AI inference in deploying real-world applications. Hence, we provide a resilient and cost-effective infrastructure that has been optimized for rapid deployment of Generative AI applications. Our services span the entire lifecycle of AI solutions, from model training to inference, making Nebius not just a GPU cloud but a full-stack AI platform. Additionally, we pride ourselves on supporting our clients with 24/7 expert guidance, offering resources to help architects and engineers harness our AI-optimized data centers to build scalable solutions.

Nebius

Company size

201-500 employees

Markets

AI Cloud Computing, Artificial Intelligence, Machine Learning Infrastructure, GPU Cloud Services, Generative AI, High Performance Computing, Cloud Infrastructure, Deep Learning Platforms, AI Model Training, Data Center Services

Employees live in

Netherlands
