Lead Systems HPC Engineer

Nebius is a cutting-edge AI cloud platform that offers scalable infrastructure for developing and deploying AI solutions.

Nebius

Employee count: 201-500

United States only

Why work at Nebius

Nebius is leading a new era in cloud computing to serve the global AI economy. We create the tools and resources our customers need to solve real-world challenges and transform industries, without massive infrastructure costs or the need to build large in-house AI/ML teams. Our employees work at the cutting edge of AI cloud infrastructure alongside some of the most experienced and innovative leaders and engineers in the field.

Where we work

Headquartered in Amsterdam and listed on Nasdaq, Nebius has a global footprint with R&D hubs across Europe, North America, and Israel. The team of over 1400 employees includes more than 400 highly skilled engineers with deep expertise across hardware and software engineering, as well as an in-house AI R&D team.

We are looking for a Lead Systems HPC Engineer to play a key role in building our hyperscaler platform, working across its core components while analyzing and optimizing the performance of large-scale GPU clusters at the intersection of hardware and software.

You will operate across the full stack—from hardware and system software to networking (InfiniBand/RoCE), virtualization (KVM/QEMU), and distributed communication layers (e.g., MPI, NCCL).

In this role you will:

Focus on understanding system behavior across multiple layers, identifying performance bottlenecks, and driving improvements that shape how our clusters are built, operated, tuned, and validated.
Investigate and troubleshoot performance issues in GPU clusters under real workloads (training and inference).
Evaluate and integrate new hardware, system configurations, and tuning approaches across the software stack.
Support complex performance-related escalations from internal teams and customers.
Work closely with infrastructure, software engineering, and hardware vendor teams (e.g., NVIDIA, Mellanox, Intel).
Contribute to hardware and cluster qualification (acceptance), ensuring systems meet performance expectations.

We expect you to have:

5+ years of professional experience in system-level software development, with a focus on performance optimization and low-level programming.
3+ years of hands-on experience with Linux systems (administration, troubleshooting, and performance tuning).
In-depth understanding of server architecture, including PCIe devices, NICs, the Linux OS and kernel, and high-performance computing (HPC) systems.
Strong proficiency in one or more performance-oriented programming languages (C/C++, Go, Python).

We conduct coding interviews as part of the process.

Key employee benefits:

Health insurance: 100% company-paid medical, dental and vision coverage for employees and families.
401(k) plan: Up to 4% company match with immediate vesting.
Parental leave: 20 weeks paid for primary caregivers, 12 weeks for secondary caregivers.
Remote work reimbursement: Up to $85/month for mobile and internet.
Disability & life insurance: Company-paid short-term, long-term and life insurance coverage.

Compensation

We offer competitive salaries ranging from $170k to $300k OTE, plus equity, based on your experience.

What we offer

Competitive salary and comprehensive benefits package.
Opportunities for professional growth within Nebius.
Flexible working arrangements.
A dynamic and collaborative work environment that values initiative and innovation.

We’re growing and expanding our products every day. If you’re up to the challenge and as excited about AI and ML as we are, join us!

About the job

Apply before

Jun 22, 2026

Posted on

Apr 23, 2026

Hiring timezones

United States +/- 0 hours

Job categories

System Engineers

Skills

Dynamic, KVM, Linux, QEMU, Transform, Remote

About Nebius

At Nebius, we offer an advanced AI cloud platform designed for those who wish to develop, tune, and deploy their AI models with the most efficient infrastructure available. Our platform utilizes cutting-edge NVIDIA GPU clusters, including the H100 and H200, optimized for maximum performance with InfiniBand. One of the standout features of Nebius is our comprehensive fine-tuning ecosystem that includes on-demand GPUs and tools necessary for robust dataset processing, ensuring that AI teams can efficiently manage their computational resources according to demand.

We recognize the importance of AI inference in deploying real-world applications. Hence, we provide a resilient and cost-effective infrastructure that has been optimized for rapid deployment of Generative AI applications. Our services span the entire lifecycle of AI solutions, from model training to inference, making Nebius not just a GPU cloud but a full-stack AI platform. Additionally, we pride ourselves on supporting our clients with 24/7 expert guidance, offering resources to help architects and engineers harness our AI-optimized data centers to build scalable solutions.

About the job

Job type

Full Time

Experience level

Senior

Location requirements

United States

Nebius

Company size

201-500 employees

Markets

AI Cloud Computing, Artificial Intelligence, Machine Learning Infrastructure, GPU Cloud Services, Generative AI, High Performance Computing, Cloud Infrastructure, Deep Learning Platforms, AI Model Training, Data Center Services

Employees live in

Netherlands
