Infrastructure Engineer (Compute)

What started as an ambitious initiative in the tech sector has blossomed into Fluidstack, a premier AI cloud platform dedicated to providing unparalleled compute power for leading AI laboratories across the globe.

Fluidstack

Employee count: 51-200

United States only

About Fluidstack

Fluidstack is the AI Cloud Platform. We build GPU supercomputers for top AI labs, governments, and enterprises. Our customers include Mistral, Poolside, Black Forest Labs, Meta, and more.

Our team is small, highly motivated, and focused on providing a world-class supercomputing experience. We put our customers first in everything we do, working hard not just to win the sale, but to win repeat business and customer referrals.

We hold ourselves and each other to high standards. We expect you to care deeply about the work you do, the products you build, and the experience our customers have in every interaction with us.

You must work hard, take ownership from inception to delivery, and approach every problem with an open mind and a positive attitude. We value effectiveness, competence, and a growth mindset.

About the Role

We are looking for an Infrastructure Engineer (Compute) to design, deploy, and manage the compute infrastructure powering Fluidstack's GPU clusters. You will be responsible for ensuring the performance, scalability, and reliability of our compute resources, working closely with hardware and software teams to support our AI workloads.

Focus

  • Design and implement GPU/ASIC infrastructure at the server, rack, and system level.

  • Troubleshoot complex failures in GPUs and compute systems.

  • Develop and maintain hardware/firmware management services.

  • Automate all aspects of the server lifecycle.

  • Own end-to-end compute lifecycle, including partnering with vendors on RMAs.

  • Serve as the main point of contact for hardware escalation and troubleshooting.

  • Monitor system performance, identifying and resolving bottlenecks.

  • Automate deployment and management tasks to improve efficiency.

  • Collaborate with storage and network teams to ensure cohesive infrastructure operations.

About You

  • 5+ years of experience in compute infrastructure engineering.

  • Strong knowledge of Linux systems administration and performance tuning.

  • Experience with bare-metal provisioning tools (MAAS, Metal3, Tinkerbell, or similar).

  • Familiarity with GPU hardware and workload optimization, especially kernel and driver level requirements.

  • Proficiency in automation tools (e.g., Ansible, Terraform).

  • Experience operating Kubernetes and Slurm clusters.

Benefits

  • Competitive total compensation package (salary + equity).

  • Retirement or pension plan, in line with local norms.

  • Health, dental, and vision insurance.

  • Generous PTO policy, in line with local norms.

  • Fluidstack is remote-first, but has offices in key hubs. For all other locations, we provide access to WeWork.

About the job

Job type: Full Time

Experience level: Mid-level

Hiring timezones: United States +/- 0 hours

About Fluidstack

Founded with a vision to democratize access to top-tier GPU resources, Fluidstack has quickly positioned itself as a trusted partner for companies requiring substantial computational resources for AI training and inference.

Fluidstack's offerings are centered around instant access to thousands of NVIDIA GPUs, including models such as the H100 and A100. Organizations can deploy large-scale clusters that can exceed 4,096 GPUs, made possible through Fluidstack's fully managed infrastructure using Slurm and Kubernetes. This deployment capability is complemented by over 1PB of shared storage and high-speed InfiniBand networking. Fluidstack promises 99% uptime and 15-minute response times, which suits companies that need robust support while they focus on their AI projects. Trusted by major players in AI, Fluidstack continues to expand its services, launching new GPU instances and enhancing its infrastructure to meet the needs of AI businesses worldwide.
