Data Center Operations Manager

What started as an ambitious initiative in the tech sector has blossomed into Fluidstack, a premier AI cloud platform dedicated to providing unparalleled compute power for leading AI laboratories across the globe.

Fluidstack

Employee count: 51-200

United Kingdom only

About Fluidstack

Fluidstack is the AI Cloud Platform. We build GPU supercomputers for top AI labs, governments, and enterprises. Our customers include Mistral, Poolside, Black Forest Labs, Meta, and more.

Our team is highly motivated and focused on providing a world-class supercomputing experience. We put our customers first in everything we do, working hard not just to win the sale, but to win repeat business and customer referrals.

We hold ourselves and each other to high standards. We expect you to care deeply about the work you do, the products you build, and the experience our customers have in every interaction with us.

You must work hard, take ownership from inception to delivery, and approach every problem with an open mind and a positive attitude. We value effectiveness, competence, and a growth mindset.

About the Role

We’re looking for a Data Center Operations Manager to manage the ongoing operational performance of GPU clusters that Fluidstack owns and operates. This is a “player-coach” role with both oversight and hands-on responsibilities, focused on ensuring the availability and performance of our data center infrastructure.

You’ll be the owner of everything that lives within our data centers, managing our Data Center Operations team as well as third parties, from installation through ongoing maintenance and coordinating upgrades. Your primary responsibility is to ensure the continuous and efficient operation of the data center by managing on-site technicians and third-party vendors, creating and maintaining operational procedures, diagnosing issues, and providing hands-on technical support when higher-level intervention is required. This role is ideal for individuals who excel in environments that demand both operational discipline and the ability to navigate complex, technical challenges.

Focus

  • Ensure high availability of our GPU infrastructure.

  • Manage the onsite team of data center technicians and third-party vendors in daily operations, including server maintenance, equipment installation, and troubleshooting.

  • Respond to and resolve technical issues and emergencies in a timely manner, ensuring minimal downtime and disruption.

  • Act as the interface between FDEs and the onsite team to ensure fast, effective technical remediation and incident resolution.

  • Undertake regular data center maintenance, performing inspections and audits of equipment to maintain optimal performance and reliability.

  • Proactively manage infrastructure by defining and continuously improving standard operating procedures (SOPs) for routine data center maintenance.

  • Manage third-party hardware vendors, including initiating and coordinating the RMA process.

  • Be available to travel to various locations in the US and Europe on short notice, potentially for extended periods, when on-site support requires elevated, hands-on expertise.

About You

  • 5+ years of experience in data center operations.

  • Proven ability to lead remote teams and manage vendors.

  • In-depth knowledge of data center infrastructure, including servers, networking equipment, and cooling systems.

  • Capable of training on-site data center technicians to perform routine physical maintenance.

  • Capable of remotely diagnosing hardware issues using common Linux and OOB utilities (dmesg, journalctl, dmidecode, lspci, mcelog, dcgmi, nvidia-smi, Redfish, IPMI, etc.).

  • Familiar with common inventory management systems (e.g. NetBox).

  • Strong communication and organizational skills.

  • Willing to travel internationally on short notice and to be based onsite for extended periods as required.
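
As a rough illustration of the remote hardware diagnosis mentioned in the list above, the sketch below parses per-GPU health data from nvidia-smi and flags devices that may need attention. It is a minimal example under stated assumptions, not a description of Fluidstack's tooling: the choice of query fields, the 85 °C threshold, and the names GpuStatus, parse_gpu_csv, needs_attention, and read_local_gpus are all hypothetical.

```python
import csv
import io
import shutil
import subprocess
from dataclasses import dataclass
from typing import List, Optional

# Fields requested from nvidia-smi; this selection is an illustrative assumption.
QUERY_FIELDS = "index,name,temperature.gpu,ecc.errors.uncorrected.volatile.total"
TEMP_LIMIT_C = 85  # hypothetical alert threshold, not a vendor or Fluidstack figure


@dataclass
class GpuStatus:
    index: int
    name: str
    temperature_c: Optional[int]
    uncorrected_ecc: Optional[int]


def _to_int(value: str) -> Optional[int]:
    """Return an int, or None for non-numeric values such as '[N/A]'."""
    try:
        return int(value)
    except ValueError:
        return None


def parse_gpu_csv(text: str) -> List[GpuStatus]:
    """Parse output produced with --format=csv,noheader,nounits."""
    rows = csv.reader(io.StringIO(text), skipinitialspace=True)
    return [
        GpuStatus(int(r[0]), r[1].strip(), _to_int(r[2].strip()), _to_int(r[3].strip()))
        for r in rows
        if len(r) == 4  # skip blank or malformed lines
    ]


def needs_attention(gpu: GpuStatus) -> bool:
    """Flag a GPU that is at or above the temperature limit or reports uncorrected ECC errors."""
    too_hot = gpu.temperature_c is not None and gpu.temperature_c >= TEMP_LIMIT_C
    ecc_errors = gpu.uncorrected_ecc is not None and gpu.uncorrected_ecc > 0
    return too_hot or ecc_errors


def read_local_gpus() -> List[GpuStatus]:
    """Query the local host; returns an empty list if nvidia-smi is not installed."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY_FIELDS}", "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_gpu_csv(result.stdout)
```

On a GPU host, `[g.index for g in read_local_gpus() if needs_attention(g)]` would list the device indices worth a closer look with dcgmi or the BMC logs. Keeping the parsing separate from the subprocess call means the same function can be applied to output collected over SSH from a remote node.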

Nice to haves

  • Strong troubleshooting skills and the ability to quickly diagnose and resolve technical issues.

  • Experience with data center management tools and software.

  • Strong time management, communication, and interpersonal skills, with the ability to manage a team.

Benefits

  • Competitive total compensation package (cash + equity).

  • Retirement or pension plan, in line with local norms.

  • Health, dental, and vision insurance.

  • Generous PTO policy, in line with local norms.

  • Fluidstack is remote-first, but has offices in key hubs. For all other locations, we provide access to WeWork.

About the job

Job type: Full Time

Experience level: Manager

Hiring timezones: United Kingdom +/- 0 hours

About Fluidstack

Founded with a vision to democratize access to top-tier GPU resources, Fluidstack has quickly positioned itself as a trusted partner for companies requiring substantial computational resources for AI training and inference.

Fluidstack's offerings are centered around instant access to thousands of NVIDIA GPUs, including cutting-edge models such as the H100 and A100. Organizations can deploy large-scale GPU clusters that can exceed 4,096 GPUs, made possible through their fully managed infrastructure utilizing Slurm and Kubernetes. This deployment capability is complemented by impressive storage solutions, featuring over 1PB of shared storage and high-speed InfiniBand for optimal data handling. With a commitment to customer satisfaction, Fluidstack promises a remarkable 99% uptime and industry-leading 15-minute response times, making it an ideal choice for companies needing robust support while focusing on their groundbreaking AI projects. Trusted by major players in the AI sphere, Fluidstack continues to expand its services, launching new GPU instances and enhancing its infrastructure to meet the demanding needs of AI businesses worldwide.
