Senior/Principal AI Performance Engineer

CIQ, Inc. is a high-performance computing company dedicated to open-source solutions and known for its flagship, Rocky Linux, a reliable alternative to CentOS.

Employee count: 51-200

CIQ OVERVIEW

CIQ builds the enterprise infrastructure that powers the world's most demanding workloads. From the operating system layer through AI infrastructure, high-performance computing, and cloud-native orchestration, CIQ delivers the speed, security, scalability, and sovereignty that major enterprises, government agencies, and research institutions depend on.

CIQ is the founding support and services partner of Rocky Linux and the developer of the RLC Pro family of Enterprise Linux distributions, Fuzzball workload orchestration, Warewulf Pro cluster provisioning, and Ascender Pro automation. Our customers include some of the largest and most technically sophisticated organizations in the world, working across HPC, AI/ML, defense, and regulated industries.

We are a company of builders, operators, and open source practitioners. If you want to do work that matters, at a company that is genuinely changing how enterprise infrastructure gets built and run, we want to talk.

POSITION SUMMARY

CIQ is seeking a highly experienced Senior or Principal AI Engineer to own and drive AI/ML innovation across our product portfolio. This role sits at the intersection of AI engineering and systems performance - the right candidate brings deep expertise in model inference optimization, training workflows, and production AI deployment, combined with a strong instinct for performance at the systems level.

In this role, you will be the AI engineering standard-bearer at CIQ. You will design and build turnkey AI workload examples - both internal reference pipelines and customer-facing solutions - ensuring that CIQ’s AI story is always compelling, practical, and demonstrably best-in-class. You will integrate deeply with Fuzzball, CIQ’s cloud-native computing platform, running AI workloads end-to-end through it and helping customers do the same.

KEY RESPONSIBILITIES

This role is leveled as Senior or Principal based on qualifications and demonstrated capabilities.

AI Inference Optimization

  • Design, implement, and tune inference pipelines for large language models and other AI workloads, targeting maximum throughput and minimum latency.
  • Apply state-of-the-art optimization techniques: quantization (INT4/INT8/FP8), model pruning, speculative decoding, continuous batching, and kernel fusion.
  • Optimize inference-serving stacks, including vLLM, TensorRT-LLM, ONNX Runtime, and similar frameworks, for production deployment on CIQ’s OS platform.
  • Profile and tune GPU/accelerator utilization across the full inference stack, from model weights and memory bandwidth to CUDA kernels and driver overhead.
  • Establish inference performance baselines and regression detection across CIQ’s AI-focused solutions.

AI Training Workflows

  • Design and optimize distributed training pipelines for large-scale models, including data, model, tensor, and pipeline parallelism strategies.
  • Tune training efficiency through mixed-precision training, gradient checkpointing, activation recomputation, and optimizer-level improvements.
  • Benchmark training throughput and scaling efficiency across multi-GPU and multi-node configurations on CIQ’s infrastructure.
  • Collaborate with infrastructure and performance teams to resolve training bottlenecks at the network (RDMA/InfiniBand), storage, and OS layers.
  • Stay current on frontier model architectures and training techniques, including MoE models, RLHF pipelines, and emerging post-training methods.

Turnkey AI Examples & Reference Workloads

  • Build and maintain a library of turnkey AI workload examples that run on CIQ’s platform, covering inference serving, fine-tuning, batch processing, RAG pipelines, and agentic workflows.
  • Develop both internal reference pipelines for CI/testing and customer-facing examples designed for immediate productivity on CIQ’s OS and Fuzzball.
  • Package workloads using containers to deliver portable, reproducible AI environments across HPC and cloud-native settings.
  • Create compelling, well-documented demos and reference architectures that communicate CIQ’s AI capabilities to technical and business audiences alike.
  • Partner with product and customer success teams to translate real-world AI use cases into reusable, production-quality examples.

AI Engineering & Tooling

  • Build and maintain AI-powered engineering tooling - leveraging LLM-based agents, automated analysis pipelines, and AI-assisted code generation to accelerate the broader engineering organization.
  • Champion an AI-first development culture: identify opportunities where AI tooling can reduce toil, surface insights faster, and improve software quality across CIQ’s products.
  • Evaluate and integrate emerging AI frameworks, libraries, and hardware as they become relevant to CIQ’s customers and product roadmap.
  • Contribute to open-source AI tooling and frameworks where relevant, reinforcing CIQ’s technical reputation in the community.

Fuzzball Integration

  • Develop deep expertise in CIQ’s Fuzzball platform, its architecture, scheduling model, and workload execution environment.
  • Integrate AI training, inference, and pipeline workloads into Fuzzball-based CI/CD and production pipelines.
  • Contribute to Fuzzball’s AI workload story: ensure the platform is a first-class environment for running AI workloads efficiently and at scale.
  • Help characterize and improve Fuzzball’s performance for AI-specific access patterns and resource demands.

Cross-Functional Collaboration

  • Develop broad familiarity with the full CIQ product portfolio, including Rocky Linux and RLC (and its variants), Fuzzball, Apptainer, and Warewulf, and understand how AI workloads interact with each layer.
  • Collaborate closely with the Performance Engineering team to ensure AI workloads benefit from and contribute to CIQ’s systems-level optimization work.
  • Partner with product and customer success teams to translate real-world AI pain points into engineering priorities and measurable outcomes.
  • Document and communicate findings clearly, from low-level profiling data to executive-level summaries.
  • Contribute to technical publications, conference presentations, and thought leadership that reinforces CIQ’s reputation as an AI-forward infrastructure company.

NEEDED TO SUCCEED

Successful candidates will have:

  • Deep, hands-on expertise in LLM inference optimization, including serving frameworks (vLLM, TensorRT-LLM, ONNX Runtime), quantization techniques, and GPU memory management.
  • Strong background in distributed AI training, including frameworks such as PyTorch FSDP, DeepSpeed, Megatron-LM, or JAX/XLA.
  • Proven experience building production AI pipelines and packaging AI environments for reproducible, portable deployment (containers, Apptainer/Singularity, or equivalent).
  • Fluency with GPU/accelerator profiling tools: NVIDIA Nsight, PyTorch Profiler, CUDA performance analysis, and related tooling.
  • Familiarity with HPC environments: job schedulers (Slurm, PBS), parallel filesystems, RDMA/InfiniBand, and MPI, and the intersection of HPC with modern AI workloads.
  • Experience integrating AI workloads into CI/CD pipelines and building automated testing and benchmarking frameworks.
  • Comfort using and building with LLM-based tools and agentic frameworks to accelerate engineering work.
  • Excellent analytical skills, with the ability to form hypotheses, design experiments, and draw actionable conclusions from complex profiling data.
  • Strong written and verbal communication skills; able to present findings to both deeply technical audiences and business stakeholders.
  • A collaborative, humble, and always-learning mindset, combined with the confidence to champion AI engineering as a first-class concern.

EDUCATION AND EXPERIENCE

  • PhD in Computer Science, Machine Learning, Computer Engineering, or a related field strongly preferred; equivalent industry experience considered.
  • 10+ years of industry experience in AI/ML engineering, systems software, or a closely related discipline.
  • Demonstrated track record of measurable, published, or production-deployed AI performance improvements at scale.
  • Experience working in or with open-source AI ecosystems (PyTorch, Triton, ONNX, Hugging Face, etc.) is a strong plus.
  • Background with cloud-native, containerized, and/or HPC computing environments preferred.

BENEFITS

  • Medical, dental, and vision insurance.
  • Flexible paid time off.
  • Employee stock options.
  • Remote work; no travel required for most positions.

About the job

Job type: Full Time

Education: Postgraduate degree (experience accepted in place of education)

Experience: 10 years minimum

Location requirements: Open to candidates from all countries.

Hiring timezones: Worldwide

About CIQ

CIQ, Inc. is at the forefront of reimagining software infrastructure with an innovative approach centered around Rocky Linux, which serves as the foundation for its product offerings. Founded on April 1, 2020, CIQ emerged in response to the discontinuation of CentOS, stepping up to fill a crucial void by funding, facilitating, and leading the creation of Rocky Linux. The company’s dedication to open-source values is reflected not only in its products but also in its commitment to supporting the community of users who rely on enterprise Linux solutions.

CIQ provides a comprehensive suite of tools and services tailored for high-performance computing (HPC), ensuring that enterprises have the infrastructure needed to meet modern demands. With solutions like CIQ Ascender for automation and CIQ Fuzzball for HPC orchestration, CIQ aims to enhance the security and operational capabilities of organizations. The company stands out by offering exceptional enterprise support, leveraging a team of experts with deep knowledge in Linux and HPC. CIQ is more than just a tech provider; it positions itself as an extension of your team, ensuring that companies can navigate the complexities of IT infrastructure efficiently. As the tech landscape evolves, CIQ remains committed to innovation, collaboration, and delivering high-performance solutions to its partners and clients.
