AI Researcher — Inference Optimization

Featherless AI
CA, FR + 6 more

Role Overview

We are seeking an AI Researcher with deep experience in inference optimization to design, evaluate, and deploy high-performance inference systems for large-scale machine learning models. You will work at the intersection of model architecture, systems engineering, and hardware-aware optimization, improving latency, throughput, and cost efficiency across real-world production environments.

Key Responsibilities

  • Research and develop techniques to optimize inference performance for large neural networks.

  • Improve latency, throughput, memory efficiency, and cost per inference.

  • Design and evaluate model-level optimizations (quantization, pruning, KV-cache optimization, architecture-aware simplifications).

  • Implement systems-level optimizations (dynamic batching, kernel fusion, multi-GPU inference, separate tuning of the prefill and decode phases).

  • Benchmark inference workloads across hardware accelerators.

  • Collaborate with engineering teams to deploy optimized inference pipelines.

  • Translate research insights into production-ready improvements.

Required Qualifications

  • Strong background in machine learning, deep learning, or AI systems.

  • Hands-on experience optimizing inference for large-scale models.

  • Proficiency in Python and modern ML frameworks (e.g., PyTorch).

  • Experience with inference tooling (e.g., Triton, TensorRT, vLLM, ONNX Runtime).

  • Ability to design experiments and communicate results clearly.

Preferred / Nice-to-Have Qualifications

  • Experience deploying production inference systems at scale.

  • Familiarity with distributed and multi-GPU inference.

  • Experience contributing to open-source ML or inference frameworks.

  • Authorship or co-authorship of peer-reviewed research papers in machine learning, systems, or related fields.

  • Experience working close to hardware (CUDA, ROCm, profiling tools).

What Success Looks Like

  • Measurable gains in latency, throughput, and cost efficiency.

  • Optimized inference systems running reliably in production.

  • Research ideas successfully translated into deployable systems.

  • Clear benchmarks and documentation that inform product decisions.

Relevant Research Areas (Bonus)

  • Long-context inference optimization

  • Speculative decoding

  • KV-cache compression and paging

  • Efficient decoding strategies

  • Hardware-aware inference design

About the job

  • Job type: Full Time

  • Hiring timezones: United States +/- 0 hours, and 7 other timezones