Machine Learning Engineer — Inference Optimization

About the Role

We’re looking for a Machine Learning Engineer to own and push the limits of model inference performance at scale. You’ll work at the intersection of research and production—turning cutting-edge models into fast, reliable, and cost-efficient systems that serve real users.

This role is ideal for someone who enjoys deep technical work, profiling systems down to the kernel/GPU level, and translating research ideas into production-grade performance gains.

What You’ll Do

Optimize inference latency, throughput, and cost for large-scale ML models in production
Profile GPU/CPU inference pipelines and identify bottlenecks (memory, kernels, batching, I/O)
Implement and tune techniques such as:
- Quantization (fp16, bf16, int8, fp8)
- KV-cache optimization & reuse
- Speculative decoding, batching, and streaming
- Model pruning or architectural simplifications for inference
Collaborate with research engineers to productionize new model architectures
Build and maintain inference-serving systems (e.g. Triton, custom runtimes, or bespoke stacks)
Benchmark performance across hardware (NVIDIA / AMD GPUs, CPUs) and cloud setups
Improve system reliability, observability, and cost efficiency under real workloads

What We’re Looking For

Strong experience in ML inference optimization or high-performance ML systems
Solid understanding of deep learning internals (attention, memory layout, compute graphs)
Hands-on experience with PyTorch (or similar) and model deployment
Familiarity with GPU performance tuning (CUDA, ROCm, Triton, or kernel-level optimizations)
Experience scaling inference for real users (not just research benchmarks)
Comfortable working in fast-moving startup environments with ownership and ambiguity

Nice to Have

Experience with LLM or long-context model inference
Knowledge of inference frameworks (TensorRT, ONNX Runtime, vLLM, Triton Inference Server)
Experience optimizing across different hardware vendors
Open-source contributions in ML systems or inference tooling
Background in distributed systems or low-latency services

Why Join Us

Real ownership over performance-critical systems
Direct impact on product reliability and unit economics
Close collaboration with research, infra, and product
Competitive compensation and meaningful equity at the Series A stage
A team that cares about engineering quality, not hype

About the job

Apply before: Mar 24, 2026
Posted on: Jan 23, 2026
Job type: Full Time
Experience level: Mid-level
Location requirements: Open to candidates from all countries.
Hiring timezones: Worldwide
Job categories: Machine Learning Engineer Mid Level AI Inference Engineer AI ML Engineer
Skills: PyTorch, Triton, CUDA, ROCm, TensorRT, ONNX Runtime, vLLM, Model Quantization, FP16, INT8, FP8, Speculative Decoding, Batch, BF16

Company: Featherless AI
