Fathom is hiring an AI Engineer - Model Performance to own the speed, cost, and reliability of their model inference stack and build the fine-tuning infrastructure that makes the rest of the AI team faster.
Requirements
- Deep experience with LLM serving frameworks (vLLM, SGLang, TensorRT-LLM, or similar) — not just deploying them, but tuning them: attention backends, scheduling strategies, CUDA graph warmup, prefix caching
- Hands-on quantization experience — you've gone beyond "apply FP8 and hope."
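As a rough illustration of the tuning surface the role covers, the sketch below shows a hedged vLLM launch that enables prefix caching and FP8 quantization and selects an attention backend via environment variable; the model name is a placeholder, and real flag values would depend on the hardware and workload.

```shell
# Hypothetical serving config, not a prescription — model name is a placeholder.
# Pick the attention backend explicitly rather than relying on autodetection.
export VLLM_ATTENTION_BACKEND=FLASHINFER

# Serve with prefix caching (reuses KV cache across shared prompt prefixes)
# and FP8 weight quantization; batch/scheduling knobs would be tuned per workload.
vllm serve meta-llama/Llama-3.1-8B-Instruct \
  --enable-prefix-caching \
  --quantization fp8 \
  --max-num-seqs 256
```

Flags like these are exactly the kind of levers the posting expects candidates to reason about, not just set: each interacts with latency, throughput, and output quality.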
Benefits
- Competitive compensation
- Dynamic and collaborative engineering team
- Supportive environment that encourages innovation and personal growth
- Opportunity for impact
- Startup experience
