The AI Inference Engineer plays a critical role in bridging the gap between high-performance model development and optimized deployment. This position focuses on optimizing Large Language Models (LLMs) for inference across diverse serving environments, with a strong emphasis on maximizing throughput, minimizing latency, and preserving model accuracy.
Requirements
- Programming Languages: Proficiency in Python, C++, Rust, or Go, applied to high-performance AI workloads.
- Inference Tools: Proven hands-on experience with inference frameworks such as vLLM, TensorRT, Llama.cpp, and Ollama for serving and optimization.
- Infrastructure Expertise: Strong familiarity with Docker, Kubernetes, and cloud platforms such as AWS, GCP, and Azure.
- Hardware Optimization Expertise: Deep understanding of AI accelerator hardware, including techniques for profiling and optimizing performance on NVIDIA GPUs and TPUs.
Benefits
- Free meals and snacks
- Flexible work hours
- Professional development opportunities
