This position is posted by Jobgether on behalf of a partner company. We are currently looking for a Machine Learning Systems Engineer in the European Union.
We are seeking a talented Machine Learning Systems Engineer to join a remote-first, globally distributed team working on cutting-edge AI infrastructure. In this role, you will contribute to the development of large-scale language model systems, focusing on high-performance training, inference, and self-improving AI agents. You will work at the intersection of machine learning research, distributed systems, and high-performance computing, building tools and frameworks that enable researchers and organizations worldwide to deploy advanced AI solutions. This role offers the chance to work on technically demanding, open-source projects while collaborating with a passionate international team. Your work will have a direct impact on the future of scalable AI systems.
Accountabilities
- Contribute to the development and optimization of large-scale language model frameworks.
- Implement high-performance distributed training algorithms using frameworks such as Megatron-LM, DeepSpeed, and vLLM.
- Develop and optimize inference engines and tools for model deployment, fine-tuning, and AI agent self-improvement.
- Integrate diverse machine learning ecosystems including HuggingFace and other LLM tools.
- Optimize performance across multi-GPU, multi-node architectures, leveraging HPC and CUDA/ROCm programming.
- Collaborate with the open-source community to enhance the codebase, implement features, and resolve issues.
- Research and implement advanced techniques for self-improving AI agents and high-efficiency ML pipelines.
Requirements
- 3+ years of experience in machine learning engineering or research.
- Proficiency in Python and C/C++, with strong systems programming skills.
- Deep understanding of high-performance computing concepts, including MPI, BSP, and distributed multi-GPU training.
- Solid experience with transformer architectures, gradient descent, backpropagation, and deep learning training.
- Familiarity with distributed training strategies: data parallelism, model parallelism, pipeline parallelism.
- Experience with containerization (Docker, Kubernetes) and cluster orchestration.
- Demonstrated experience with ML frameworks like vLLM, Megatron-LM, HuggingFace, or similar.
- Commitment to open-source development and community collaboration.
- Excellent problem-solving, debugging, and performance optimization skills.
- Bonus: Advanced degrees (MS/PhD), experience with SLURM, mixed-precision training, MLOps, or prior contributions to major open-source ML projects.
Benefits
- Competitive compensation including salary and equity participation.
- Fully remote, work-from-anywhere flexibility.
- Comprehensive global benefits including mental health support.
- Open PTO policy and flexible working hours.
- Paid parental leave and support for personal well-being.
- Opportunities for continuous learning and professional development.
- Regular team offsites, virtual events, and global gatherings to foster team collaboration.
- Inclusive, transparent, and supportive culture prioritizing growth and knowledge-sharing.
Jobgether is a Talent Matching Platform that partners with companies worldwide to efficiently connect top talent with the right opportunities through AI-driven job matching.
When you apply, your profile goes through our AI-powered screening process designed to identify top talent efficiently and fairly.
- 🔍 Our AI thoroughly analyzes your CV and LinkedIn profile, evaluating your skills, experience, and achievements.
- 📊 It compares your profile against the job’s core requirements and past success factors to calculate a match score.
- 🎯 The 3 candidates with the highest match scores are automatically shortlisted.
- 🧠 When necessary, our human team performs an additional review to ensure no strong candidate is overlooked.
The process is transparent, skills-based, and unbiased, focusing solely on your fit for the role. Once the shortlist is completed, it is shared with the hiring company, which then determines next steps such as interviews or additional assessments.