Why Blue Coding?
What are we looking for?
In this opportunity, we are looking for a Data Science/ML Engineer to work with one of our foreign clients, a leading provider of AI evaluation and optimization solutions that detect performance issues in large language models.
What’s unique about this job?
- In this role, you’ll help develop advanced reinforcement learning (RL) environments and scalable evaluation systems that guide and shape the behavior of cutting-edge AI models.
Here are some of the exciting day-to-day challenges you will face in this role:
- Design and implement RL environments that support large-scale agent evaluation and reinforcement learning experiments.
- Build task generation pipelines, dynamic datasets, and scripted environments with controlled complexity and stochasticity.
- Develop verifiers and reward models to score trajectories and evaluate model reasoning automatically.
- Collaborate with infrastructure and systems engineers to ensure environments are scalable, reproducible, and instrumented for detailed telemetry.
- Design APIs and orchestration frameworks for running, resetting, and evaluating agents across environments.
- Optimize environment performance, logging, and reward reproducibility across distributed setups.
You will shine if you have:
- Strong experience in Python software engineering.
- Minimum 3 years of experience in a Data Scientist, Machine Learning/Environment Engineering, or similar position.
- Ability to work from 6 AM - 2 PM Pacific time.
- Bachelor's degree in Computer Science or a related field.
- Practical knowledge of AI frameworks (LangChain, LangGraph, mcp-server).
- Extensive practical experience working with AI, including prompt engineering and vibe coding.
It doesn’t hurt if you also have:
- Knowledge of Codex or Claude Code.
- Experience integrating AI into existing systems.
- Understanding of reinforcement learning concepts: reward modeling, environment dynamics, verifiability, evaluation, and agent interaction loops.
- Familiarity with instrumentation, metrics, and data pipelines for RL evaluation.
- Ability to plan and manage your own work.
Here are some of the perks we offer you:
- Salary in USD
- 100% Remote
