We're looking for a Freelance Agent Evaluation Engineer to create challenging tasks and evaluation criteria for AI coding agents. The work involves building virtual companies, assembling and calibrating tasks, and designing tests that accept correct solutions and reject incorrect ones.
Requirements
- Degree in Computer Science, Software Engineering, or a related field
- 5+ years of software development experience, primarily in Python
- Background in full-stack development
- Experience writing tests (functional, integration)
- Experience with Docker containers and familiarity with infrastructure tools
- Understanding of CI/CD (GitHub Actions)
- English proficiency at B2 level
Benefits
- Up to $45 per hour equivalent
- Flexible work schedule
- Opportunity to work on AI-related projects
