We're looking for a Freelance Agent Evaluation Engineer to create challenging tasks and evaluation criteria for AI coding agents. The work involves building virtual companies, assembling tasks from intermediate states, and designing tasks set in isolated environments. The ideal candidate has software development experience, primarily in Python, along with a full-stack background and experience writing tests. English proficiency at B2 level is required.
Requirements
- Degree in Computer Science, Software Engineering, or related fields
- 5+ years in software development, primarily Python (FastAPI, pytest, async/await, subprocess, file operations)
- Background in full-stack development, with experience building React-based interfaces (JavaScript/TypeScript) and robust back-end systems
- Experience writing functional and integration tests (not just running them)
- Experience with Docker containers and familiarity with infrastructure tools (Postgres, Kafka, Redis)
- Understanding of CI/CD (GitHub Actions as a user: triggers, labels, reading results)
- English proficiency at B2 level
Benefits
- Up to $45 per hour equivalent
- Flexible work schedule
- Part-time, non-permanent project
