We're building a dataset to evaluate AI coding agents by creating challenging tasks and evaluation criteria within realistic simulated environments. This part-time opportunity involves building virtual companies, assembling and calibrating tasks, and designing tests to evaluate AI agent performance.
Requirements
- Degree in Computer Science, Software Engineering, or a related field
- 5+ years of software development experience, primarily in Python
- Background in full-stack development with experience building React-based interfaces and robust back-end systems
- Experience writing tests (functional, integration) and familiarity with Docker containers and infrastructure tools
Benefits
- Flexible part-time work
- Opportunity to work on challenging projects with leading tech companies
- Variety of projects to choose from, differing in scope, complexity, and required expertise
