Mercor is seeking an AI Model Evaluator to assess LLM-generated responses for accuracy and consistency. The ideal candidate will have a background in computer science, hands-on software engineering experience, and expertise in multiple programming languages. The role involves evaluating complex technical reasoning and identifying subtle bugs and logical flaws.
Requirements
- BS, MS, or PhD in Computer Science or a closely related field
- 5+ years of real-world experience in software engineering or related technical roles
- Expertise in at least two relevant programming languages (e.g., Python, Java, C++, JavaScript, Go, Rust, Ruby, SQL, PowerShell, Bash, Swift, Kotlin, R, TypeScript, HTML/CSS)
- Ability to independently solve HackerRank or LeetCode Medium- and Hard-level problems
- Experience contributing to well-known open-source projects, including merged pull requests
- Significant experience using LLMs while coding, with a clear understanding of their strengths and failure modes
Benefits
- Competitive hourly rate of $60–$100
