This is a remote position.
Reporting to the Manager, Quality Engineering & AI Validation, this role focuses on validating the quality of AI-generated outputs, agent behaviors, and AI-assisted workflows. It builds benchmark scenarios, defines scoring rubrics, evaluates business usefulness, and identifies failure patterns that conventional pass/fail software testing would not catch.
Key Responsibilities
AI Output Evaluation
- Design and execute structured evaluations for AI-enabled features and workflows.
- Assess outputs for groundedness, instruction adherence, consistency, usefulness, tone, control compliance, and risk.
- Identify hallucinations, unsupported assertions, missing logic, and unsafe recommendations (a scoring sketch follows this list).
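
For illustration only, a minimal sketch of how such a structured evaluation might be recorded. The dimension names, the 1-5 scale, and all field names are assumptions made for this sketch, not a prescribed format:

```python
from dataclasses import dataclass, field
from statistics import mean

# Rubric dimensions drawn from the responsibilities above; the names and
# the 1-5 scale are illustrative assumptions, not a company standard.
RUBRIC_DIMENSIONS = (
    "groundedness",
    "instruction_adherence",
    "consistency",
    "usefulness",
    "tone",
    "control_compliance",
)

@dataclass
class EvaluationRecord:
    """One scored AI output against a benchmark scenario."""
    scenario_id: str
    output: str
    scores: dict = field(default_factory=dict)        # dimension -> 1..5
    failure_tags: list = field(default_factory=list)  # e.g. "hallucination"

    def overall(self) -> float:
        """Average rubric score; a naive aggregate, for illustration."""
        return mean(self.scores[d] for d in RUBRIC_DIMENSIONS)

record = EvaluationRecord(
    scenario_id="invoice-summary-014",
    output="Total due is $4,210, payable net-30.",
    scores={d: 4 for d in RUBRIC_DIMENSIONS},
    failure_tags=[],
)
print(f"{record.scenario_id}: overall {record.overall():.1f}/5")
```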
Benchmark & Rubric Development
- Build and maintain golden datasets, benchmark prompts, comparison sets, and scorecards.
- Develop rubrics that allow quality to be measured consistently across releases and changes (illustrated in the sketch below).
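
As a hedged illustration, one way a golden-dataset case and its scorecard might be structured so scores remain comparable across releases; all field names and the rubric-version pinning are assumptions of this sketch:

```python
import json

# Illustrative golden-dataset entry; field names are assumptions for this sketch.
golden_case = {
    "case_id": "expense-policy-003",
    "prompt": "Summarize the travel expense policy for a new hire.",
    "reference_points": [        # facts a grounded answer must cover
        "per-diem cap",
        "receipt threshold",
        "approval chain",
    ],
    "rubric_version": "v2",      # pin the rubric so scores stay comparable
}

def scorecard(case: dict, scores: dict) -> dict:
    """Attach rubric scores to a golden case, keyed to its rubric version."""
    return {
        "case_id": case["case_id"],
        "rubric_version": case["rubric_version"],
        "scores": scores,
    }

print(json.dumps(scorecard(golden_case, {"groundedness": 5, "usefulness": 4}), indent=2))
```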
Workflow & Model Change Validation
- Compare performance across prompt versions, workflow revisions, tools, and models.
- Support release decisions with evidence of quality regression or improvement (see the comparison sketch below).
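
A minimal sketch of such a comparison, assuming per-case rubric scores for a baseline and a candidate prompt version on the same golden set; the regression threshold is an illustrative assumption, not a release policy:

```python
from statistics import mean

# Hypothetical per-case scores for two prompt versions on the same golden set.
baseline = {"case-001": 4.2, "case-002": 3.8, "case-003": 4.5}
candidate = {"case-001": 4.4, "case-002": 3.1, "case-003": 4.6}

REGRESSION_THRESHOLD = -0.5  # flag cases that drop by half a rubric point

deltas = {cid: candidate[cid] - baseline[cid] for cid in baseline}
regressions = {cid: d for cid, d in deltas.items() if d <= REGRESSION_THRESHOLD}

print(f"mean delta: {mean(deltas.values()):+.2f}")
print(f"regressed cases: {sorted(regressions)}")  # evidence for the release call
```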
Business & Domain Partnership
- Work closely with Finance SMEs, product managers, and engineers to define what "acceptable" looks like in real business contexts.
- Help define human-review thresholds and escalation patterns for higher-risk use cases.
Production Feedback
- Analyze reviewer feedback, override patterns, and live quality signals to improve evaluation coverage over time (a small analysis sketch follows).
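
As one illustrative example, a short sketch of analyzing override patterns; the log fields and reason codes are assumptions made for this sketch:

```python
from collections import Counter

# Illustrative reviewer-feedback log; fields and reason codes are assumptions.
reviews = [
    {"case": "c1", "overridden": True,  "reason": "hallucinated figure"},
    {"case": "c2", "overridden": False, "reason": None},
    {"case": "c3", "overridden": True,  "reason": "missed policy exception"},
]

override_rate = sum(r["overridden"] for r in reviews) / len(reviews)
reasons = Counter(r["reason"] for r in reviews if r["overridden"])

print(f"override rate: {override_rate:.0%}")
# The most frequent reasons point at gaps to fold back into the benchmark set.
print(reasons.most_common())
```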
