This is a remote position.
About the Role

Traditional QA doesn't work for AI. We're looking for a Senior AI QA Engineer who understands that AI systems are probabilistic, non-deterministic, and failure-prone by nature, and who knows how to test them anyway. You will own AI quality, safety, reliability, and regression testing across agentic systems and AI-powered SaaS products.

What You'll Be Doing
AI-Specific Testing & Validation
- Design and execute AI-specific test strategies, including:
- Prompt robustness testing
- Hallucination detection
- Output consistency checks
- Edge-case and adversarial testing
- Validate RAG pipelines:
- Retrieval accuracy
- Context relevance
- Response grounding
- Test multi-agent workflows and tool integrations
Automation & Tooling
- Build automated AI test harnesses
- Create evaluation pipelines for AI responses
- Define regression tests for prompt and agent changes
- Integrate AI testing into CI/CD pipelines
Collaboration & Quality Ownership
- Work closely with AI developers during design, not just after implementation
- Help define acceptance criteria for AI features
- Monitor AI behavior in production and flag drift or degradation
- Document AI failure modes and mitigations
Requirements
Must Have
- 4+ years of QA or testing experience
- Strong API testing and automation skills
- Understanding of AI/LLM behavior and limitations
- Experience testing non-deterministic systems
- Strong analytical and problem-solving mindset
Nice to Have
- Experience testing AI agents or RAG systems
- Experience writing Python-based test automation
- Familiarity with LLM APIs and prompt engineering
- Experience with observability tools for AI systems
