This is a remote position.
Location: Currently remote; may transition to onsite in the future.

About the Role
Traditional QA doesn’t work for AI. We’re looking for an AI QA Engineer who understands that AI systems are probabilistic, non-deterministic, and failure-prone by nature, and knows how to test them anyway. You will own AI quality, safety, reliability, and regression testing across agentic systems and AI-powered SaaS products.

What You’ll Be Doing
AI-Specific Testing & Validation
- Design and execute AI-specific test strategies, including:
  - Prompt robustness testing
  - Hallucination detection
  - Output consistency checks
  - Edge-case and adversarial testing
- Validate RAG pipelines for:
  - Retrieval accuracy
  - Context relevance
  - Response grounding
- Test multi-agent workflows and tool integrations
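To illustrate the kind of work these bullets describe, here is a minimal sketch of an output consistency check. The `call_model` function is a hypothetical placeholder for whatever LLM API is under test; the sampling count and similarity threshold are illustrative assumptions, not a prescribed methodology.

```python
from difflib import SequenceMatcher


def call_model(prompt: str) -> str:
    """Placeholder for a real LLM call; swap in your provider's API."""
    return "Paris is the capital of France."


def consistency_score(prompt: str, n: int = 5) -> float:
    """Sample the model n times and return the mean pairwise similarity
    of its outputs (1.0 = perfectly consistent)."""
    outputs = [call_model(prompt) for _ in range(n)]
    pairs = [(a, b) for i, a in enumerate(outputs) for b in outputs[i + 1:]]
    if not pairs:
        return 1.0
    return sum(SequenceMatcher(None, a, b).ratio() for a, b in pairs) / len(pairs)


def test_capital_prompt_is_consistent():
    # Flag prompts whose answers drift between samples.
    assert consistency_score("What is the capital of France?") >= 0.8
```

In practice the string-similarity metric would be replaced with something semantically aware (embeddings, an LLM judge), but the shape of the test, sample repeatedly and assert a statistical property rather than an exact output, is the core of testing non-deterministic systems.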
Automation & Tooling
- Build automated AI test harnesses
- Create evaluation pipelines for AI responses
- Define regression tests for prompt and agent changes
- Integrate AI testing into CI/CD pipelines
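A regression suite for prompt or agent changes can start as a versioned golden set scored on every CI run. The sketch below is a hypothetical illustration: `call_model` stubs the system under test, and the cases and pass-rate threshold are assumptions for the example.

```python
# Hypothetical golden set; in practice loaded from a versioned file.
GOLDEN_CASES = [
    {"prompt": "2 + 2 = ?", "must_contain": "4"},
    {"prompt": "Capital of Japan?", "must_contain": "Tokyo"},
]


def call_model(prompt: str) -> str:
    """Placeholder for the real LLM call under test."""
    answers = {
        "2 + 2 = ?": "The answer is 4.",
        "Capital of Japan?": "Tokyo is the capital of Japan.",
    }
    return answers.get(prompt, "")


def run_regression(cases=GOLDEN_CASES) -> float:
    """Return the pass rate over the golden set; wire into CI and fail
    the build when it drops below a threshold."""
    passed = sum(1 for c in cases if c["must_contain"] in call_model(c["prompt"]))
    return passed / len(cases)


if __name__ == "__main__":
    rate = run_regression()
    print(f"pass rate: {rate:.0%}")
    assert rate >= 0.9, "regression: pass rate below threshold"
```

Because a single failing sample proves little with a non-deterministic model, real pipelines typically gate on an aggregate pass rate over repeated runs rather than on individual cases.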
Collaboration & Quality Ownership
- Work closely with AI developers during design, not just after implementation
- Help define acceptance criteria for AI features
- Monitor AI behavior in production and flag drift or degradation
- Document AI failure modes and mitigations
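Flagging drift or degradation in production often reduces to comparing a recent window of quality scores against a baseline. A minimal sketch, assuming per-response quality scores already exist (the scores and tolerance here are hypothetical):

```python
from statistics import mean


def flag_drift(baseline: list[float], recent: list[float], tol: float = 0.05) -> bool:
    """Flag degradation when the recent window's mean quality score
    drops more than `tol` below the baseline mean."""
    return mean(baseline) - mean(recent) > tol


# Example: a 0.9 baseline dropping to 0.8 exceeds the 0.05 tolerance.
flag_drift([0.9] * 10, [0.8] * 10)  # True
```

Production monitoring tools add windowing, alerting, and statistical tests on top, but the underlying comparison of recent behavior against an agreed baseline is the same.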
Requirements
Must Have
- 4+ years of QA or testing experience
- Strong API testing and automation skills
- Understanding of AI/LLM behavior and limitations
- Experience testing non-deterministic systems
- Strong analytical and problem-solving mindset
Nice to Have
- Experience testing AI agents or RAG systems
- Experience writing Python-based test automation
- Familiarity with LLM APIs and prompt engineering
- Experience with observability tools for AI systems
