Description
We’re looking for a QA Manual & Automation Engineer to ensure the quality, correctness, and reliability of our LLM-based conversational product, from UI flows and data connectors to reasoning and response validation. You’ll design test strategies that catch regressions in conversation behavior, verify data-grounded answers against datasets, and build automation to scale coverage.
This role blends manual QA, automation engineering, and AI conversational validation.
Responsibilities
- Own end-to-end QA for an LLM-driven conversational system: functional, regression, and exploratory testing.
- Build and maintain automated test suites in Python, with a strong focus on Playwright for UI and workflow automation.
- Design and execute AI conversational tests:
  - Create structured test prompts and multi-turn scenarios
  - Run queries over datasets and validate the correctness of responses against expected ground truth
  - Detect and document hallucinations, inconsistencies, and reasoning failures
- Develop test scenarios to validate data correctness and reasoning validity, including edge cases and adversarial prompts (within product scope).
- Write and validate SQL queries to confirm the correctness of underlying data and outputs.
- Validate integrations and storage across MongoDB, MySQL, and SQLite.
- Define and improve QA processes: test plans, defect triage, release sign-off criteria, and reporting dashboards.
- Collaborate closely with Product, Engineering, and ML teams to improve testability, reliability, and release confidence.
Requirements
- Strong Python programming skills for test automation and tooling.
- Hands-on experience with Playwright for automated testing.
- Experience testing AI conversational systems, including:
  - Designing prompt suites over datasets
  - Verifying the correctness and consistency of responses
  - Validating multi-turn behavior and context retention
- Strong manual QA ability for LLM-based conversational flows (UX + functional correctness).
- Ability to write and validate SQL queries for verification and debugging.
- Experience with databases: MongoDB, MySQL, SQLite.
- Strong test scenario design skills, especially for data correctness and reasoning validation.
