Open to opportunities

Khaal User

@khaaluser

Senior AI systems engineer specializing in adversarial LLM evaluation, deterministic rubrics, and code audits.

United States

What I'm looking for

I’m looking for a team building frontier LLM systems where evaluation, reliability, and security are first-class—using deterministic rubrics, adversarial testing, and formal verification to ship trustworthy agents and code.

I’m a Senior AI Training & Evaluation Specialist and AI Systems Engineer with 7+ years of experience designing adversarial test cases, constructing deterministic evaluation rubrics, and performing surgical code audits for frontier LLM training pipelines.

I specialize in breaking AI reasoning pathways through systematic edge-case construction, repo-wide code evaluation, and multi-step agentic workflow validation, backed by rigorous rubric architecture and measurable agreement (94% peer reviewer alignment on complex RLHF tasks).

Across multiple contracts, I’ve built deterministic evaluation environments (including Docker + Python-based verifiers), created RLHF/DPO/GRPO-oriented datasets, and validated proofs using formal methods like Lean and TLA+—with a strong focus on correctness, security, and reliability under real-world constraints.

I’m equally comfortable hardening systems: detecting prompt injection and data poisoning, auditing tool-calling and API trajectories, and enforcing strict typing and JSON schema compliance (Pydantic) so agents can’t take unsafe actions without the right checks. I thrive where engineering meets formal evaluation, and where quality is proven—not assumed.

Experience

Work history, roles, and key accomplishments

Current

AI Training & Evaluation Specialist

Datacurve AI Shipd

Dec 2025 - Present (6 months)

Architected deterministic evaluation environments and built high-fidelity RLHF datasets for frontier LLMs across software engineering, finance, mathematics, and Web3. Evaluated 200+ outputs monthly and maintained 94% peer-review agreement while designing adversarial “Quests,” strengthening code QA, and optimizing algorithmic complexity.

Lead Architect & Developer

Aegis-Market

Feb 2023 - Mar 2026 (3 years 1 month)

Architected an autonomous multi-agent arbitrage and negotiation engine orchestrated via FastAPI with strict deterministic validation gates. Built Pydantic-enforced JSON schema pipelines and a cost-controlled RAG foundation, and implemented Docker isolation to prevent unverified agents from making autonomous state changes without human approval.

RLHF Engineer & Evaluator

Stealth Labs

Feb 2025 - Aug 2025 (6 months)

Built deterministic Docker-based evaluation environments and Python AST verifiers to assess frontier AI model reliability. Validated 200+ Lean-based mathematical proofs, generated edge-case training datasets, and improved evaluation fidelity by catching cases where a “syntactic proof” checked out but did not solve the stated engineering problem.

Senior Full Stack Software Engineer

Moovx

Nov 2023 - Jan 2025 (1 year 2 months)

Designed scalable web applications with React/Vue frontends and secure object-oriented backend APIs handling 100K+ daily requests. Led strangler-fig cloud migrations and implemented Terraform/Kubernetes CI/CD with blue-green rollouts for zero-downtime deployments, plus stream processing/data orchestration for 10M+ volatile payloads daily.

Data Infrastructure & Systems Analyst

Freelance Tech Solutions

Mar 2017 - Dec 2020 (3 years 9 months)

Built and maintained distributed systems and backend workflows for enterprise data integrations using Python, with end-to-end feature development from PostgreSQL schema design through UI and state management. Integrated PyTest/unit testing and Git into the delivery lifecycle, reducing post-release hotfixes by 67%, and created automated data quality verifiers to reduce silent production failures.

Education

Degrees, certifications, and relevant coursework

University of California, Berkeley

Bachelor of Science in Computer Science

Activities and societies: Competitive programming (LeetCode 200+ problems; active Codeforces participant focused on graph algorithms, DP, and optimization). Active contributor to MLOps Community (Slack) and Hugging Face architecture forums; maintains the Logic-Trace-Py open-source repository.

Earned a Bachelor of Science in Computer Science from the University of California, Berkeley. Coursework included advanced data structures & algorithms, distributed systems, network security, database systems, and cryptography.
