Open to opportunities

Khaal User

@khaaluser

Senior AI systems engineer specializing in adversarial LLM evaluation, deterministic rubrics, and code audits.

United States

What I'm looking for

I’m looking for a team building frontier LLM systems where evaluation, reliability, and security are first-class concerns, and where deterministic rubrics, adversarial testing, and formal verification are used to ship trustworthy agents and code.

I’m a Senior AI Training & Evaluation Specialist and AI Systems Engineer with 7+ years of experience designing adversarial test cases, constructing deterministic evaluation rubrics, and performing surgical code audits for frontier LLM training pipelines.

I specialize in breaking AI reasoning pathways through systematic edge-case construction, repo-wide code evaluation, and multi-step agentic workflow validation, backed by rigorous rubric architecture and measurable reviewer agreement (94% peer-reviewer alignment on complex RLHF tasks).

Across multiple contracts, I’ve built deterministic evaluation environments (including Docker and Python-based verifiers), created RLHF/DPO/GRPO-oriented datasets, and validated proofs and specifications with formal-methods tools such as Lean and TLA+, with a strong focus on correctness, security, and reliability under real-world constraints.

I’m equally comfortable hardening systems: detecting prompt injection and data poisoning, auditing tool-calling and API trajectories, and enforcing strict typing and JSON schema compliance (Pydantic) so agents can’t take unsafe actions without the right checks. I thrive where engineering meets formal evaluation, and where quality is proven, not assumed.

Experience

Work history, roles, and key accomplishments

Current

AI Training & Evaluation Specialist

Datacurve AI Shipd

Dec 2025 - Present (7 months)

Architected deterministic evaluation environments and built high-fidelity RLHF datasets for frontier LLMs across software engineering, finance, mathematics, and Web3. Evaluated 200+ outputs monthly and maintained 94% peer-review agreement while designing adversarial “Quests,” strengthening code QA, and optimizing algorithmic complexity.

Evaluation Rubric Design, Python, Code Quality Assurance, Concurrency Bug Debugging

Lead Architect & Developer

Aegis-Market

Feb 2023 - Mar 2026 (3 years 1 month)

Architected an autonomous multi-agent arbitrage and negotiation engine orchestrated via FastAPI with strict deterministic validation gates. Built Pydantic-enforced JSON schema pipelines and a cost-controlled RAG foundation, and implemented Docker isolation to prevent unverified agents from making state changes without human approval.

FastAPI, Multi-Agent Systems, Pydantic

RLHF Engineer & Evaluator

Stealth Labs

Feb 2025 - Aug 2025 (6 months)

Built deterministic Docker-based evaluation environments and Python AST verifiers to assess frontier AI model reliability. Validated 200+ Lean-based mathematical proofs, generated edge-case training datasets, and improved evaluation fidelity by catching cases where a proof was syntactically valid but did not solve the stated engineering problem.

Docker, Python, Lean, Mathematical Proof Validation, Edge Case Dataset Engineering, Adversarial Prompt Defense

Senior Full Stack Software Engineer

Moovx

Nov 2023 - Jan 2025 (1 year 2 months)

Designed scalable web applications with React/Vue frontends and secure object-oriented backend APIs handling 100K+ daily requests. Led strangler-fig cloud migrations and implemented Terraform/Kubernetes CI/CD with blue-green rollouts for zero-downtime deployments, and built stream processing and data orchestration for 10M+ volatile payloads daily.

React, Vue.js, Terraform, Kubernetes, CI/CD Pipelines, Blue-Green Deployment, Stream Processing, PostgreSQL, Go

Data Infrastructure & Systems Analyst

Freelance Tech Solutions

Mar 2017 - Dec 2020 (3 years 9 months)

Built and maintained distributed systems and backend workflows for enterprise data integrations using Python, with end-to-end feature development from PostgreSQL schema design through UI and state management. Integrated PyTest/unit testing and Git into the delivery lifecycle, reducing post-release hotfixes by 67%, and created automated data quality verifiers to reduce silent production failures.

Python, Distributed Systems, PostgreSQL, Database Schema Design, Workflow Automation, Pytest, Unit Testing, Git, Data Quality

LLM Function-Calling Evaluator

Turing

Simulated and evaluated multi-turn tool trajectories for LLM function calling, focusing on API interactions and enterprise reliability. Audited agent behavior for system-message compliance, privacy constraints, and hallucination-free output during high-risk tool execution, while debugging JSON schema and tool-calling logic issues.

Tool Use Trace Evaluation, LLM Function Calling, Multi-Turn Agent Evaluation, System Prompt Adherence, Privacy and Compliance

Security Researcher

Leading Edge Technologies

Researched and built input sanitization defenses for LLM infrastructure against prompt injection, data poisoning, and logic extraction. Designed multi-stage adversarial filters achieving a 73% reduction in false positives while maintaining 99.2% attack detection.

LLM Security, Prompt Injection, Adversarial Testing, Threat Modeling, Detection Engineering

Education

Degrees, certifications, and relevant coursework

University of California, Berkeley

Bachelor of Science in Computer Science

Activities and societies: Competitive programming (200+ LeetCode problems; active Codeforces participant focused on graph algorithms, DP, and optimization). Active contributor to the MLOps Community (Slack) and Hugging Face architecture forums; maintains the Logic-Trace-Py open-source repository.

Coursework included advanced data structures and algorithms, distributed systems, network security, database systems, and cryptography.