Khaal User
@khaaluser
Senior AI systems engineer specializing in adversarial LLM evaluation, deterministic rubrics, and code audits.
What I'm looking for
I’m a Senior AI Training & Evaluation Specialist and AI Systems Engineer with 7+ years of experience designing adversarial test cases, constructing deterministic evaluation rubrics, and performing surgical code audits for frontier LLM training pipelines.
I specialize in breaking AI reasoning pathways through systematic edge-case construction, repo-wide code evaluation, and multi-step agentic workflow validation, backed by rigorous rubric architecture and measurable agreement (94% peer-reviewer alignment on complex RLHF tasks).
Across multiple contracts, I’ve built deterministic evaluation environments (including Docker + Python-based verifiers), created RLHF/DPO/GRPO-oriented datasets, and validated proofs using formal methods like Lean and TLA+—with a strong focus on correctness, security, and reliability under real-world constraints.
I’m equally comfortable hardening systems: detecting prompt injection and data poisoning, auditing tool-calling and API trajectories, and enforcing strict typing and JSON schema compliance (Pydantic) so agents can’t take unsafe actions without the right checks. I thrive where engineering meets formal evaluation, and where quality is proven—not assumed.
Experience
Work history, roles, and key accomplishments
AI Training & Evaluation Specialist
Datacurve AI Shipd
Dec 2025 - Present (6 months)
Architected deterministic evaluation environments and built high-fidelity RLHF datasets for frontier LLMs across software engineering, finance, mathematics, and Web3. Evaluated 200+ outputs monthly and maintained 94% peer-review agreement while designing adversarial “Quests,” strengthening code QA, and optimizing algorithmic complexity.
Lead Architect & Developer
Aegis-Market
Feb 2023 - Mar 2026 (3 years 1 month)
Architected an autonomous multi-agent arbitrage and negotiation engine orchestrated via FastAPI with strict deterministic validation gates. Built Pydantic-enforced JSON schema pipelines and a cost-controlled RAG foundation, and implemented Docker isolation to prevent unverified agents from making autonomous state changes without human approval.
RLHF Engineer & Evaluator
Stealth Labs
Feb 2025 - Aug 2025 (6 months)
Built deterministic Docker-based evaluation environments and Python AST verifiers to assess frontier AI model reliability. Validated 200+ Lean-based mathematical proofs, generated edge-case training datasets, and improved evaluation fidelity by catching cases where a syntactically valid proof did not actually solve the stated engineering problem.
Senior Full Stack Software Engineer
Moovx
Nov 2023 - Jan 2025 (1 year 2 months)
Designed scalable web applications with React/Vue frontends and secure object-oriented backend APIs handling 100K+ daily requests. Led strangler-fig cloud migrations and implemented Terraform/Kubernetes CI/CD with blue-green rollouts for zero-downtime deployments, plus stream processing/data orchestration for 10M+ volatile payloads daily.
Data Infrastructure & Systems Analyst
Freelance Tech Solutions
Mar 2017 - Dec 2020 (3 years 9 months)
Built and maintained distributed systems and backend workflows for enterprise data integrations using Python, with end-to-end feature development from PostgreSQL schema design through UI and state management. Integrated PyTest/unit testing and Git into the delivery lifecycle, reducing post-release hotfixes by 67%, and created automated data quality verifiers to reduce silent production failures.
Security Researcher
Leading Edge Technologies
Researched and built input sanitization defenses for LLM infrastructure against prompt injection, data poisoning, and logic extraction. Designed multi-stage adversarial filters achieving a 73% reduction in false positives while maintaining 99.2% attack detection.
LLM Function-Calling Evaluator
Turing
Simulated and evaluated multi-turn tool trajectories for LLM function calling, focusing on API interactions and enterprise reliability. Audited agent behavior for system-message compliance, adherence to privacy constraints, and absence of hallucinations during high-risk tool execution, while debugging JSON schema and tool-calling logic issues.
Education
Degrees, certifications, and relevant coursework
University of California, Berkeley
Bachelor of Science in Computer Science
Activities and societies: Competitive programming (LeetCode 200+ problems; active Codeforces participant focused on graph algorithms, DP, and optimization). Active contributor to MLOps Community (Slack) and Hugging Face architecture forums; maintains the Logic-Trace-Py open-source repository.
Coursework included advanced data structures and algorithms, distributed systems, network security, database systems, and cryptography.
