Open to opportunities

Wen Yuan

@wenyuan1

Senior AI and backend engineer building reliable LLM/RAG systems with rigorous evaluation and scalable data pipelines.

United States

What I'm looking for

I’m looking to build and evaluate LLM/RAG products end-to-end—pairing scalable backend/data engineering with rigorous hallucination and faithfulness testing. I want an environment that values reliability, observability, and continuous model-quality improvement.

I’m a Senior AI Engineer focused on turning large language models into intelligent, reliable products. I apply LLM/RAG techniques alongside AI evaluation and safety-minded engineering to deliver consistent, production-ready outcomes.

In recent roles, I’ve built and operationalized evaluation methods for hallucination detection, reasoning quality, edge-case handling, and system-design consistency. At Meta, I designed multi-step agentic prompt workflows for debugging, refactoring, repository reasoning, and code synthesis—an approach aimed at improving correctness and engineering reliability at scale.

Previously, at NetEase, I advanced an enterprise RAG and semantic analytics platform using FastAPI, Airflow, OpenSearch, LlamaIndex, and SQL. I improved retrieval precision@5 by 35%+, reduced manual refresh effort by 70%+, kept common top-k retrieval under 500ms, and raised grounded-answer pass rate from ~70% to 85%+ through dedicated faithfulness and source-grounding evaluations.

Earlier in my career, I built AI-enabled full-stack systems and strong data/automation foundations—delivering asynchronous ingestion and observability that cut debugging time by 50%+ in low-quality scenarios. I’m at my best when combining backend engineering, retrieval/data design, and meticulous evaluation to ship systems users can trust.

Experience

Work history, roles, and key accomplishments

Current

AI Software Evaluation Engineer

AI Evaluation & LLM Quality

Independent

Jan 2023 - Jan 2026 (3 years)

Performed LLM response quality and coding-agent reliability evaluation using prompt engineering, RLHF-style feedback, and rubric-based scoring. Identified failure modes such as hallucinated APIs, weak context usage, unsafe refactors, and flawed reasoning chains, and supported model alignment through structured evaluation datasets.

LLM Evaluation, Prompt Engineering, RLHF-Style Feedback, Hallucination Detection, AI Safety, Conversational AI

Senior AI Engineer

NetEase

Jan 2025 - Dec 2025 (11 months)

Built an enterprise RAG and semantic analytics platform with FastAPI, Airflow, Docling, OpenSearch, and LlamaIndex, processing 50K+ document chunks and records. Improved hybrid retrieval precision@5 by 35%+ versus keyword-only baselines and reduced manual refresh effort by 70%+ while keeping common top-k retrieval under 500ms.

FastAPI, Airflow, OpenSearch, LlamaIndex, Docling, RAG (BM25 + Vector Search), Embeddings, LLM Evaluation

AI Full-Stack Engineer

ITLabPro

Jan 2024 - Oct 2024 (9 months)

Delivered an AI-enabled full-stack platform using TypeScript, Node.js, Express.js, MongoDB, JWT, Stripe, and PayPal, supporting 1K+ user/project records. Integrated LLM-powered semantic search and assistance features, reducing manual content lookup and repetitive support workflows by 40%+, and improved AI task completion reliability to 95%+.

TypeScript, Node.js, MongoDB, REST APIs, JWT, Stripe, PayPal, OpenAI API

Data & Automation Engineer

Creamistry

Feb 2019 - Jul 2022 (3 years 5 months)

Owned analytics automation for Creamistry China franchise operations by consolidating POS, delivery/order, inventory, and sales data into repeatable reporting workflows. Built Python/SQL ETL pipelines that reduced manual Excel-based reporting by 60–70% and created dashboards that reduced ad hoc reporting requests by 40%+.

Python, SQL, ETL, Data Validation, Tableau, Power BI, Forecasting, Retail Analytics, Dashboarding

Software Developer

Huawei

Jul 2015 - Dec 2018 (3 years 5 months)

Contributed to enterprise software systems by developing Java/Kotlin modules, backend integrations, REST API workflows, and authentication logic. Improved reliability and reduced repeated release-blocking issues by 25–30% by implementing request validation, service orchestration, edge-case handling, and cross-service debugging.

Java, Kotlin, Android, REST APIs, SQL, Authentication, CI/CD, Linux, Refactoring

Web Developer

ZBJ Network

Feb 2013 - May 2015 (2 years 3 months)

Built backend-focused web modules for a large online services marketplace, including listings, user profiles, order/workflow tracking, admin tooling, and SQL-backed reporting views. Resolved production issues and optimized backend routes and queries, improving marketplace reliability and speeding up response times for common workflows by 20%+.

PHP, Laravel, JavaScript, SQL, REST APIs, Backend Development, Query Optimization, Views

Education

Degrees, certifications, and relevant coursework

City University of Seattle

Master of Science, Computer Science

2023 - 2026

Pursued alongside client AI engineering work involving LLM evaluation, retrieval-augmented generation (RAG), semantic search, and reliability testing.