Open to opportunities

John Wangechi

@johnwangechi

I improve large language models through RLHF evaluation, prompt engineering, and AI safety review.

Kenya

What I'm looking for

I want to evaluate and improve LLMs end to end, running RLHF-style preference ranking, prompt tests, and safety reviews so that teams ship reliable, accurate, user-aligned AI.

I’m an AI Evaluation Specialist and Prompt Engineering professional with 5+ years of experience improving large language model performance through RLHF, model evaluation, data annotation, prompt optimization, content quality assessment, and AI safety review. I evaluate AI-generated outputs across conversational, educational, technical, and creative domains for factual accuracy, instruction adherence, and alignment with user intent. I take pride in translating complex evaluation criteria into clear, actionable feedback that raises model quality, reliability, and safety.

In my current role, I evaluate thousands of AI-generated responses against structured quality frameworks measuring accuracy, relevance, reasoning, safety, and instruction adherence. I strengthen evaluation consistency by applying RLHF-style preference ranking to outputs and flagging high-impact improvement opportunities, and I design prompt-testing scenarios to assess reasoning, creativity, factuality, and problem-solving performance across diverse use cases. I also identify hallucinations, logical inconsistencies, bias risks, and safety concerns, and I run red-team exercises to probe robustness against edge cases and adversarial prompts. I document the findings in detailed reports with recommendations that inform model refinement.

Previously, I supported training and data quality through systematic AI response review, quality assurance across large annotation datasets, and assessments of linguistic, cultural, and contextual appropriateness for global usability. I contributed structured feedback and recommendations that improved training-data reliability and model performance while collaborating with distributed remote teams to meet milestones without sacrificing quality. Across evaluation, annotation, and safety work, my professional ethos is consistent: verify the right signals, surface risks early, and help teams ship AI that users can trust.

Experience

Work history, roles, and key accomplishments

Current

AI Evaluation Specialist

Micro1

Jan 2026 - Present (5 months)

Evaluated thousands of AI-generated responses against quality frameworks covering accuracy, relevance, reasoning, safety, and instruction adherence. Improved evaluation consistency using RLHF-style preference ranking and delivered actionable reports from hallucination detection, bias risk reviews, and red-team testing.

Education

Degrees, certifications, and relevant coursework

University of Nairobi

Bachelor of Science, Computer Science

Bachelor of Science in Computer Science from the University of Nairobi, completed in 2024.
