Open to opportunities

John Wangechi

@johnwangechi

I improve large language models through RLHF evaluation, prompt engineering, and AI safety review.

What I'm looking for

I want to evaluate and improve LLMs end to end, through RLHF-style preference ranking, prompt tests, and safety reviews, so that teams ship reliable, accurate, user-aligned AI.

I’m an AI Evaluation Specialist and Prompt Engineering professional with 5+ years of experience improving large language model performance through RLHF, model evaluation, data annotation, prompt optimization, content quality assessment, and AI safety review. I evaluate AI-generated outputs across conversational, educational, technical, and creative domains, checking for factual accuracy, instruction adherence, and alignment with user intent. I take pride in translating complex evaluation criteria into clear, actionable feedback that raises model quality, reliability, and safety.

In my current role, I evaluate thousands of AI-generated responses against structured quality frameworks measuring accuracy, relevance, reasoning, safety, and instruction adherence. I strengthen evaluation consistency by applying RLHF methodologies to rank outputs and identify high-impact improvement opportunities, and I design prompt-testing scenarios to assess reasoning, creativity, factuality, and problem-solving performance across diverse use cases. I also identify hallucinations, logical inconsistencies, bias risks, and safety concerns, and I run red-team testing exercises to probe robustness against edge cases and adversarial prompts. I document the findings in detailed reports with recommendations that inform model refinement.

Previously, I supported training and data quality through systematic AI response review, quality assurance across large annotation datasets, and assessments of linguistic, cultural, and contextual appropriateness for global usability. I contributed structured feedback and recommendations that improved training-data reliability and model performance while collaborating with distributed remote teams to meet milestones without sacrificing quality. Across evaluation, annotation, and safety work, my professional ethos is consistent: verify the right signals, surface risks early, and help teams ship AI that users can trust.

Experience

Work history, roles, and key accomplishments

Current

AI Evaluation Specialist

Micro1

Jan 2026 - Present (6 months)

Evaluated thousands of AI-generated responses against quality frameworks covering accuracy, relevance, reasoning, safety, and instruction adherence. Improved evaluation consistency using RLHF-style preference ranking and delivered actionable reports from hallucination detection, bias risk reviews, and red-team testing.

AI Evaluation, Preference Ranking, Prompt Engineering, Hallucination Detection, Content Quality Assurance, Red Teaming

AI Trainer

Handshake

Jan 2022 - Jan 2023 (1 year)

Improved AI response quality by reviewing outputs for clarity, accuracy, policy compliance, and user satisfaction. Strengthened training-data reliability through large-scale annotation QA, cultural/context appropriateness checks, and structured feedback for global usability.

Quality Assurance, Data Annotation QA, Policy Compliance, Bias and Cultural Appropriateness, Remote Collaboration, Structured Feedback

AI Data Annotation Specialist

Remotask

Jan 2019 - Jan 2022 (3 years)

Performed large-scale annotation and classification for text, image, audio, video, and LiDAR datasets to support machine learning initiatives. Conducted dataset validation and quality-control reviews and evaluated conversational AI outputs for grammar, coherence, relevance, and tone while producing structured labels and metadata.

Data Annotation, Data Validation, Quality Control, Audio Labeling, Conversational AI Evaluation, Data Labeling

Education

Degrees, certifications, and relevant coursework

University of Nairobi

Bachelor of Science, Computer Science

Completed in 2024.

Tech stack

Software and tools used professionally

Gmail

Google Workspace

Gemini

Prompts.ai
