John Wangechi
@johnwangechi
I improve large language models through RLHF evaluation, prompt engineering, and AI safety review.
What I'm looking for
I’m an AI Evaluation Specialist and Prompt Engineering professional with 5+ years of experience improving large language model performance through RLHF, model evaluation, data annotation, prompt optimization, content quality assessment, and AI safety review. I evaluate AI-generated outputs across conversational, educational, technical, and creative domains—ensuring factual accuracy, instruction adherence, and alignment with user intent. I take pride in translating complex evaluation criteria into clear, actionable feedback that raises model quality, reliability, and safety.
In my current role, I evaluate thousands of AI-generated responses against structured quality frameworks measuring accuracy, relevance, reasoning, safety, and instruction adherence. I strengthen evaluation consistency by applying RLHF methodologies to rank outputs and identify high-impact improvement opportunities, and I design prompt-testing scenarios to assess reasoning, creativity, factuality, and problem-solving performance across diverse use cases. I also identify hallucinations, logical inconsistencies, bias risks, and safety concerns, and I run red-team testing exercises to probe robustness against edge cases and adversarial prompts—backed by detailed reports and recommendations that inform model refinement.
Previously, I supported training and data quality through systematic AI response review, quality assurance across large annotation datasets, and assessments of linguistic, cultural, and contextual appropriateness for global usability. I contributed structured feedback and recommendations that improved training-data reliability and model performance while collaborating with distributed remote teams to meet milestones without sacrificing quality. Across evaluation, annotation, and safety work, my professional ethos is consistent: verify the right signals, surface risks early, and help teams ship AI that users can trust.
Experience
Work history, roles, and key accomplishments
AI Evaluation Specialist
Micro1
Jan 2026 - Present (5 months)
Evaluated thousands of AI-generated responses against quality frameworks covering accuracy, relevance, reasoning, safety, and instruction adherence. Improved evaluation consistency using RLHF-style preference ranking and delivered actionable reports from hallucination detection, bias risk reviews, and red-team testing.
AI Trainer
Handshake
Jan 2022 - Jan 2023 (1 year)
Improved AI response quality by reviewing outputs for clarity, accuracy, policy compliance, and user satisfaction. Strengthened training-data reliability through large-scale annotation QA, cultural/context appropriateness checks, and structured feedback for global usability.
AI Data Annotation Specialist
Remotask
Jan 2019 - Jan 2022 (3 years)
Performed large-scale annotation and classification for text, image, audio, video, and LiDAR datasets to support machine learning initiatives. Conducted dataset validation and quality-control reviews and evaluated conversational AI outputs for grammar, coherence, relevance, and tone while producing structured labels and metadata.
Education
Degrees, certifications, and relevant coursework
University of Nairobi
Bachelor of Science, Computer Science
Completed in 2024.
