Allan Kipkemboi
@allankipkemboi
Data scientist and AI specialist improving model performance through data pipelines and RLHF systems.
What I'm looking for
I’m a data scientist and AI specialist with over five years of hands-on experience across the full AI data lifecycle. I focus on boosting model performance by building high-quality data pipelines, designing human-in-the-loop systems, and tuning RLHF setups.
At Mercor, I led the production deployment of an RLHF data pipeline that integrates 10,000+ human-in-the-loop signals weekly into the training loop, reducing user-reported factual errors by 15%. I also standardized LLM evaluation and ranking, which improved human-alignment scores, and strengthened dataset quality by annotating and validating multimodal data (text, image, audio), cutting annotation errors by 20%.
As a Data Developer at RWS Group, I architected and implemented a feature engineering layer in Python/Pandas that increased dialect classification accuracy by 9% across 10+ dialects. I’ve improved computer vision precision and recall with pixel-level semantic segmentation and bounding box annotation, and strengthened evaluation using pairwise comparison testing with metrics like accuracy and F1-score.
Earlier, as a Data Analyst at CloudFactory, I delivered data-driven insights using Python and SQL, improving reporting accuracy and turnaround time through validation workflows and data cleaning. In my annotation roles with Remotasks and Appen, I scaled training datasets for speech, sentiment/intent, object detection, and 3D point cloud labeling, steadily improving labeling accuracy and consistency.
Experience
Generalist Data Annotation Expert
Mercor
Nov 2025 - Present (5 months)
Led deployment of an RLHF data pipeline integrating 10,000+ weekly human-in-the-loop signals, reducing user-reported factual errors by 15%. Standardized LLM evaluation and ranking, and improved multimodal dataset quality, cutting annotation errors by 20%.
Data Developer
RWS Group
Implemented a Python/Pandas feature engineering layer that increased multilingual dialect classification accuracy by 9% across 10+ dialects. Improved computer vision precision/recall using semantic segmentation and bounding box annotation, and validated outputs with pairwise comparisons and accuracy/F1 metrics.
Data Analyst
CloudFactory
Delivered data-driven insights by analyzing structured datasets with Python (Pandas/NumPy) and SQL, improving reporting accuracy and turnaround time. Increased data integrity through validation workflows, cleaning, and preprocessing, and supported KPI tracking by streamlining client data queries.
Data Annotator
Remotasks and Appen
Scaled supervised learning training datasets by annotating thousands of image, video, and LiDAR data points. Improved speech recognition quality with multilingual audio transcription/segmentation and enhanced object detection performance using bounding boxes and 3D point cloud labeling.
Improved NLP training for sentiment analysis and intent classification through high-volume dataset annotation. Increased inter-annotator agreement using standardized labeling methodologies and supported large-scale AI data pipeline workflows across multiple platforms.
Education
The Technical University of Kenya
Bachelor of Technology, Chemical Engineering
2022 - Jun 2026 (expected)
Pursuing a Bachelor of Technology in Chemical Engineering at The Technical University of Kenya, Nairobi, with anticipated completion in June 2026.
