Looking for a job

Hellen Waruru

@hellenwaruru

AI Data Trainer with 6 years of experience evaluating LLMs and scoring RLHF responses against quality rubrics.

United States

What I'm looking for

I’m looking for a role where I can apply my expertise in AI model evaluation, RLHF, and NLP to help build safe, accurate LLMs through rigorous rubric-based scoring and continuous quality improvement.

I’m an experienced AI Data Trainer with 6 years in AI model evaluation and reinforcement learning, focused on improving performance, accuracy, and efficiency across real-world AI systems. I craft prompt-response pairs for reinforcement learning from human feedback (RLHF) pipelines and support fine-tuning and alignment of large language models.

In my roles, I generate, review, and evaluate AI model responses as a subject matter expert across STEM, creative writing, logic, and coding. I assess outputs for factual accuracy, coherence, safety, and instruction adherence using comprehensive scoring rubrics, driving consistent quality at scale.

At Outlier AI (via Scale AI), I maintained a 97%+ average quality score across more than 2,000 tasks with no critical violations, collaborating with project leads to identify and address systematic model errors. I also developed structured datasets for mathematics and reasoning to enhance AI tutoring systems, including detailed solution explanations and error pattern annotations.

With a PhD in Computer Science, plus training in NLP, machine learning, and prompt engineering for large language models, I bring a research-minded approach to evaluation and quality assurance. I’m motivated by building safer, more reliable AI through rigorous benchmarking, fact-checking, and continuous improvement.

Experience

Work history, roles, and key accomplishments

Current

Senior AI Data Trainer

Outlier AI

Jan 2022 - Present (4 years 6 months)

Generated, reviewed, and evaluated AI model responses across STEM, creative writing, logic, and coding, creating prompt-response pairs to support reinforcement learning from human feedback workflows. Maintained a 97%+ average quality score across 2,000+ tasks with no critical violations by using rubric-based scoring for factual accuracy, coherence, safety, and instruction adherence.

LLM Evaluation, Instruction Following, Coherence Assessment, Model Alignment

Robotic and LLM Eval

Open Train AI

Jun 2022 - Sep 2025 (3 years 3 months)

Conducted comprehensive evaluations of robotics systems and large language models (LLMs) to assess performance, accuracy, and efficiency, supporting development of cutting-edge AI technologies.

LLM Evaluation, AI Benchmarking, Performance Testing, Accuracy Analysis, AI Quality Assurance, Remote Collaboration

AI Content Trainer (Math)

Mindrift / Toloka

Mar 2021 - Dec 2021 (9 months)

Developed structured mathematics and logical reasoning datasets with detailed solution explanations and error-pattern annotations to improve AI tutoring output accuracy. Rated 500+ peer-submitted samples monthly while maintaining a rejection rate below 2%.

Mathematics Dataset Creation, Solution Explanation Labeling, Error Patterns, AI Tutoring Support, Data Annotation

Freelance Data Annotator

Appen / DataAnnotation.tech

Jun 2020 - Feb 2021 (8 months)

Annotated text, image, and audio data for NLP and computer vision model training, including named entity recognition, sentiment analysis, and intent classification. Completed 300+ tasks weekly while resolving annotation ambiguities and maintaining strict confidentiality and data-handling standards.

Named Entity Recognition, Sentiment Analysis, Text Annotation, Image Annotation, Audio Annotation, NLP Training, Data Confidentiality