Open to opportunities

Valter Lopes

@valterlopes

AI evaluator specializing in LLM reasoning quality, long-context coherence, and hallucination detection.

Portugal

What I'm looking for

I’m looking for a remote role where I can evaluate LLM reasoning quality end-to-end—long-context coherence, hallucination detection, semantic consistency, and prompt stress testing—so teams can improve reliability, alignment, and instruction-following.

I’m an analytical, concept-oriented AI evaluator focused on the reasoning quality of LLM responses, especially across long-context interactions. My work centers on structured reasoning exploration, recursive prompt testing, and semantic consistency analysis in Portuguese and English (PT/EN).

In my independent, remote research (2024–Present), I conduct extensive long-context interactions and evaluations to assess reasoning behavior, semantic continuity, and abstraction handling. I produce iterative evaluation frameworks and comparative analyses, which include detecting contradictions and hallucinations, evaluating instruction-following precision, and testing edge-case prompts.

I’m particularly strong at identifying subtle logical inconsistencies, ambiguity drift, narrative instability, and alignment weaknesses across extended AI conversations. I focus on ranking quality, verifying symbolic and conceptual consistency, and stress-testing long-context coherence under challenging conceptual loads.

Experience

Work history, roles, and key accomplishments

Current

AI Evaluation & LLM Reasoning

Independent

Jan 2024 - Present (2 years 5 months)

Conducted structured AI reasoning and long-context evaluation focused on semantic continuity, hallucination detection, and recursive prompt stress-testing. Produced iterative evaluation frameworks and comparative analyses of LLM responses under abstraction-heavy conceptual loads.

Education

Degrees, certifications, and relevant coursework


Valter Lopes Caldas da Rainha

AI Evaluation & LLM Reasoning Analysis

2024 -

Activities and societies: Long-context conversational evaluation; recursive prompt testing; semantic drift/hallucination detection; instruction-following checks; comparative model evaluation; PT/EN interaction; ChatGPT/LLM workflow testing.

Independent remote AI evaluation and LLM reasoning analysis focused on long-context coherence, semantic consistency, hallucination detection, and recursive prompt stress testing. Produces comparative response rankings and evaluation frameworks to assess reasoning quality and alignment weaknesses across extended PT/EN conversations.

