Usman Farid
@usmanfarid
AI Evaluation Engineer and Python backend developer building deterministic AI benchmarking and scalable APIs for reliable model evaluation.
What I'm looking for
I’m an AI Evaluation Engineer and Python backend developer focused on building evaluation systems that are correct, repeatable, and fast. I specialize in designing deterministic benchmark tasks, strict validation logic, and end-to-end AI evaluation pipelines that teams can trust.
At Turing (remote), I’ve ranked among top contributors in my POD for evaluation accuracy, correctness, and turnaround time. I contributed to Terminal-Bench by designing deterministic benchmark tasks and evaluator logic for terminal and system-level reasoning and execution, and I served as a Peer Reviewer to ensure evaluator reliability and task determinism.
I also worked on OSWorld, a GUI-focused AI evaluation platform, where I developed structured JSON-based task specifications for GUI workflows, expected states, and validation criteria. I built and reviewed Python evaluators that automatically verify GUI interactions and task completion, and I joined a special research effort to analyze Average Handling Time (AHT) while optimizing evaluation complexity and workflow efficiency.
Earlier, as a Software Engineer (Python / Rust), I built scalable RESTful APIs with FastAPI and Flask, implemented secure authentication with role-based access control (RBAC), and delivered backend systems for payment processing, analytics dashboards, and real-time tracking. I also migrated performance-critical Python modules to Rust, improving execution speed by up to 40%, and partnered with frontend and DevOps teams to streamline deployments and improve reliability.
Experience
Work history, roles, and key accomplishments
Ranked among top contributors for evaluation accuracy, correctness, and turnaround time while working on Terminal-Bench. Built deterministic, JSON-specified evaluators for terminal, GUI, and system-level LLM tasks, and handled peer review and rework for complex benchmark fixes.
Software Engineer (Py/Rust)
Single Solution
Jan 2022 - Jan 2025 (3 years)
Developed scalable backend REST APIs and secure authentication/RBAC systems for payments, analytics dashboards, and real-time tracking platforms. Migrated performance-critical Python modules to Rust, improving execution speed by up to 40%, and improved deployment reliability with cross-functional DevOps collaboration.
Education
Degrees, certifications, and relevant coursework
University of Engineering and Technology (UET), Lahore
Bachelor of Science, Software Engineering
2021 - 2025
Activities and societies: Minor: Artificial Intelligence.
B.S. in Computer Science with a major in Software Engineering and a minor in Artificial Intelligence at University of Engineering and Technology (UET), Lahore (2021–2025).
