Open to opportunities

Larry Honrada

@larryhonrada

Senior AI Engineer specializing in production-grade LLM/RAG systems and low-latency distributed inference.

Philippines

What I'm looking for

I’m looking for a team building production LLM/RAG systems, especially retrieval, hybrid search, and low-latency inference, where I can own end-to-end pipelines, ship scalable APIs, and strengthen reliability with MLOps, monitoring, and CI/CD.

I’m a Senior AI Engineer with 10+ years building production-grade machine learning systems and distributed data platforms. I focus on LLM-powered applications, retrieval systems, and low-latency inference, with a proven track record of delivering systems that process millions of documents and operate reliably under real-world workloads.

At Luxoft, I led development of a production-grade RAG platform enabling natural language querying across millions of insurance documents, reducing analysis time from hours to seconds. I designed end-to-end pipelines (ingestion, chunking, embeddings, vector indexing, retrieval, and LLM inference), optimized vector search/retrieval to achieve ~38ms query latency, built hybrid retrieval and reranking for better relevance, and shipped backend REST APIs (FastAPI, microservices) on AWS/Azure with Docker and Kubernetes. I also owned MLOps workflows with CI/CD and MLflow, integrated observability (Prometheus, Grafana, logging/alerting), and led a team of 4–6 engineers.

Experience

Work history, roles, and key accomplishments

Luxoft
Current

Senior AI Engineer

Dec 2022 - Present (3 years 6 months)

Led development of a production-grade RAG platform for natural language querying across millions of insurance documents, reducing document analysis time from hours to seconds. Built end-to-end ingestion-to-inference pipelines and REST services, achieving ~38ms query latency with hybrid retrieval and reranking.

Dataminr

ML Platform Engineer

Aug 2017 - Nov 2022 (5 years 3 months)

Built distributed ML infrastructure for real-time event detection across global data streams, supporting billions of daily inputs. Improved throughput by ~30%, reduced model deployment time from weeks to under one week, and cut incident detection time by ~50% via monitoring and alerting.

Accenture

AI Engineer

Sep 2013 - Jul 2017 (3 years 10 months)

Built telecom customer churn prediction models (logistic regression, random forest, gradient boosting) and generated risk scores for millions of users to support retention strategies. Improved model performance using feature engineering and evaluation (ROC-AUC, precision/recall) and deployed batch scoring pipelines.

Education

Degrees, certifications, and relevant coursework

University of the Philippines

Bachelor’s Degree, Computer Science

2008 - 2013

Activities and societies: Built distributed ML systems for real-time event detection; designed multi-model inference for real-time decision pipelines; created AI image generation pipelines; developed an LLM-powered voice AI agent for inquiries and scheduling.

Earned a Bachelor’s Degree in Computer Science. Completed related projects including distributed event detection, multi-model inference, AI image generation, and an LLM-powered voice AI agent.
