Open to opportunities

Larry Honrada

@larryhonrada

Senior AI Engineer specializing in production-grade LLM/RAG systems and low-latency distributed inference.

Philippines

What I'm looking for

I’m looking for a team building production LLM/RAG systems—especially retrieval, hybrid search, and low-latency inference—where I can own end-to-end pipelines, ship scalable APIs, and strengthen reliability with MLOps, monitoring, and CI/CD.

I’m a Senior AI Engineer with 10+ years building production-grade machine learning systems and distributed data platforms. I focus on LLM-powered applications, retrieval systems, and low-latency inference, with a proven track record of delivering systems that process millions of documents and operate reliably under real-world workloads.

At Luxoft, I led development of a production-grade RAG platform that enables natural language querying across millions of insurance documents, reducing analysis time from hours to seconds. I designed end-to-end pipelines (ingestion, chunking, embeddings, vector indexing, retrieval, and LLM inference), optimized vector search and retrieval to reach ~38ms query latency, and built hybrid retrieval with reranking to improve relevance. I also shipped backend REST APIs (FastAPI, microservices) on AWS/Azure with Docker and Kubernetes, owned MLOps workflows with CI/CD and MLflow, integrated observability (Prometheus, Grafana, logging/alerting), and led a team of 4–6 engineers.

Experience

Work history, roles, and key accomplishments

Current

Senior AI Engineer

Luxoft

Dec 2022 - Present (3 years 7 months)

Led development of a production-grade RAG platform for natural language querying across millions of insurance documents, reducing document analysis time from hours to seconds. Built end-to-end ingestion-to-inference pipelines and REST services, achieving ~38ms query latency with hybrid retrieval and reranking.

Vector Search, Reranking, RAG

ML Platform Engineer

Dataminr

Aug 2017 - Nov 2022 (5 years 3 months)

Built distributed ML infrastructure for real-time event detection across global data streams, supporting billions of daily inputs. Improved throughput by ~30%, reduced model deployment time from several weeks to under one week, and cut incident detection time by ~50% through monitoring and alerting.

ML Infrastructure, CI/CD Automation, Model Serving, Monitoring and Alerting

AI Engineer

Accenture

Sep 2013 - Jul 2017 (3 years 10 months)

Built telecom customer churn prediction models (logistic regression, random forest, gradient boosting) and generated risk scores for millions of users to support retention strategies. Improved model performance through feature engineering, evaluated models with ROC-AUC and precision/recall, and deployed batch scoring pipelines.

Churn Modeling, Logistic Regression, Random Forest, Gradient Boosting, Feature Engineering, Precision/Recall, Batch Scoring, Supervised Learning

Education

Degrees, certifications, and relevant coursework

University of the Philippines

Bachelor’s Degree, Computer Science

2008 - 2013

Activities and societies: Built distributed ML systems for real-time event detection; designed multi-model inference for real-time decision pipelines; created AI image generation pipelines; developed an LLM-powered voice AI agent for inquiries and scheduling.

Earned a Bachelor’s Degree in Computer Science. Completed related projects including distributed event detection, multi-model inference, AI image generation, and an LLM-powered voice AI agent.