Mihai Dragomir
@mihaidragomir1
Senior AI/ML Engineer building production LLM, RAG, and agentic systems with low-latency, reliable, cost-efficient scale.
What I'm looking for
I’m a Senior AI Engineer specializing in LLM-powered systems, retrieval-augmented generation (RAG), and agentic AI architectures. I build and scale production-grade AI platforms end to end, from data pipelines and model training to retrieval, orchestration, evaluation, and observability, across AWS and GCP.
At LexisNexis, I led development of a production-grade Legal Summarization Engine on Lexis+ AI, processing long-form case law (100+ pages) into structured headnotes and holdings, reducing attorney review time by 60%+. I also fine-tuned LLaMA models using QLoRA and built hierarchical summarization pipelines for documents exceeding 200K tokens, improving reliability and factual grounding.
I focus heavily on measurement and trust: I built RAG pipelines using Pinecone + Elasticsearch (hybrid search), reduced hallucinated legal claims by ~50%, and delivered a citation validation layer achieving >92% citation accuracy. I established automated evaluation pipelines to track faithfulness, ROUGE-L, and reasoning consistency across model releases, while deploying high-throughput inference with vLLM and TensorRT-LLM for sub-second latency.
Before that, I delivered enterprise conversational AI at Parloa—designing real-time ASR → NLU → Dialogue Manager → TTS pipelines, improving intent recognition by 30%+ and reducing average handling time by 20–25%. At Revolut, I built fraud and transaction anomaly detection models that reduced fraudulent transactions by 30%+ and implemented real-time streaming analytics with sub-second decision latency, alongside experimentation platforms for A/B testing.
Experience
Work history, roles, and key accomplishments
Led development of a Lexis+ Legal Summarization Engine, converting 100+ page case law into structured headnotes/holdings and reducing attorney review time by 60%+. Fine-tuned LLaMA models with QLoRA and built hybrid RAG (Pinecone + Elasticsearch) and multi-agent LangGraph workflows, cutting hallucinated legal claims by ~50% and reducing human post-editing by ~40%.
Improved intent recognition accuracy by 30%+ and scaled enterprise conversational AI across voice and chat channels for high-volume contact center operations. Built real-time ASR → NLU → Dialogue Manager → TTS pipelines and deployed async FastAPI services, reducing average handling time by 20–25%, lowering escalations by ~35%, and achieving sub-second response times.
Built and deployed ML models for fraud detection and transaction anomaly detection, reducing fraudulent transactions by 30%+ across card payment flows. Engineered features on large-scale financial datasets and delivered Kafka + Spark Streaming real-time risk scoring with sub-second latency, improving predictive performance by 20%+ and fraud recall by 25%+.
Software Engineer
Fortech
Oct 2010 - May 2016 (5 years 7 months)
Developed and maintained enterprise backend systems and web applications, improving reliability and reducing production incidents by ~30%. Built Spring/Java REST microservices and optimized relational database schemas and SQL queries, reducing query latency by 40%+ and integrating third-party payment/auth systems.
Education
Degrees, certifications, and relevant coursework
Stanford University
Master of Science in Computer Science
2016 - 2018
Completed an M.S. in Computer Science at Stanford University between 2016 and 2018.
Technical University of Cluj-Napoca
Bachelor of Science in Computer Science
2006 - 2010
Completed a B.S. in Computer Science at the Technical University of Cluj-Napoca between 2006 and 2010.
Tech stack
Software and tools used professionally
Apache Spark
Dialogflow
Kubernetes
Jenkins
NumPy
Pandas
MySQL
PostgreSQL
Gmail
Spring Framework
Spring MVC
Databricks
Terraform
AngularJS
JavaScript
HTML5
Java
TensorFlow
PyTorch
MLflow
scikit-learn
Kubeflow
DeepSpeed
Kafka
RabbitMQ
FastAPI
Grafana
Prometheus
Linux
Gemini
Elasticsearch
Milvus
Airflow
GuardRails
CUDA
SQL
XGBoost
Hugging Face
LightGBM
LangChain
LlamaIndex
Weaviate
ChromaDB
Weights & Biases
Evidently AI
Pinecone
Ray
Delta Lake
vLLM
JAX
Ragas
Agentic
Faiss
LangGraph
LangSmith
Rasa
DeepSeek
Model Context Protocol (MCP)
PEFT
Qwen
Dynamic
Safe
Jan
Sentence Transformers
