Open to opportunities

Sammy Naqi

@sammynaqi

Senior AI Engineer building production LLM, RAG, and agentic systems with low-latency, scalable LLMOps.

United States

What I'm looking for

I’m looking to build and scale production LLM, RAG, and agentic AI with strong LLMOps/MLOps, low-latency performance, and rigorous governance (HIPAA/SOC 2). I want to ship measurable impact while mentoring teams.

I’m a Senior AI Engineer with 8+ years of experience building and deploying production-grade machine learning and Generative AI systems across healthcare, fintech, and enterprise platforms. I specialize in Large Language Models (LLMs), Retrieval Augmented Generation (RAG), and agentic AI systems, with strong expertise in LLMOps, distributed systems, and low-latency inference.

I’ve architected end-to-end LLM pipelines that convert clinician-patient conversations into structured medical documentation, improving provider efficiency by 40%. I’ve also deployed advanced RAG systems using hybrid retrieval, FAISS-indexed embeddings, and reranking, improving accuracy and context relevance by 30%, and built real-time vector search with Pinecone to improve access to grounded medical knowledge.

Across roles, I’ve scaled AI systems to millions of users while optimizing performance and cost using cloud-native architectures. I’m deeply focused on AI governance and responsible AI: I implement controls aligned with HIPAA, SOC 2, GDPR, and data privacy best practices, and I mentor teams to strengthen MLOps with MLflow, Kubeflow, and CI/CD.

Experience

Work history, roles, and key accomplishments

Current

Senior AI Engineer

Abridge

Jan 2024 - Present (2 years 6 months)

Architected and deployed end-to-end LLM pipelines to convert clinician-patient conversations into structured medical documentation, improving provider efficiency by 40%. Designed and deployed RAG and low-latency inference systems that increased accuracy by 30% and reduced operational costs by 30%, while implementing HIPAA/SOC 2-aligned AI governance.

Python, LLM, Retrieval Augmented Generation (RAG), Pinecone, FAISS, Kubernetes, vLLM, Docker, Kubeflow

Machine Learning Engineer

Zest AI

Jun 2020 - Dec 2023 (3 years 6 months)

Developed and deployed credit risk models using XGBoost, LightGBM, and deep learning (LSTM/CNN), improving model accuracy by 20%. Built Spark/Airflow data pipelines for 50M+ records and implemented CI/CD and low-latency inference on AWS, reducing deployment times by 40% while using SHAP/LIME for explainability.

XGBoost, LightGBM, PySpark, Apache Spark, Airflow, AWS SageMaker, CI/CD, TensorFlow Serving, TorchServe, MLflow

Associate AI/ML Engineer

Netomi

Jul 2017 - May 2020 (2 years 10 months)

Built conversational AI for customer support automation using transformer-based NLP models (e.g., BERT) and sequence-to-sequence dialogue generation. Developed scalable semantic search and ML pipelines using TensorFlow/Keras and AWS to support intent classification, information retrieval, and dialogue management.

BERT, NLP, Semantic Search, Word2Vec, TF-IDF, TensorFlow, Keras, AWS