Open to opportunities

Sujana Acharya

@sujanaacharya


AI/ML engineer building reliable, cost-efficient LLM, RAG, and agentic systems, with 4+ years of production and research experience.

Nepal


What I'm looking for

I’m looking for an AI/ML role where I can ship production RAG and agentic systems, optimize model latency and cost, and collaborate on applied research, with a focus on reliability, monitoring, and measurable impact.

I’m an AI/ML Engineer with 4+ years spanning production model deployment, LLM integration, agentic systems, and applied research in computer vision and NLP. I build end-to-end LLM systems (RAG pipelines and multi-agent workflows), optimize the models behind them, and deploy them with real monitoring and reliability controls.

In production, I integrated Anthropic Claude and OpenAI GPT-4o into FastAPI services, adding retry logic, exponential backoff, circuit breakers, and token-budget guardrails to reach 99.5% uptime on LLM-dependent endpoints. I designed RAG pipelines with hybrid search over FAISS and pgvector, cutting the hallucination rate by ~40%, and built LangChain/LangGraph agentic workflows for automated analysis and adaptive recommendations.

I also focus on performance and cost: I deployed models locally with Ollama, optimized inference with vLLM (PagedAttention), used LoRA/PEFT for parameter-efficient fine-tuning, and applied 4-bit GPTQ/bitsandbytes quantization to reduce latency and VRAM usage. My research background includes publications in medical AI (MediVQA) and computer vision (Automated Retail Billing). My goal is to build systems that are reliable, cost-efficient, and genuinely impactful.

Experience

Work history, roles, and key accomplishments

Current

AI/ML Engineer


NsDevil

Apr 2023 - Present (3 years 3 months)

Integrated Anthropic Claude and OpenAI GPT-4o into production FastAPI services, implementing resilience controls (retries, exponential backoff, circuit breakers, token-budget guardrails) to achieve 99.5% uptime on LLM endpoints. Built end-to-end RAG and agentic workflows with LangChain/LangGraph, reducing hallucinations by ~40%, and optimized inference with vLLM and 4-bit quantization, cutting inference latency from 2.3 s to 0.8 s and lowering monthly API spend by 35%.

RAG, FAISS, pgvector, LLMs, LangChain, LangGraph, vLLM, AWS (EC2, S3, Lambda), FastAPI, Docker

AI/ML Research Engineer

Independent Research & Academic Collaborations

Jan 2022 - Mar 2023 (1 year 2 months)

Conducted and published research in medical AI and computer vision, including MediVQA (medical visual question answering) and an automated retail billing pipeline that combines YOLOv8 detection, DeepSORT tracking, and QR-code recognition. Ran multimodal fusion and low-resource NLP experiments, tracked runs with MLflow, and built reproducible training pipelines with Hugging Face tools, version-controlled on GitHub.

BERT, ResNet-101, YOLOv8, MLflow, Accelerate, PyTorch