Open to opportunities

sujana acharya

@sujanaacharya

AI/ML engineer building reliable, cost-efficient LLM/RAG and agentic systems with 4+ years in production and research.

Nepal

What I'm looking for

I’m looking for an AI/ML role where I can ship production RAG and agentic systems, optimize model latency/cost, and collaborate on applied research—prioritizing reliability, monitoring, and measurable impact.

I’m an AI/ML Engineer with 4+ years spanning production model deployment, LLM integration, agentic systems, and applied research in computer vision and NLP. I build end-to-end LLM systems—RAG pipelines, multi-agent workflows, and model optimization—then deploy them with real monitoring and reliability controls.

In production, I integrated Anthropic Claude and OpenAI GPT-4o into FastAPI services, adding retry logic, exponential back-off, circuit breakers, and token-budget guardrails to reach 99.5% uptime on LLM-dependent endpoints. I designed RAG pipelines with FAISS and pgvector hybrid search, cutting hallucination rate by ~40%, and built LangChain/LangGraph agentic workflows for automated analysis and adaptive recommendations.
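For illustration only, the retry, exponential back-off, and circuit-breaker controls mentioned above could be sketched roughly as follows. This is not the production code: the names `CircuitBreaker`, `CircuitOpenError`, and `call_with_resilience`, along with every threshold and delay, are hypothetical, and `fn` stands in for whatever function performs the LLM request.

```python
import random
import time


class CircuitOpenError(RuntimeError):
    """Raised when the breaker is open and the call is rejected without being attempted."""


class CircuitBreaker:
    """Opens after `max_failures` consecutive failures; permits a trial call after `reset_after` seconds."""

    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def allow(self):
        if self.opened_at is None:
            return True
        # Half-open: allow a trial call once the cool-down has elapsed.
        return self.clock() - self.opened_at >= self.reset_after

    def record_success(self):
        self.failures = 0
        self.opened_at = None

    def record_failure(self):
        self.failures += 1
        if self.failures >= self.max_failures:
            self.opened_at = self.clock()


def call_with_resilience(fn, breaker, retries=3, base_delay=0.5, sleep=time.sleep):
    """Call fn() up to `retries` times, sleeping base_delay * 2**attempt (plus jitter) between attempts."""
    last_exc = None
    for attempt in range(retries):
        if not breaker.allow():
            raise CircuitOpenError("circuit open; skipping call")
        try:
            result = fn()
        except Exception as exc:  # assumption: real code would catch only transient errors
            breaker.record_failure()
            last_exc = exc
            if attempt < retries - 1:
                delay = base_delay * (2 ** attempt)
                sleep(delay + random.uniform(0, delay / 2))  # jitter spreads out retries
        else:
            breaker.record_success()
            return result
    raise last_exc
```

In a sketch like this, one breaker instance would typically be shared per upstream provider, so that repeated failures from one provider stop further calls to it for the cool-down period instead of retrying on every request.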

I also focus on performance and cost: I deployed models locally via Ollama, optimized inference with vLLM/PagedAttention, and used LoRA/PEFT fine-tuning plus 4-bit GPTQ/bitsandbytes quantization to reduce latency and VRAM usage. My research background includes publications in medical AI (MediVQA) and computer vision (Automated Retail Billing), and my goal is to build systems that are reliable, cost-efficient, and impactful.

Experience

Work history, roles, and key accomplishments

Current

AI/ML Engineer

NsDevil

Apr 2023 - Present (3 years 2 months)

Integrated Anthropic Claude and OpenAI GPT-4o into production FastAPI services, implementing resilience controls to achieve 99.5% uptime on LLM endpoints. Built end-to-end RAG and agentic workflows (LangChain/LangGraph), reducing hallucinations by ~40%. Optimized inference with vLLM and 4-bit quantization, cutting inference latency from 2.3 s to 0.8 s while lowering monthly API spend by 35%.

AI/ML Research Engineer

Independent Research & Academic Collaborations

Jan 2022 - Mar 2023 (1 year 2 months)

Developed and published medical and vision research, including MediVQA (medical visual question answering) and an automated retail billing pipeline using YOLOv8 detection with DeepSORT tracking and QR-code recognition. Conducted multimodal fusion and low-resource NLP experiments, tracking runs with MLflow and building reproducible training pipelines with Hugging Face tools and GitHub.

