Open to opportunities

Subhadip Mondal

@subhadipmondal1

I build production AI systems—RAG, LLM tuning, and MLOps for real-world decisions.

India

What I'm looking for

I’m looking for a role where I can build and deploy AI/ML products end to end (RAG, LLM fine-tuning, and MLOps on AWS) while improving latency, reliability, and evaluation quality using real production feedback.

I’m an AI/ML engineer who turns LLM ideas into production systems—RAG pipelines, LLM fine-tuning, and reliable inference. I specialize in model evaluation and practical retrieval design to deliver accurate, fast answers.

In my current Business Analyst role, I automate workflows and apply GPT-based document intelligence to reduce manual processing while improving accuracy. As an AI/ML Freelance Engineer, I built a voice-enabled financial assistant using RAG (500+ earnings reports) and a Gmail–Slack assistant with multi-modal processing (Whisper, Google TTS), handling 200+ queries/day.

I also ship end-to-end projects like DeciScope and RepoMind, combining hybrid retrieval (Pinecone + PostgreSQL/pgvector) with AWS serverless infrastructure to cut latency and track production metrics. Recently, I fine-tuned Llama-3.2-3B with QLoRA and deployed optimized inference with vLLM, improving ROUGE-L and BERTScore while reducing VRAM needs.

Experience

Work history, roles, and key accomplishments

Current

Business Analyst

Enverus

Jul 2025 - Present (1 year)

Automated supplier onboarding workflows using Python and the Salesforce API, reducing manual processing time by 40% while handling 150+ vendor records/month with 98% data accuracy. Built a GPT-4-based internal document classification system that processed 200+ legal documents/week with 91% classification accuracy and automated metadata extraction.

Python Salesforce API Workflow Automation Document Classification Metadata Data Accuracy In ATS Automation

Current

AI & ML Freelance Engineer

Independent Contractor

Jan 2025 - Present (1 year 6 months)

Built a voice-enabled financial assistant using GPT-4 RAG over 500+ earnings reports, achieving 89% answer accuracy with real-time stock data APIs. Developed a Gmail-Slack multi-modal assistant (Whisper + Google TTS) handling 200+ queries/day and engineered a hybrid Pinecone + PostgreSQL retrieval system to reduce query latency from 3.2s to 840ms.

RAG GPT-4 Pinecone PostgreSQL (Hybrid Retrieval) Whisper Google TTS Voice Assistant Systems API Integration
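
A minimal sketch of how results from a vector store and a SQL store could be merged in a hybrid retrieval setup. The entry above does not say how the two result sets are combined, so reciprocal rank fusion is an assumption chosen for illustration, and the document IDs and the `k=60` constant are hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into a single ranking.

    Each document scores 1 / (k + rank) per list it appears in, so items
    ranked highly by more than one retriever rise to the top.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID to keep output deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical result lists standing in for a vector search and a SQL search.
vector_hits = ["doc_a", "doc_b", "doc_c"]
keyword_hits = ["doc_b", "doc_d", "doc_a"]

fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
print(fused)  # ['doc_b', 'doc_a', 'doc_d', 'doc_c']
```

Rank-based fusion avoids having to normalize similarity scores that come from two different systems, which is one reason it is a common starting point for hybrid search.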

AI GitHub Intelligence Assistant

RepoMind

Built a production RAG system for 10K+ file codebases achieving 0.84 answer relevance using LangChain + Gemini 1.5 Pro with AST-based parsing and semantic chunking. Optimized pgvector search queries and connection pooling to reduce p95 latency from 2.1s to 780ms while handling 1K+ webhook events/hour, and implemented incremental indexing with deduplication to cut embedding costs by 73%.

Next.js LangChain pgvector Webhooks
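
A minimal sketch of incremental indexing with deduplication, assuming chunks are identified by a SHA-256 hash of their text. The function names, chunk IDs, and sample chunks are hypothetical; the RepoMind entry above does not describe its actual mechanism.

```python
import hashlib


def content_hash(text: str) -> str:
    """Stable fingerprint of a chunk's text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def plan_incremental_index(chunks, indexed_hashes):
    """Return the chunks that still need embedding and the updated hash set.

    chunks: iterable of (chunk_id, text) pairs from the latest repo snapshot.
    indexed_hashes: hashes of chunks that were embedded in earlier runs.
    """
    seen = set(indexed_hashes)
    to_embed = []
    for chunk_id, text in chunks:
        digest = content_hash(text)
        if digest in seen:
            continue  # unchanged or duplicate content: skip the embedding call
        seen.add(digest)
        to_embed.append((chunk_id, text))
    return to_embed, seen


# Hypothetical snapshot: two files contain an identical chunk.
snapshot = [
    ("a.py:0", "def f(): pass"),
    ("b.py:0", "def f(): pass"),
    ("c.py:0", "x = 1"),
]

first_batch, known = plan_incremental_index(snapshot, set())
second_batch, known = plan_incremental_index(snapshot, known)
print(len(first_batch), len(second_batch))  # 2 0
```

Only chunks whose hash is new reach the embedding step, so a webhook event that touches a few files triggers embedding work proportional to the changed content rather than to the size of the repository.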

Decision Intelligence Platform

DeciScope

Architected real-time decision monitoring integrated with the Slack API, processing 500+ messages/day with 87% extraction accuracy using GPT-4 and custom prompt chains. Designed hybrid memory with Pinecone vector search + DynamoDB metadata (0.81 context precision) and built serverless AWS infrastructure with sub-200ms latency, reducing decision reversal rate by 23% in pilot deployment.

Next.js Slack AWS Lambda EventBridge Pinecone DynamoDB Monitoring

Code Documentation SLM

DocForge

Fine-tuned Llama-3.2-3B on 8.5K+ code documentation examples using QLoRA (rank=32, alpha=64), reducing VRAM from 24GB to 8GB. Improved ROUGE-L (0.42→0.56) and BERTScore (0.71→0.91) versus the base model, and deployed with vLLM for sub-1.2s generation at 512 tokens, publishing the model on HuggingFace with documentation.

PyTorch QLoRA vLLM Llama 3.2 Model Evaluation
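
A short worked example of what the LoRA settings rank=32 and alpha=64 imply, using the standard LoRA relations: the adapter update is scaled by alpha / rank, and an adapter on a d_out x d_in weight matrix adds rank * (d_in + d_out) trainable parameters. The 3072 x 3072 matrix size is an illustrative assumption, not a figure from the DocForge entry.

```python
def lora_scaling(rank: int, alpha: int) -> float:
    """Factor applied to the low-rank update B @ A in standard LoRA."""
    return alpha / rank


def lora_trainable_params(d_in: int, d_out: int, rank: int) -> int:
    """Parameters in A (rank x d_in) plus B (d_out x rank)."""
    return rank * (d_in + d_out)


RANK, ALPHA = 32, 64      # values stated in the project description
D_IN = D_OUT = 3072       # hypothetical projection size, for illustration only

scaling = lora_scaling(RANK, ALPHA)
adapter_params = lora_trainable_params(D_IN, D_OUT, RANK)
full_params = D_IN * D_OUT
fraction = adapter_params / full_params

print(scaling)                 # 2.0
print(adapter_params)          # 196608
print(round(fraction * 100, 2))  # 2.08 (percent of the full matrix)
```

Under these assumptions each adapted matrix trains about 2% of the parameters it would need under full fine-tuning. QLoRA additionally keeps the frozen base weights in 4-bit precision, which is where most of the memory saving comes from.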