Parit Kansal

What I'm looking for

I'm looking for ML-focused roles where I can own problems end-to-end: not just hand off a model, but see it through to production. Computer Vision, Document AI, and NLP are where I spend most of my time. I want a team that cares about why things work, not just whether they do, and work that offers real business impact and room to go deep.

I'm a Machine Learning Scientist with 1+ years of hands-on experience building and shipping production-grade AI systems across Computer Vision, NLP, and Document Understanding. I work at Xelpmoc Design and Tech Ltd, where I take projects from research to deployment — owning the full pipeline, not just the model.

Most recently, I built a real-time video analytics platform for McDonald's India that processes multiple concurrent RTSP live feeds using YOLO, Vision Transformers, InsightFace, and XCLIP. The system handles behavioral monitoring, staff analytics, hygiene compliance automation, and facial emotion recognition, with outcomes like 94% accuracy on table-plate detection and 98% on zero-shot video classification for behavior-policy compliance. Getting it production-ready meant building fault-tolerant microservices on Redis Streams, MinIO, and MongoDB replica sets, with Grafana/Loki for observability.

On the Document AI side, I fine-tuned DONUT models for key-value extraction, reaching 98% field-level accuracy, and built an automated Gemma-3-4B judge pipeline that now validates over 100K documents a month at 96.67% document-level accuracy.

I've also led a multi-channel lead scoring system that fuses WhatsApp, web, and audio interaction data using ModernBERT and XGBoost, achieving 83% recall within the top 13% of ranked leads. Earlier work includes web visitor scoring models trained on 8M+ sessions, reaching 88% session-level and 90% visitor-level recall.

My technical stack is Python-first: PyTorch, TensorFlow, FastAPI, OpenCV, Docker. I care about understanding why models work — the math, the failure modes, the edge cases — not just getting them to run. My core interests are Computer Vision, Generative AI, Document Intelligence, and real-time multimodal systems.