Open to opportunities

Akash Singh

@akashsingh11

AI Technical Lead specializing in real-time voice agents, multilingual speech AI, and on-device inference.

India

What I'm looking for

I’m looking for a product-focused role building real-time, multilingual speech AI end to end, with an emphasis on reliable production performance, hallucination mitigation, and on-device/edge inference optimization for conversational agents.

I’m an AI Technical Lead focused on shipping production-grade speech AI, especially real-time, full-duplex voice-to-voice experiences. At Paytm, I designed and shipped a WebRTC pipeline that connects streaming ASR → LLM reasoning → TTS for travel booking.

I also architected a multi-agent conversational system with LangGraph StateGraph and GPT-4, coordinating 10+ agents for search, filtering, and booking, achieving 95% intent recognition accuracy. I integrated live travel APIs using async clients with circuit breaker patterns and maintained 99.5% system uptime in production.

Previously, as a Research Scientist at Saarthi.AI, I led end-to-end TTS research across Tacotron, FastSpeech, and HiFi-GAN for single-speaker, multi-speaker, and multilingual settings across 11 Indian languages at a scale of 5M calls/day. I built and deployed streaming ASR systems (DeepSpeech, Whisper, Kaldi), developed a full NLU pipeline from data creation to Azure/AWS deployment, and led cross-functional teams from research through deployment, including running models on-device on Android for real-time recommendation inside a keyboard product.

Experience

Work history, roles, and key accomplishments

Current

Technical Lead - AI

Paytm

Jan 2025 - Present (1 year 6 months)

Designed and shipped a real-time voice-to-voice agent over WebRTC, integrating streaming ASR → LLM reasoning → TTS into a full-duplex production pipeline for travel booking. Architected a multi-agent system coordinating 10+ agents and achieved 95% intent recognition accuracy while maintaining 99.5% uptime via resilient API integration and hallucination mitigation.

WebRTC, Streaming ASR, TTS, Multi-Agent Systems, GPT-4, Circuit Breaker Patterns

Research Scientist (TTS/ASR)

Saarthi.AI

Aug 2021 - Jan 2025 (3 years 5 months)

Led end-to-end TTS research across Tacotron, FastSpeech, and HiFi-GAN for single-speaker, multi-speaker, and multilingual settings across 11 Indian languages at 5M calls/day. Built and deployed streaming ASR systems and an end-to-end NLU pipeline on Azure/AWS, and drove on-device model distillation for Android real-time recommendation.

Tacotron, FastSpeech, Streaming ASR (Whisper, DeepSpeech, Kaldi), NLU, Model Distillation, Azure, AWS

Deep Learning Engineer (Speech/NLP)

Saarthi.AI

Dec 2018 - Jul 2021 (2 years 7 months)

Trained ELMo and ULMFiT language models from scratch in 9 Indian languages and applied them to entity tagging, text classification, semantic role labeling, and POS tagging. Built speaker recognition pipelines (X-vector/D-vector) achieving 95%+ accuracy and developed keyword spotting and dialog policy using TensorFlow.js and deep RL for browser deployment.

ULMFiT, ELMo, Speaker Recognition (X-vector, D-vector), Keyword Spotting, TensorFlow.js, Semantic Role Labeling, POS Tagging