akash singh
@akashsingh11
AI Technical Lead specializing in real-time voice agents, multilingual speech AI, and on-device inference.
What I'm looking for
I’m an AI Technical Lead focused on shipping production-grade speech AI—especially real-time, full-duplex voice-to-voice experiences. At Paytm, I designed and shipped a WebRTC pipeline that connects streaming ASR → LLM reasoning → TTS for travel booking.
I also architected a multi-agent conversational system with LangGraph StateGraph and GPT-4, coordinating 10+ agents for search, filtering, and booking, achieving 95% intent recognition accuracy. I integrated live travel APIs using async clients with circuit breaker patterns and maintained 99.5% system uptime in production.
Previously, as a Research Scientist at Saarthi.AI, I led end-to-end TTS research across Tacotron, FastSpeech, and HiFi-GAN for single-speaker, multi-speaker, and multilingual settings across 11 Indian languages at 5M calls/day. I built and deployed streaming ASR systems (DeepSpeech, Whisper, Kaldi) and developed a full NLU pipeline from data creation to Azure/AWS deployment. I also led cross-functional teams from research through deployment, including bringing models on-device on Android for real-time recommendation inside a keyboard product.
Experience
Work history, roles, and key accomplishments
Designed and shipped a real-time voice-to-voice agent over WebRTC, integrating streaming ASR → LLM reasoning → TTS into a full-duplex production pipeline for travel booking. Architected a multi-agent system coordinating 10+ agents and achieved 95% intent recognition accuracy while maintaining 99.5% uptime via resilient API integration and hallucination mitigation.
Research Scientist (TTS/ASR)
Saarthi.AI
Aug 2021 - Jan 2025 (3 years 5 months)
Led end-to-end TTS research across Tacotron, FastSpeech, and HiFi-GAN for single-speaker, multi-speaker, and multilingual settings across 11 Indian languages at 5M calls/day. Built and deployed streaming ASR systems and an end-to-end NLU pipeline on Azure/AWS, and drove on-device model distillation for Android real-time recommendation.
Deep Learning Engineer (Speech/NLP)
Saarthi.AI
Dec 2018 - Jul 2021 (2 years 7 months)
Trained ELMo and ULMFiT language models from scratch in 9 Indian languages and applied them to entity tagging, text classification, semantic role labeling, and POS tagging. Built speaker recognition pipelines (X-vector/D-vector) achieving 95%+ accuracy and developed keyword spotting and dialog policy using TensorFlow.js and deep RL for browser deployment.
Education
Degrees, certifications, and relevant coursework
Institute of Engineering and Technology, Lucknow
Bachelor of Technology, Computer Science & Engineering
Completed a B.Tech in Computer Science and Engineering at IET Lucknow in 2018.
Website
akashicmarga.github.io
