Open to opportunities

Srujan Teja User

@srujantejauser

AI/ML Engineer focused on Agentic AI, Voice AI, RAG, and MLOps—building low-latency, production-ready systems.

India

What I'm looking for

I’m looking for a team that pairs strong engineering rigor with fast iteration, where I can build and ship production-grade Agentic AI, Voice AI, and RAG systems and own low-latency inference, deployment, observability, and measurable evaluation.

I’m an AI/ML Engineer building production-grade systems across Agentic AI, Voice AI, RAG, GenAI, and MLOps, with a strong bias toward low-latency inference, scalable deployment, and reliable memory architectures. I focus on taking research ideas to production by designing systems end-to-end—from orchestration and routing to observability and evaluation.

In agentic systems, I architected a LangGraph-based multi-agent orchestration runtime with dynamic task decomposition, tool-use routing, and self-correction loops, using Redis pub/sub for shared-state coordination and PostgreSQL for durable task memory. For voice, I built a real-time multilingual Voice AI platform supporting 10+ Indian languages, combining Whisper STT exported to INT8 ONNX with fine-tuned VITS2 TTS, served via Triton with async WebSocket streaming and reaching sub-150ms end-to-end latency on CPU.

On the retrieval and generation side, I engineered an enterprise semantic search and NLP intelligence engine that combines hybrid dense-sparse retrieval (BGE-M3 + BM25), HyDE query expansion, and cross-encoder reranking. It reaches 91% top-3 retrieval precision, cuts irrelevant retrievals by 38%, and reduces P99 latency from 420ms to 55ms at scale. I also delivered a multimodal GenAI document intelligence platform whose GPT-4o/Claude routing reduced inference cost by 42% while supporting 500+ concurrent sessions, and an AI memory architecture for conversational agents that reduced prompt token overhead by 60% while improving multi-session coherence.

Experience

Work history, roles, and key accomplishments

Current

Production Multi-Agent System

Independent Project

Mar 2026 - Present (4 months)

Architected a multi-agent runtime with dynamic task decomposition, tool-use routing, and self-correction loops; agents autonomously plan, execute, and recover across 10+ tool integrations. Built shared-state coordination over Redis pub/sub, with PostgreSQL as durable task memory, so agents can execute concurrently without cross-agent context collisions.

LangGraph, Multi-Agent Systems, Python, FastAPI, PostgreSQL, Docker Compose, Prometheus, OpenTelemetry
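The plan, route, execute, and self-correct loop described above can be sketched without any framework. This is a minimal illustration under stated assumptions, not the LangGraph runtime itself: the `decompose` planner, the tool registry, the `validate` check, and the retry budget are hypothetical, self-correction is reduced to bounded retries, and state lives in memory instead of PostgreSQL and Redis.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable

# Hypothetical tool signature: every tool takes and returns a string.
Tool = Callable[[str], str]


@dataclass
class Subtask:
    tool: str          # tool name the router dispatches on
    arg: str           # input passed to that tool
    attempts: int = 0  # how many times this subtask has been tried


@dataclass
class AgentState:
    results: dict[str, str] = field(default_factory=dict)  # "tool:arg" -> validated output
    errors: list[str] = field(default_factory=list)        # unrecoverable failures


def decompose(task: str) -> list[Subtask]:
    """Hypothetical planner: split 'tool: arg; tool: arg' into subtasks."""
    subtasks = []
    for part in task.split(";"):
        tool, _, arg = part.strip().partition(":")
        subtasks.append(Subtask(tool=tool.strip(), arg=arg.strip()))
    return subtasks


def run(task: str, tools: dict[str, Tool], validate: Callable[[str], bool],
        max_attempts: int = 3) -> AgentState:
    state = AgentState()
    queue = decompose(task)
    while queue:
        sub = queue.pop(0)
        sub.attempts += 1
        handler = tools.get(sub.tool)  # tool-use routing by name
        if handler is None:
            state.errors.append(f"unknown tool {sub.tool!r}")
            continue
        try:
            out = handler(sub.arg)
            ok = validate(out)
        except Exception as exc:  # a raising tool counts as a failed attempt
            out, ok = str(exc), False
        if ok:
            state.results[f"{sub.tool}:{sub.arg}"] = out
        elif sub.attempts < max_attempts:
            queue.append(sub)  # "self-correction" here is only a bounded retry
        else:
            state.errors.append(f"{sub.tool}:{sub.arg} failed after {sub.attempts} attempts")
    return state
```

Re-queuing a failed subtask at the back lets other subtasks proceed while a flaky tool recovers; a fuller version would feed the failure message back to the planner so it can revise the subtask rather than simply repeat it.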

Current

Multilingual Real-Time Voice AI

Independent Project

Nov 2025 - Present (8 months)

Built a multilingual voice AI platform for 10+ Indian languages with real-time STT and neural TTS, achieving sub-150ms end-to-end latency on CPU. Implemented zero-shot voice cloning from 5s of audio and served the system on Triton with dynamic batching and async WebSocket streaming.

Whisper, VITS2, Triton Inference Server, WebSockets, FastAPI, Voice Cloning, MLflow
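Triton's dynamic batcher groups individual requests into a batch until either the batch is full or a queueing-delay budget expires (configured per model through settings such as `max_queue_delay_microseconds` in `config.pbtxt`). The asyncio sketch below imitates that policy in plain Python to show the trade-off; it is not Triton's implementation, and `DynamicBatcher`, `max_batch_size`, `max_delay_s`, and the `infer_batch` callable are hypothetical stand-ins for a real batched STT or TTS model call.

```python
from __future__ import annotations

import asyncio
from typing import Awaitable, Callable


class DynamicBatcher:
    """Group single requests into batches bounded by size and by queueing delay."""

    def __init__(self, infer_batch: Callable[[list[str]], Awaitable[list[str]]],
                 max_batch_size: int = 8, max_delay_s: float = 0.005) -> None:
        self.infer_batch = infer_batch        # hypothetical batched model call
        self.max_batch_size = max_batch_size
        self.max_delay_s = max_delay_s
        self.queue: asyncio.Queue = asyncio.Queue()
        self.batch_sizes: list[int] = []      # kept only for inspection

    async def submit(self, item: str) -> str:
        # Each caller awaits a future that resolves when its batch completes.
        fut = asyncio.get_running_loop().create_future()
        await self.queue.put((item, fut))
        return await fut

    async def serve(self) -> None:
        loop = asyncio.get_running_loop()
        while True:
            batch = [await self.queue.get()]  # block until the first request arrives
            deadline = loop.time() + self.max_delay_s
            while len(batch) < self.max_batch_size:
                if not self.queue.empty():    # take already-queued requests immediately
                    batch.append(self.queue.get_nowait())
                    continue
                remaining = deadline - loop.time()
                if remaining <= 0:
                    break                     # delay budget spent: ship a partial batch
                try:
                    batch.append(await asyncio.wait_for(self.queue.get(), remaining))
                except asyncio.TimeoutError:
                    break
            self.batch_sizes.append(len(batch))
            outputs = await self.infer_batch([item for item, _ in batch])
            for (_, fut), out in zip(batch, outputs):
                if not fut.done():            # skip callers that were cancelled meanwhile
                    fut.set_result(out)
```

A larger delay budget forms bigger batches and improves hardware utilization, but it adds up to that much queueing latency to every request, so against a sub-150ms end-to-end target it must stay a small fraction of the total budget.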

GenAI Document Intelligence Platform

Independent Project

Feb 2026 - Present (5 months)

Built a multimodal GenAI platform ingesting PDFs, spreadsheets, and images via Unstructured.io to generate audit-ready document summaries and executable code from natural-language specs. Designed a multi-LLM routing layer (GPT-4o vs Claude) based on task type, cost, and latency SLAs, reducing inference cost by 42% without degrading output quality.

Unstructured.io, FastAPI, Streamlit, GPT-4o, Claude, Prompt Engineering, MLflow, A/B Testing
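A router that chooses between GPT-4o and Claude by task type, cost, and latency SLA can be expressed as a filter-then-minimize rule: drop models that cannot handle the task or would miss the latency budget, then take the cheapest survivor. This is a hedged sketch of one such rule, not the platform's documented logic; the `ModelProfile` fields, the task-type labels, and every price and latency figure are invented placeholders, not real pricing or benchmarks.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ModelProfile:
    name: str
    tasks: frozenset[str]     # task types this model is trusted with
    usd_per_1k_tokens: float  # blended price (placeholder value)
    p95_latency_s: float      # observed tail latency (placeholder value)


# Hypothetical profiles: real numbers would come from billing data and monitoring.
MODELS = [
    ModelProfile("gpt-4o", frozenset({"vision", "summarize", "codegen"}), 0.010, 2.5),
    ModelProfile("claude", frozenset({"summarize", "codegen", "long_context"}), 0.008, 3.0),
]


def route(task_type: str, latency_sla_s: float, models: list[ModelProfile] = MODELS) -> str:
    """Return the cheapest model that supports the task and meets the latency SLA."""
    capable = [m for m in models if task_type in m.tasks]
    if not capable:
        raise ValueError(f"no model supports task type {task_type!r}")
    within_sla = [m for m in capable if m.p95_latency_s <= latency_sla_s]
    if within_sla:
        return min(within_sla, key=lambda m: m.usd_per_1k_tokens).name
    # Nothing meets the SLA: degrade gracefully to the fastest capable model.
    return min(capable, key=lambda m: m.p95_latency_s).name
```

Savings under a rule like this come from sending work to the cheaper model whenever it is both capable and fast enough; output quality still has to be verified separately, for example through the A/B testing listed in the stack.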

Enterprise Semantic Search Engine

Independent Project

Jan 2026 - Present (6 months)

Engineered a RAG search engine over 100k+ documents using hybrid dense-sparse retrieval (BGE-M3 + BM25), HyDE query expansion, and cross-encoder reranking to reach 91% top-3 retrieval precision. Added NER, intent, and coreference layers for query understanding, and combined vector filtering, Airflow/Kafka ingestion, and Redis caching to reduce irrelevant retrievals by 38% and cut P99 latency from 420ms to 55ms.

RAG, Qdrant, BGE-M3, BM25, Kafka, Airflow, Redis
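The profile does not state how the BGE-M3 and BM25 result lists are merged. Reciprocal rank fusion (RRF) is a common choice and is assumed in this sketch, along with the conventional constant k = 60; the block also shows one common definition of top-k precision (the share of the top k results that are relevant), which may differ from how the 91% figure was measured. HyDE expansion and cross-encoder reranking, which would run before and after fusion respectively, are left out.

```python
from __future__ import annotations

from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked lists of doc IDs: score(d) = sum over lists of 1 / (k + rank(d))."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):  # ranks are 1-based
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc ID so output is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


def precision_at_k(retrieved: list[str], relevant: set[str], k: int = 3) -> float:
    """Fraction of the top-k retrieved documents that are relevant."""
    return sum(doc in relevant for doc in retrieved[:k]) / k
```

RRF only uses ranks, so dense cosine scores and BM25 scores never have to be normalized onto a common scale, which is the main reason it is a popular default for hybrid retrieval.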

AI Memory Architecture for Agents

Independent Project

Jan 2026 - Present (6 months)

Engineered a three-tier conversational memory system (working, episodic, semantic) with recency decay and frequency-weighted retrieval to improve multi-session coherence and prevent contradictory context hallucinations. Compressed conversations into rolling semantic summaries with a learned decay function, cutting prompt token overhead by 60% while preserving long-range context.

PyTorch, Faiss, Sentence Transformers, Conversational Memory, Recency Decay, Prompt Token Reduction, Embedding-Based Retrieval
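Recency decay and frequency-weighted retrieval can be combined into a single ranking score per stored memory. The sketch below is one plausible formulation, assuming an exponential half-life for recency, a logarithmic frequency boost, and cosine similarity against the query embedding. The project describes a learned decay function, which the fixed `half_life_s` here only stands in for, and the sketch scores a single memory tier rather than the working/episodic/semantic hierarchy; `Memory`, `memory_score`, and `retrieve` are hypothetical names.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass
class Memory:
    text: str
    embedding: list[float]  # vector from an embedding model (e.g. Sentence Transformers)
    last_access: float      # seconds since the epoch
    access_count: int = 1


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def memory_score(m: Memory, query_emb: list[float], now: float,
                 half_life_s: float = 86_400.0) -> float:
    # Exponential recency decay: the weight halves every half_life_s seconds.
    recency = 0.5 ** ((now - m.last_access) / half_life_s)
    # Frequency weighting with diminishing returns for often-recalled memories.
    frequency = 1.0 + math.log1p(m.access_count)
    return cosine(m.embedding, query_emb) * recency * frequency


def retrieve(memories: list[Memory], query_emb: list[float], now: float,
             top_k: int = 3) -> list[Memory]:
    ranked = sorted(memories, key=lambda m: memory_score(m, query_emb, now), reverse=True)
    for m in ranked[:top_k]:  # being retrieved counts as an access
        m.access_count += 1
        m.last_access = now
    return ranked[:top_k]
```

Updating `access_count` and `last_access` on retrieval is what makes frequently used memories resist decay, while untouched ones fade and eventually fall out of the top-k context window.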

KAN Benchmarking Research

Independent Project

Feb 2025 - Present (1 year 5 months)

Rebuilt Kolmogorov–Arnold Networks (KAN) from scratch using kernel-based learnable activations on edges with fine-grained spline parameterization control. Benchmarked against MLP baselines and achieved 20% lower MSE with 1.5× faster convergence, validated via ablation studies over grid resolution and spline order.

PyTorch, Kolmogorov–Arnold Networks (KAN), Kernel-Based Activations, Spline Parameterization, Regression Modeling, Ablation Studies
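In a KAN, each edge carries its own learnable univariate function, and a standard parameterization is phi(x) = Σ c_i B_i(x) over a B-spline basis, where grid resolution and spline order are exactly the knobs an ablation would sweep. The pure-Python sketch below evaluates one such edge function with the Cox–de Boor recursion. It is a hedged illustration rather than the PyTorch implementation: B-splines on a uniform extended grid stand in for whatever kernel the project used, the residual base activation and all training are omitted, and `make_grid` and `KANEdge` are hypothetical names.

```python
from __future__ import annotations


def bspline_basis(x: float, knots: list[float], i: int, k: int) -> float:
    """Cox–de Boor recursion: i-th B-spline basis function of degree k at x."""
    if k == 0:
        return 1.0 if knots[i] <= x < knots[i + 1] else 0.0
    left_den = knots[i + k] - knots[i]
    right_den = knots[i + k + 1] - knots[i + 1]
    left = (x - knots[i]) / left_den * bspline_basis(x, knots, i, k - 1) if left_den else 0.0
    right = ((knots[i + k + 1] - x) / right_den * bspline_basis(x, knots, i + 1, k - 1)
             if right_den else 0.0)
    return left + right


def make_grid(lo: float, hi: float, grid_size: int, k: int) -> list[float]:
    """Uniform knots over [lo, hi] with grid_size intervals, extended by k knots per side."""
    h = (hi - lo) / grid_size
    return [lo + (j - k) * h for j in range(grid_size + 2 * k + 1)]


class KANEdge:
    """phi(x) = sum_i c_i * B_i(x): one learnable activation on a KAN edge."""

    def __init__(self, coeffs: list[float], lo: float = -1.0, hi: float = 1.0,
                 grid_size: int = 5, k: int = 3) -> None:
        self.k = k
        self.knots = make_grid(lo, hi, grid_size, k)
        n_basis = grid_size + k  # number of basis functions on the extended grid
        if len(coeffs) != n_basis:
            raise ValueError(f"expected {n_basis} coefficients, got {len(coeffs)}")
        self.coeffs = coeffs  # these would be the trainable parameters in a real KAN

    def __call__(self, x: float) -> float:
        return sum(c * bspline_basis(x, self.knots, i, self.k)
                   for i, c in enumerate(self.coeffs))
```

Raising `grid_size` adds coefficients per edge and refines the piecewise-polynomial resolution, while `k` sets the polynomial degree and smoothness; with all coefficients equal to 1, the edge returns 1 anywhere in [lo, hi), a quick check that the basis is built correctly.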