Open to opportunities

Srujan Teja User

@srujantejauser

AI/ML Engineer focused on Agentic AI, Voice AI, RAG, and MLOps—building low-latency, production-ready systems.

India

What I'm looking for

I’m looking for a team where I can build and ship production-grade Agentic AI, Voice AI, and RAG systems—owning low-latency inference, deployment, observability, and measurable evaluation with strong engineering rigor and iteration speed.

I’m an AI/ML Engineer building production-grade systems across Agentic AI, Voice AI, RAG, GenAI, and MLOps, with a strong bias toward low-latency inference, scalable deployment, and reliable memory architectures. I focus on taking research ideas to production by designing systems end-to-end—from orchestration and routing to observability and evaluation.

In agentic systems, I architected a multi-agent orchestration runtime with LangGraph featuring dynamic task decomposition, tool-use routing, and self-correction loops, plus shared-state coordination using Redis pub/sub and durable task memory in PostgreSQL. For voice, I built a multilingual real-time Voice AI platform supporting 10+ Indian languages with Whisper + INT8 ONNX STT, VITS2 fine-tuned TTS, and sub-150ms end-to-end latency on CPU, served via Triton with async WebSocket streaming.

On the retrieval and generation side, I engineered an enterprise semantic search and NLP intelligence engine with hybrid dense-sparse retrieval (BGE-M3 + BM25), HyDE expansion, and cross-encoder reranking achieving 91% top-3 retrieval precision, while cutting irrelevant retrievals by 38% and reducing P99 latency from 420ms to 55ms at scale. I also delivered a multimodal GenAI document intelligence platform with GPT-4o/Claude routing that reduced inference cost by 42% and supported 500+ concurrent sessions, plus an AI memory architecture for conversational agents that reduced prompt token overhead by 60% while improving multi-session coherence.

Experience

Work history, roles, and key accomplishments

Current

Production Multi-Agent System

Independent Project

Mar 2026 - Present (3 months)

Architected a multi-agent runtime with dynamic task decomposition, tool-use routing, and self-correction loops; agents autonomously plan, execute, and recover across 10+ tool integrations. Built shared-state coordination using Redis pub/sub and PostgreSQL for durable task memory to enable concurrent execution without cross-agent context collisions.

Current

Multilingual Real-Time Voice AI

Independent Project

Nov 2025 - Present (7 months)

Built a multilingual voice AI platform for 10+ Indian languages with real-time STT and neural TTS, achieving sub-150ms end-to-end latency on CPU. Implemented zero-shot voice cloning from 5s of audio and served the system on Triton with dynamic batching and async WebSocket streaming.

GenAI Document Intelligence Platform

Independent Project

Feb 2026 - Present (4 months)

Built a multimodal GenAI platform ingesting PDFs, spreadsheets, and images via Unstructured.io to generate audit-ready document summaries and executable code from natural-language specs. Designed a multi-LLM routing layer (GPT-4o vs Claude) based on task type, cost, and latency SLAs, reducing inference cost by 42% without degrading output quality.

Enterprise Semantic Search Engine

Independent Project

Jan 2026 - Present (5 months)

Engineered a RAG search engine over 100k+ documents using hybrid dense-sparse retrieval (BGE-M3 + BM25), HyDE query expansion, and cross-encoder reranking to reach 91% top-3 retrieval precision. Added NER, intent, and coreference layers for query understanding and reduced irrelevant retrievals by 38% with vector filtering; Airflow/Kafka ingestion and Redis caching brought P99 latency from 420ms to 55ms.

AI Memory Architecture for Agents

Independent Project

Jan 2026 - Present (5 months)

Engineered a three-tier conversational memory system (working, episodic, semantic) with recency decay and frequency-weighted retrieval to improve multi-session coherence and prevent hallucinations caused by contradictory context. Compressed conversations into rolling semantic summaries with a learned decay function, cutting prompt token overhead by 60% while preserving long-range context.

KAN Benchmarking Research

Independent Project

Feb 2025 - Present (1 year 4 months)

Rebuilt Kolmogorov–Arnold Networks (KAN) from scratch using kernel-based learnable activations on edges with fine-grained spline parameterization control. Benchmarked against MLP baselines and achieved 20% lower MSE with 1.5× faster convergence, validated via ablation studies over grid resolution and spline order.

Education

Degrees, certifications, and relevant coursework

Bennett University

Bachelor of Technology (B.Tech), Computer Science Engineering

2023 -

Grade: 7.01/10 CGPA

Activities and societies: Leadership Secretary, Game Development Club; Secretary, RPA Club (both at Bennett University).

B.Tech in Computer Science Engineering at Bennett University (2023–2027), maintaining a CGPA of 7.01/10.
